You certainly learn a lot coding an entire game by yourself. When you work on a coding team, you usually focus on just one section of the code. But with only myself to count on, I find that I have to master many different technologies.

In many ways this is great; I feel that I am a much better programmer than when I started. But in other ways it sucks, because I just don't have the experience in all these fields to solve every problem I run into. That's when I count on associates and friends. A very good friend taught me the concepts behind the lockstep multiplayer that makes this game run. But recently I ran into the most perplexing problem and had no idea where to look for a solution.

When we played multiplayer, one tester would always go out of sync while the rest of us stayed in sync. It was the same guy every time, so it was natural to assume that he was doing something wrong. I printed out tons of logs, over 100MB of them at one point. The reason he went out of sync was that his guys were slightly out of position. That made no sense: math is clear and well defined, and it was the same on all machines. Not so, Daniel-san....

A great guy that I work with schooled me a little on this issue. I was just telling him about it; I often do this when I hit a wall, because you never know where help will come from. He asked the oddest question: what CPU does everyone have? That made no sense to me, but he said that if I was doing floating point math, the CPU could make a difference.

Well, that was all I needed: my clue. I did some research (don't you just love the internet?) and found out that floating point math, like the kind DirectX uses for location vectors, is not an exact science. The same calculation is not guaranteed to give bit-for-bit identical results on every machine! Let me give an example:

1. 17543 / 1000 = 17.5430001
2. 17543 / 1000 = 17.5429999
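The example above can be reproduced in a small Python sketch. It performs 17543 / 1000 once at double precision and once rounded to 32-bit single precision. Treating the two precisions as stand-ins for two machines whose floating point units are configured differently is an assumption for illustration, and the helper `to_f32` is hypothetical, built on the standard `struct` module.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# The same division, kept at two different precisions.
as_double = 17543 / 1000
as_single = to_f32(17543 / 1000)

print(f"{as_double:.7f}")      # 17.5430000
print(f"{as_single:.7f}")      # 17.5429993
print(as_double == as_single)  # False
```

Neither value is exactly 17.543, because 17.543 has no exact binary representation; which nearby value you end up with depends on the precision used.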

These are actual results that can come from a CPU. It's a very small difference, but it accumulates in a program where men are marching all the time. These differences add up, and soon guys are completely out of position. The great thing is that this is nothing new; it's been around for quite some time. It's just new to me, as this is my first time coding lockstep multiplayer.

btw - I just learned it was called lockstep too :) Lockstep means that only user input is sent across the internet. All machines run the game independently, but because the random number generators are seeded identically and every machine applies the same input, they all run exactly the same game. It's very cool, especially for a game like this one, where doing it any other way would mean sending thousands of unit locations across the wire.
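Here is a minimal sketch of both ideas, under simplifying assumptions: a single unit moving along one axis, a hypothetical `simulate` function standing in for the game loop, and per-tick rounding to 32-bit floats standing in for a CPU whose floating point unit works at a different precision (in practice, FPU precision settings, instruction sets, and compiler optimizations are common culprits). Only the seed and the player commands would cross the wire; each "machine" recomputes everything else.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def keep_double(x: float) -> float:
    """Leave the value at full double precision."""
    return x


def simulate(seed, commands, ticks, rounding):
    """Run one machine's copy of the game and return a unit's final x position.

    Only `seed` and `commands` (tick -> new speed) would be sent over the
    network; everything else is recomputed locally on each machine.
    """
    rng = random.Random(seed)  # every machine seeds its generator identically
    speed = 0.0
    x = 0.0
    for tick in range(ticks):
        speed = commands.get(tick, speed)  # apply player input scheduled for this tick
        wobble = rng.random() * 0.01       # random, but reproducible from the seed
        x = rounding(x + speed + wobble)   # precision of this step is machine-dependent
    return x


SEED = 1234
COMMANDS = {0: 0.1, 500: 0.25, 2000: 0.05}  # hypothetical player orders

machine_a = simulate(SEED, COMMANDS, 5000, keep_double)
machine_b = simulate(SEED, COMMANDS, 5000, keep_double)
machine_c = simulate(SEED, COMMANDS, 5000, to_f32)  # the "different CPU"

print(machine_a == machine_b)  # True: same arithmetic, same game
print(machine_a - machine_c)   # small but nonzero drift
```

Machines A and B agree exactly because they perform identical operations on identical inputs. Machine C differs only in rounding, yet after thousands of ticks its unit ends up somewhere else.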

The good news I found during my research is that the solution has been around as long as the problem: use integers, whose arithmetic gives the same exact answer on every CPU. This is done with a technique called fixed point math. Where floating point math lets the decimal point move to wherever it makes the most sense, fixed point math always keeps it in the same spot, and all the math is done with integers rather than floats.
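A minimal fixed-point number type might look like this Python sketch. It assumes a decimal scale of 1000 (three digits after the point), and the names `Fixed`, `DIGITS`, and `SCALE` are hypothetical. Each value is stored as a plain integer equal to the real value times `SCALE`, so the decimal point never moves and every operation is integer arithmetic. Values are parsed from strings so that no float ever enters the calculation.

```python
DIGITS = 3            # hypothetical: three digits after the decimal point
SCALE = 10 ** DIGITS  # every value is stored as value * SCALE


class Fixed:
    """A fixed-point number stored as a single integer (value * SCALE)."""

    __slots__ = ("raw",)

    def __init__(self, raw: int):
        self.raw = raw

    @classmethod
    def from_int(cls, n: int) -> "Fixed":
        return cls(n * SCALE)

    @classmethod
    def from_str(cls, text: str) -> "Fixed":
        """Parse a decimal string such as "17.3" without going through a float."""
        sign = -1 if text.startswith("-") else 1
        whole, _, frac = text.lstrip("+-").partition(".")
        frac = (frac + "0" * DIGITS)[:DIGITS]  # pad or truncate to DIGITS digits
        return cls(sign * (int(whole or "0") * SCALE + int(frac)))

    def __add__(self, other: "Fixed") -> "Fixed":
        return Fixed(self.raw + other.raw)

    def __sub__(self, other: "Fixed") -> "Fixed":
        return Fixed(self.raw - other.raw)

    def __mul__(self, other: "Fixed") -> "Fixed":
        # (a * S) * (b * S) = (a * b) * S * S, so divide by S once to rescale.
        return Fixed(self.raw * other.raw // SCALE)

    def __truediv__(self, other: "Fixed") -> "Fixed":
        # Scale the numerator up first so the quotient keeps DIGITS digits.
        return Fixed(self.raw * SCALE // other.raw)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Fixed) and self.raw == other.raw

    def __repr__(self) -> str:
        sign = "-" if self.raw < 0 else ""
        whole, frac = divmod(abs(self.raw), SCALE)
        return f"{sign}{whole}.{frac:0{DIGITS}d}"


print(Fixed.from_int(17543) / Fixed.from_int(1000))     # 17.543
print(Fixed.from_str("5.43") * Fixed.from_str("17.3"))  # 93.939
```

Python's `//` rounds toward negative infinity, while C and C++ integer division truncates toward zero; either rule is deterministic, but every machine must use the same one. A C++ version would also need to guard against overflow in `raw * other.raw`, which Python's unbounded integers hide.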

So 5.43 * 17.3 becomes 543 * 1730 = 939390 after scaling both numbers by 100, and dividing by 100 * 100 = 10000 gives back 93.939. By converting to integers first, you get exact math. I don't know if you saw the forum, but I posted a little floating point test to prove this, and I asked people with AMD CPUs to run it and send me the results. The first log is straight floating point math using DirectX math calls, and those never match exactly across CPUs. Tests 2, 3, and 4 used fixed point math to do exactly the same work, and those matched across all CPUs. So I just have to convert all of my location logic to use fixed point math rather than the DirectX math. That should not be too hard, since I wrote my own vector class to move units around the map. We'll see. Hopefully we'll have multiplayer in sync very shortly.
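Converting a unit-movement vector class could look something like the sketch below. It assumes a binary 16.16 layout (raw value = real value times 65536) instead of decimal scaling, because shifts are cheap. `FixedVec2`, `step_toward`, and `march` are hypothetical names, not the game's actual vector class. Distances use `math.isqrt` (Python 3.8+), so even the square root stays in integers.

```python
import math
from dataclasses import dataclass

FRAC_BITS = 16        # hypothetical 16.16 layout: 16 integer bits, 16 fraction bits
ONE = 1 << FRAC_BITS  # the raw integer that represents 1.0


def to_fixed(n: int) -> int:
    """Convert a whole number to raw 16.16 fixed point."""
    return n << FRAC_BITS


@dataclass(frozen=True)
class FixedVec2:
    """A 2D map position whose components are raw 16.16 integers."""

    x: int
    y: int

    def __add__(self, other: "FixedVec2") -> "FixedVec2":
        return FixedVec2(self.x + other.x, self.y + other.y)

    def __sub__(self, other: "FixedVec2") -> "FixedVec2":
        return FixedVec2(self.x - other.x, self.y - other.y)

    def length(self) -> int:
        # x*x + y*y carries 32 fraction bits; its integer square root has 16 again.
        return math.isqrt(self.x * self.x + self.y * self.y)

    def step_toward(self, target: "FixedVec2", speed: int) -> "FixedVec2":
        """Move at most `speed` (raw 16.16) toward `target` using only integer math."""
        delta = target - self
        dist = delta.length()
        if dist <= speed:
            return target
        # delta * speed / dist, done as one integer ratio per component
        return self + FixedVec2(delta.x * speed // dist, delta.y * speed // dist)


def march(start: FixedVec2, target: FixedVec2, speed: int, ticks: int) -> FixedVec2:
    """Advance one unit for a number of ticks, as every machine would."""
    pos = start
    for _ in range(ticks):
        pos = pos.step_toward(target, speed)
    return pos


start = FixedVec2(0, 0)
target = FixedVec2(to_fixed(10), 0)
half = ONE // 2  # a speed of 0.5 map units per tick

print(march(start, target, half, 20) == target)  # True
```

Because every step is integer addition, multiplication, floor division, or an integer square root, any machine given the same start, target, speed, and tick count lands on the same raw integers. A C++ port would need 64-bit intermediates for the products.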

Thank you to everyone who ran my test. I needed to confirm that it worked across many different machines, and the results were perfect: all of the fixed point math tests matched exactly. Now back to the drawing board to see if I can get this into the code without too much hassle. Then we can get past this bug and on to the next one.