Managing lag in a deterministic game engine?

Obatztrara
edited March 2012 in DotNet
[I posted the same thing on the Unity forums, but since it's really platform-independent I wanted to try my luck here as well. :-)]

I can't really figure out how to deal with network-caused lag in my RTS.

What I currently have:
  • A (somewhat) deterministic game engine that runs in lockstep. Player input is recorded, broadcast to all other players and executed in a future update. That way, in a world of stable network connections and reasonable lag, everyone ends up with the same simulation. This actually works quite nicely.
  • I'm using Photon Cloud, which means that my server is non-authoritative (it basically just relays messages).

There are a couple of problems, though, that I'm having trouble with:
  • How do players know when they are lagging? Currently a client first broadcasts its input to the other clients, then waits for all clients to confirm, and finally confirms back. So after an average of 1.5 * ping, all players should have the input plus a confirmation that everyone else has it too. However, with this system there are always states where one client thinks everything is good to go while another might still be waiting for confirmation. That client knows it is lagging, but no one else does. Which leads to the second question.
  • If one client detects that it is lagging and needs to interrupt the simulation to wait for the missing messages, how do I notify the other clients about this? The problem is that they might not have detected any lag and thus didn't interrupt the simulation. So now the lagging client has to catch up a few simulation steps and then tell everyone else that they can carry on simulating.

Ugh. Sorry for the wall of text. Feel free to ask if anything (or everything ;) ) remains unclear, and I'll try to explain it in more detail.

Thanks in advance for any advice...

Greetings,
Obat

Comments

  • The mechanics of my game are different from yours, but I am trying to solve the same issue. I am just describing what I have been doing; maybe it will give you some hints on what you could do... maybe!

    I have a similar situation using Photon Cloud on WP7, with the server just relaying messages. In my WP7 game, Air Soccer Tour, two players compete with each other in a casual match. Both phones run the physics simulation. It's a dynamic 'turn-based' game, not a static one like a card game, where lag wouldn't affect the outcome of the simulation. Only one team can kick the ball at a time. A team has 4 seconds to kick the ball; if the team kicks the ball 2 seconds into their turn, the turn switches over right away, and if they don't take their turn within 4 seconds, the turn is switched automatically anyway. So the idea was: if the same simulation engine is running on both sides with simulation clock = wall clock time, I just need to transfer the "kick" event from one phone to the other and that's it.

    Initially I only sent a packet when a player moved his "striker" to kick the ball, which happens roughly every 3-4 seconds. The packet only said which striker was moved and with how much force. The receiving phone took that data and ran the simulation. This resulted in completely different outcomes on the two phones, since the data arrived e.g. 400 milliseconds later, and by then the world objects (positions/velocities) on the receiving end had already stepped forward from where they were when the sending phone kicked the ball. After 30 seconds of gameplay you would see a completely different world state on the two phones.

    My second attempt was to send all "striker" positions/velocities and the ball position/velocity in that packet, along with which striker was moved and with how much force. This "synchronized" the simulation on both phones, BUT the gameplay is horrible, because when I receive the packet I "step back" the simulation, i.e. impose the received positions/velocities on all world objects before I move the simulation forward with the ball kicked. Visually this causes all game objects to "teleport" (change position in an instant; in most cases it looks like they took several steps back). It does solve most of the sync scenarios, though.

    So now I am going to try something different. I need the "simulation clock" to be in sync. When time goes from t0 to t5 on phone 1 and it then sends a kick message to phone 2, the message takes tx to get there. If the simulation clock on phone 2 runs at the same rate, it will receive the kick event at t5+tx rather than at t5.

    So one idea I am playing around with is this: while it's phone 1's turn to kick, simulation clock = wall clock time. As soon as phone 1 takes its turn and sends the packet over, its simulation clock is "slowed down" just a bit. When phone 2 takes its turn, kicks the ball and the event is sent back, I am hoping it will arrive at a time that doesn't need a lot of stepping back, since I have slowed down simulation time on phone 1.

    This did kind of minimize the issue; however, I can't really use a fixed slowdown of the simulation clock, since network lag varies.

    So what I am going to do now is, instead of just sending events, send the "simulation clock time" from one phone (master) to the other every 300 milliseconds, let's say. When the packet arrives on the slave phone I compare it with the local simulation clock time, and based on the delta I decide how much to slow down the clock. This happens over and over until I get the kick event with the world object positions, at which point I force the positions before applying the kick event on the slave phone. Because I have constantly "synced" the simulation clock, it should not teleport too much. When it is then phone 1's turn to kick the ball, it becomes master and the other phone becomes slave, and the cycle continues. I haven't coded this yet, though, so I don't know what the results will be.
  • OK, this doesn't really have anything to do with my problem.

    // Start Offtopic

    Anyway, here's what you have to do: run your game at FIXED intervals. Synchronizing time will most likely fail or not be accurate enough. Whenever input is recorded, you have to send it to all players and execute it in the same update on all machines! It's important that the input is executed in the same update everywhere, so that the outcome is the same on all machines. That means your RPC will probably consist of the input and the update cycle you want it to be executed in.

    Check out this link, which explains the concept in more detail:

    http://www.gamasutra.com/view/feature/3 ... twork_.php

    Good Luck!

    // End offtopic

    Still hoping for answers! :D
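A minimal sketch of the "input plus update cycle" idea from the post above, in Python for brevity. The `LockstepSim` class, the `INPUT_DELAY_TICKS` value and the toy integer state are assumptions made for illustration; none of this is Photon or Unity API. Each command carries the tick it must run in, so two peers that receive it at different times still end up with the same state.

```python
from collections import defaultdict

INPUT_DELAY_TICKS = 15  # assumed delay between issuing a command and executing it


class LockstepSim:
    """Toy deterministic simulation whose whole state is one integer."""

    def __init__(self):
        self.tick = 0
        self.state = 0
        self.pending = defaultdict(list)  # execution tick -> [(player_id, value)]

    def schedule(self, execute_tick, player_id, value):
        # A command whose tick has already run cannot be applied in sync anymore.
        if execute_tick <= self.tick:
            raise RuntimeError("command arrived after its execution tick")
        self.pending[execute_tick].append((player_id, value))

    def step(self):
        self.tick += 1
        # Sort so that every peer applies same-tick commands in the same order.
        for player_id, value in sorted(self.pending.pop(self.tick, [])):
            self.state = self.state * 31 + player_id * 7 + value


peer_a, peer_b = LockstepSim(), LockstepSim()
command = (peer_a.tick + INPUT_DELAY_TICKS, 1, 5)  # (execute_tick, player_id, value)

peer_a.schedule(*command)   # the sender queues it immediately
for _ in range(10):
    peer_a.step()
    peer_b.step()
peer_b.schedule(*command)   # the receiver gets it 10 ticks later, still in time
for _ in range(10):
    peer_a.step()
    peer_b.step()

print(peer_a.state == peer_b.state)
```

The arrival time of the command does not matter as long as it arrives before its execution tick; a command that arrives later raises an error here, which corresponds to an out-of-sync situation.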
  • Tobias
    I will read through your posts tomorrow and hope I can help with all questions.
    Your posts are pretty long. Any chance you could "bold" and number your questions?
  • The second post is not from me! :-P
    Only the first post is relevant to my questions.


    Don't worry though. I have resolved the issues pretty much.
  • Obatztrara:

    Thanks a lot, the solution you proposed is so simple and elegant; I was overcomplicating it. I tried it and it's awesome. I am executing commands after 240 ms (15 update cycles of 16 ms each) and the gameplay is fine. I guess 240 ms will be enough in most cases to get the same command over to the other player. Also, in this game a goal is scored every 30 seconds at most, which resets the positions and the world, so I don't have to worry about a lot of error accumulating.

    Thanks.
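The relation between the input delay in update cycles and in milliseconds can be written down as a small helper. This is only a sketch: the function names are made up for illustration, and the 16 ms cycle length is the figure quoted in the post above.

```python
import math

TICK_MS = 16  # update cycle length quoted in the post above


def delay_ms(ticks, tick_ms=TICK_MS):
    """Input delay in milliseconds for a given number of update cycles."""
    return ticks * tick_ms


def delay_ticks(latency_budget_ms, tick_ms=TICK_MS):
    """Smallest number of update cycles that covers a latency budget."""
    return math.ceil(latency_budget_ms / tick_ms)


print(delay_ms(15))      # 15 cycles of 16 ms each
print(delay_ticks(300))  # cycles needed to cover a 300 ms budget
```

Rounding up means the delay never undershoots the latency budget, at the cost of up to one extra cycle of input lag.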
  • Glad I could help! :-)
  • Tobias
    I am late. Sorry.
    I would love to keep this discussion alive for everyone interested. I can't provide first-hand solutions but ideas and feedback where possible.
    • How do players know when they are lagging? Currently a client first broadcasts its input to the other clients, then waits for all clients to confirm, and finally confirms back. So after an average of 1.5 * ping, all players should have the input plus a confirmation that everyone else has it too. However, with this system there are always states where one client thinks everything is good to go while another might still be waiting for confirmation. That client knows it is lagging, but no one else does. Which leads to the second question.
    • If one client detects that it is lagging and needs to interrupt the simulation to wait for the missing messages, how do I notify the other clients about this? The problem is that they might not have detected any lag and thus didn't interrupt the simulation. So now the lagging client has to catch up a few simulation steps and then tell everyone else that they can carry on simulating.

    a) Each client can only detect its own lag to the server. GetPing() will give you an impression of how much this is, and you could send it around. There is no other way to share this.

    b) I would say that a lagging client most likely lags in both directions, so it might have trouble sending updates about its state.
    A solution could be to include the last executed lockstep frame when you send updates. Then everyone knows where the others are in terms of execution and can wait if one of them isn't sending "I did this" messages.

    What are your solutions so far?
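The suggestion in b) could be sketched roughly as follows. `PeerProgress` and `MAX_LEAD_TICKS` are hypothetical names and values, and the sketch assumes every update message carries the sender's last executed lockstep frame; how that field travels over the network is left out.

```python
MAX_LEAD_TICKS = 10  # assumed limit on how far ahead of the slowest peer we may run


class PeerProgress:
    """Tracks the last lockstep frame each remote peer reported as executed."""

    def __init__(self, peer_ids):
        self.last_executed = {peer_id: 0 for peer_id in peer_ids}

    def on_update(self, peer_id, last_executed_tick):
        # Messages can arrive out of order, so keep the highest frame seen.
        current = self.last_executed[peer_id]
        self.last_executed[peer_id] = max(current, last_executed_tick)

    def may_advance(self, local_tick):
        # Pause the local simulation when the slowest peer is too far behind.
        slowest = min(self.last_executed.values())
        return local_tick - slowest < MAX_LEAD_TICKS


progress = PeerProgress(["b", "c"])
progress.on_update("b", 20)
progress.on_update("c", 12)
print(progress.may_advance(21))  # lead over peer "c" is 9 frames
print(progress.may_advance(22))  # lead over peer "c" is 10 frames
```

A peer that stops reporting simply stops advancing its entry, so the others stall once they reach the allowed lead, without the lagging client having to announce anything.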
  • Hmm, interesting solution to b). I might look into that.

    So far I'm doing this:

    a) Every input command is broadcast to all players. The players confirm and the original sender confirms back. If either party reaches the lockstep update where the input has to be executed but is still missing a confirmation from any player, the following happens: the client executes the input anyway but requests an increase of the input delay from all players (at a certain lockstep, so everyone stays synced). On a regular basis the delay is reduced on all clients, hoping that the lag has decreased. This leads to the delay always bouncing around at the upper limit. So far it works.

    If at any point a command reaches a player after its execution lockstep has already passed, things get serious. So far I just close the session. Later on I might add more sophisticated methods of out-of-sync recovery, but for now I have other troubles.. :D

    b) I realized that it doesn't really make a difference if one player's simulation runs a second later. As long as it's THE SAME simulation, it's really not a big deal (let's ignore players stream cheating or talking on the phone for now ;-) ). Later on I might do something like you suggested.
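The adaptive input delay described in a) might look roughly like this. It is a sketch under assumptions: the class name, the step size, the limits and the decay interval are all invented for illustration. The point it shows is that both the increase and the periodic decrease are tied to lockstep numbers, so every client changes its delay on the same update.

```python
class AdaptiveInputDelay:
    """Input delay in lockstep ticks that grows on demand and decays over time."""

    def __init__(self, delay=6, min_delay=2, max_delay=30, step=2, decay_every=100):
        self.delay = delay
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.step = step
        self.decay_every = decay_every
        self.scheduled = {}  # tick -> delay that takes effect on that tick

    def schedule_change(self, at_tick, new_delay):
        # Called on every peer with the same arguments, so all switch together.
        self.scheduled[at_tick] = max(self.min_delay, min(self.max_delay, new_delay))

    def request_increase(self, current_tick):
        # A command reached its execution tick without full confirmation.
        at_tick = current_tick + self.delay
        self.schedule_change(at_tick, self.delay + self.step)
        return at_tick, self.scheduled[at_tick]  # what would be broadcast to peers

    def on_tick(self, tick):
        if tick in self.scheduled:
            self.delay = self.scheduled.pop(tick)
        elif tick % self.decay_every == 0:
            # Periodic reduction, hoping that the lag has decreased.
            self.delay = max(self.min_delay, self.delay - 1)
        return self.delay


delay = AdaptiveInputDelay()
request = delay.request_increase(current_tick=10)
history = {tick: delay.on_tick(tick) for tick in range(11, 101)}
print(request, history[15], history[16], history[100])
```

Because the decay depends only on the tick number, it needs no extra messages; only the increase has to be broadcast.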
  • Hi, I'm facing similar issues, so I thought I'd join this thread; if necessary I may create a new thread for my particular problem. So far this thread is very interesting for me.

    So, similarly I am trying to sync real time updates between clients using the non-authoritative server (Cloud/InstanceLoadBalancing).

    In my game there are slow moving projectiles from each player that can hit slow moving enemies. For good visual performance on each client, I have each player instantiating their own projectiles and the other players using interpolation and prediction to display it (the projectiles and enemies can endure prediction out to about half a second without much trouble).

    The issue is with the enemies, who would in an authoritative server belong to the server, but in this environment belong to the master client. The collision detection also occurs entirely on the master client. This poses a dilemma:

    1. I can have the non-master clients draw the enemy as it comes in but when they shoot at the enemy the synced projectile on the master client (who does collision) misses the enemy due to latency.
    2. I can have the non-master clients draw the enemy extrapolated so that their shot hits the enemy in the master client, but the non-master clients won't know that the enemy is hit until the master client confirms it and they hear back.

    So either way the behavior is unacceptable. From what I can tell, in the authoritative approach the solution is to run the enemy AI some number of milliseconds behind everything else, but I don't know how to translate that approach to the photon cloud. I'm still working on this problem and would appreciate any input about how others have overcome similar situations.

    Thanks
  • It seems like the best approach would be to somehow rewind the master client so that the collisions of other players occur at the correct time. Does anyone have any comments on how to do something like that? On the master client the collisions are detected using Unity's Collider.OnTriggerEnter.
  • dreamora
    In Unity you cannot rewind / fast-forward the collision at all.
    What you need to do is what the examples, including the Unity Networking Example on their page, do: send the input events now and apply them 100 ms / xx ms in the future. That way you can discard or void them if they are incorrect.
  • Thanks for the response, dreamora. Can you be more specific about which examples you are talking about?
  • I was able to find this example: http://unity3d.com/support/resources/ex ... ample.html

    This is perhaps the example you were referring to but all I saw in that code is a robust interpolation and prediction system using states. This is certainly very useful, which is why I have added it to this thread. I use something very similar already.

    My question is more about how to sync projectile instantiation and destruction in a way that is consistent for multiple players in a high latency environment. It seems that either one player has a lag in the instantiation step, or (in my case) a lag in the destruction step. Maybe I am missing something though.

    Thanks.
  • dreamora
    That would normally be done by using the instantiation network timestamp (the info object carries it along, and you can read it in RPCs if you want) and then 'fast-forwarding' the instantiation accordingly if required (movement based on the time from instantiation until now, with a swept collision test or the like).
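As a rough illustration of the fast-forward idea in the last post, here is a small 2D sketch in Python. The function names are hypothetical, the enemy is treated as a stationary circle for simplicity, and the sketch assumes the spawn timestamp and the local "now" are on the same synchronized time base. It is not Unity or Photon API.

```python
import math


def fast_forward(origin, velocity, spawn_time, now):
    """Position of a constant-velocity projectile that was spawned at spawn_time."""
    elapsed = max(0.0, now - spawn_time)
    return (origin[0] + velocity[0] * elapsed, origin[1] + velocity[1] * elapsed)


def segment_hits_circle(start, end, center, radius):
    """Swept test: does the path from start to end pass within radius of center?"""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0
    else:
        # Project the center onto the segment and clamp to its end points.
        t = ((center[0] - start[0]) * dx + (center[1] - start[1]) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    closest = (start[0] + t * dx, start[1] + t * dy)
    return math.hypot(closest[0] - center[0], closest[1] - center[1]) <= radius


origin, velocity = (0.0, 0.0), (10.0, 0.0)
position = fast_forward(origin, velocity, spawn_time=1.0, now=1.5)
hit_near = segment_hits_circle(origin, position, center=(2.0, 0.5), radius=1.0)
hit_far = segment_hits_circle(origin, position, center=(2.0, 3.0), radius=1.0)
print(position, hit_near, hit_far)
```

The swept test matters because the projectile is placed directly at its fast-forwarded position; checking only that end point would miss anything it passed through during the skipped time.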