TCP connection between servers getting severed

David
edited November 2013 in Photon Server
Hi,

we have servers in two different data centers, one in the EU and one in the US. The servers communicate via TCP (via ConnectToServerTcp). RTT is ~100 ms.
Every few days, at seemingly random times, these TCP connections get severed. Peer.OnDisconnect says it is a "ClientDisconnect".

This is not really Photon-related: we wrote a test tool that opens a raw TCP connection and sends a message every 10 s, and that tool gets the same TCP disconnect at the same time.

We already talked to our hoster and they say that should not happen, but could not find a problem either.

Now the question: Is there an easy and safe way to handle this? We can reconnect, of course, but there is the danger that some server-to-server messages were lost, right? Implementing our own Acknowledge-Resend-Logic seems absurd.
Is Photon's server-to-server UDP reliable and ordered? If yes, it might be a viable alternative, since it is connectionless and can still deal with lost packets...?

David

Comments

  • First, your hoster is correct. This *should* not happen, but it does. In every real-world scenario, there will be disconnects. The only way to handle them is to reconnect & make sure to re-transmit all data that is needed to re-initialize the server state. As you have absolutely no control over the underlying network, you always need to be prepared for unexpected disconnects - either the short glitches you experience now, or longer / more frequent disconnects caused by "real" problems.

    A switch to UDP would not resolve the issue. S2S UDP is reliable and ordered, but it only "mimics" the TCP protocol. If data is lost, it is resent (similar to how TCP handles data loss), but if no data is received / acknowledged for a certain amount of time, there will be a disconnect as well. So in general, we recommend using TCP for S2S; there are very few use cases where S2S UDP makes sense.

    Sorry - I know that this is not the advice you were looking for, but I hope it helps nevertheless.
  • You might want to have a look at the sources of the Loadbalancing project, especially the OutgoingMasterServerPeer - you can see how the Game Server initially connects to the Master, calls a "register" operation and sends an initial set of "game states" to the Master. Afterwards, only updates for those game states are sent. If there is a disconnect, the GS re-connects, registers again and re-sends the initial game states, etc.

    Maybe this approach works for your scenario as well.
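The register-then-resync pattern described above could be sketched roughly as follows. This is only an illustration in Python, not the Loadbalancing code: `MasterLink`, the `transport` object and the message tuples are hypothetical names, and the transport is assumed to raise `ConnectionError` when the connection is severed.

```python
class MasterLink:
    """Hypothetical game-server-side link to a master server."""

    def __init__(self, transport):
        # Assumption: transport.send(msg) raises ConnectionError when severed.
        self.transport = transport
        self.game_states = {}   # game_id -> latest known state (kept locally)
        self.connected = False

    def on_connected(self):
        # Called after every connect AND reconnect:
        # register first, then push a full snapshot of all game states.
        self.connected = True
        self.transport.send(("register", None))
        for game_id, state in self.game_states.items():
            self.transport.send(("game_state", (game_id, dict(state))))

    def on_disconnected(self):
        self.connected = False

    def update_game(self, game_id, **changes):
        # Always record the change locally so the next snapshot is complete.
        self.game_states.setdefault(game_id, {}).update(changes)
        if not self.connected:
            return
        try:
            # While connected, only the delta is sent.
            self.transport.send(("game_update", (game_id, changes)))
        except ConnectionError:
            self.on_disconnected()
```

The point of the design is that a lost update does not matter: the local copy is always current, and the full snapshot sent after the reconnect overwrites whatever the other side missed.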
  • Thanks for the Info!

    Unfortunately we cannot send complete states; we are relaying messages between servers and cannot afford to lose any.
    What we'll probably do now is implement an outgoing message queue and have the other server acknowledge messages from time to time, so we know what to resend after a disconnect+reconnect. Maybe this is a feature that photon could provide in the future? If TCP disconnects are common that might make sense.

    Still, I'd like to ask about UDP one more time. The concrete situation we are facing is a very short TCP disconnect. We can reconnect a second later or so. Wouldn't UDP solve that problem for us? Since UDP is connectionless, a one-second-blackout would not trigger a (photon)-"disconnect" and any lost packets would simply be resent.
    If the connection is lost for a longer period of time, the only thing you can do is disconnect all users and restart the servers anyway.
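The outgoing queue with periodic acknowledgements mentioned above might look something like the sketch below. It is a minimal illustration in Python under stated assumptions, not a Photon feature: `ReliableSender`, `ReliableReceiver` and the `transport_send(seq, payload)` callback are hypothetical names, acknowledgements are cumulative, and the sender is assumed to call `resend_unacked` after a reconnect before sending anything new.

```python
from collections import deque


class ReliableSender:
    """Keeps every message until the peer has acknowledged it."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = deque()   # (seq, payload) pairs in send order

    def send(self, payload, transport_send):
        seq = self.next_seq
        self.next_seq += 1
        # Queue first, so the message survives a failed or lost send.
        self.unacked.append((seq, payload))
        transport_send(seq, payload)
        return seq

    def on_ack(self, acked_seq):
        # Cumulative ack: everything up to acked_seq has been delivered.
        while self.unacked and self.unacked[0][0] <= acked_seq:
            self.unacked.popleft()

    def resend_unacked(self, transport_send):
        # Call after a reconnect, before sending any new messages.
        for seq, payload in self.unacked:
            transport_send(seq, payload)


class ReliableReceiver:
    """Delivers each sequence number exactly once, in order."""

    def __init__(self):
        self.last_delivered = 0

    def on_message(self, seq, payload, deliver):
        # Only the next expected message is accepted; duplicates from a
        # resend (or anything ahead of a gap) are ignored.
        if seq == self.last_delivered + 1:
            self.last_delivered = seq
            deliver(payload)
        # The returned value is what the receiver would acknowledge.
        return self.last_delivered
```

Because the receiver drops anything it has already delivered, resending too much after a reconnect is harmless; the acknowledgement interval only determines how much memory the queue needs and how much is resent.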
  • chvetsov
    David, you should definitely try UDP.
    If it does not help, you can implement a small message queue as you described in your post.