Inter-Server connection loss...

David
edited July 2013 in Photon Server
Hi,

we have two game servers, one in the US and one in Europe. They are connected via TCP (i.e. using ApplicationBase::ConnectToServerTcp()).
Occasionally, every few days, they suddenly get disconnected from each other (with reason "ClientDisconnect").
We originally believed that there was a hardware reason behind this, such as the two datacenters briefly losing connectivity to each other. So we made a simple test app that opens a (non-Photon) TCP connection from one machine to the other and sends a short message every 10 seconds. These test connections do not get disconnected, even when the Photon connection does. So this looks like a software problem that seems to affect only the Photon connection.
Any ideas?

David

Comments

  • Do you see the disconnect reason "ClientDisconnect" on BOTH instances? That would indeed be strange...

    Normally, the reason should be different. For example, if you have a connection from Server A to Server B:

    - "TimeoutDisconnect" on Server B (Server B has not received any data for a certain amount of time and decides to close the connection) ----> "ClientDisconnect" on Server A (Server A notices that the other end of the connection, which it sees as a "client", has closed the connection)

    If both servers log a "ClientDisconnect", I would also assume that there is a network / hardware reason.

    What you can do:
    - double check the disconnect reason on both instances.
    - check the PhotonServer.config and see if there is an "InactivityTimeout" configured for the port (listener) to which the S2S connection is established. Remove it, increase it significantly, or at least make sure that data is sent more frequently than that interval. (However, if that is related to your problem, there should be a "TimeoutDisconnect" on one side.)
    - sniff the traffic with Wireshark (on both sides) and see what happens when the connection fails. Make sure to use capture filters that log only the minimal amount of data, to avoid an impact on your server performance. (You could also send us the traces and we'll have a look.)
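
    To illustrate the "send data more frequently than the timeout" point, here is a minimal sketch in plain Python sockets. It is not Photon code: `heartbeat_interval`, `send_heartbeats`, the safety factor of 3 and the `ping` payload are all assumptions made up for this example, and the timeout value would have to come from your own PhotonServer.config.

```python
import time


def heartbeat_interval(inactivity_timeout_s, safety_factor=3.0):
    """Pick a send interval comfortably below the listener's inactivity timeout.

    The safety factor is an arbitrary assumption, not a Photon recommendation.
    """
    if inactivity_timeout_s <= 0:
        raise ValueError("inactivity timeout must be positive")
    return inactivity_timeout_s / safety_factor


def send_heartbeats(sock, interval_s, count, payload=b"ping\n", sleep=time.sleep):
    """Send `count` small messages, pausing `interval_s` seconds between them.

    Returns the number of messages sent. `sock` is any connected socket-like
    object with a `sendall` method; `sleep` is injectable so it can be tested.
    """
    sent = 0
    for _ in range(count):
        sock.sendall(payload)  # raises OSError if the connection is gone
        sent += 1
        if sent < count:
            sleep(interval_s)
    return sent
```

    With a timeout of 30 seconds, for example, `heartbeat_interval(30.0)` gives a 10 second interval, so a couple of heartbeats can be lost or delayed before the listener would consider the connection inactive.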

    In general, you should always expect that the connection can be closed at any time due to general network issues, and just reconnect & recover - but I agree that this is strange behavior that needs a closer look.
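
    The "reconnect & recover" idea can be sketched roughly like this, again in plain Python rather than Photon's API. The names `backoff_delays` and `connect_with_retry`, and the exponential backoff with a cap, are assumptions chosen for the illustration; in a Photon application the `connect` callable would wrap whatever call you use to re-establish the S2S connection.

```python
import time


def backoff_delays(base_s=1.0, cap_s=60.0, attempts=6):
    """Return the waits after each failed attempt: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * (2 ** i)) for i in range(attempts)]


def connect_with_retry(connect, delays, sleep=time.sleep):
    """Call `connect()` until it succeeds, waiting `delays[i]` after the i-th failure.

    `connect` is any zero-argument callable that returns a connection or raises
    OSError. Raises ConnectionError once every attempt has failed.
    """
    last_error = None
    for delay in delays:
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            sleep(delay)
    raise ConnectionError("all reconnect attempts failed") from last_error
```

    For a raw TCP connection this could be called as `connect_with_retry(lambda: socket.create_connection((host, port), timeout=5), backoff_delays())`, where `host` and `port` are placeholders. Capping the delay keeps the servers from hammering each other during a longer outage while still reconnecting quickly after a short one.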