Debugging a client disconnect issue

KevinB
edited August 2014 in Photon Server
Hi,

We have been seeing TimeoutDisconnect issues on our server side. Some of our clients report intermittent disconnects even though their Wi-Fi network is quite stable and has a low RTT.

In our log4net.config file we set all modules (OperationData, Photon.SocketServer, LoadBalancing, ExitGames.Messaging.Channels, etc.) to DEBUG level, but didn't notice any particular issue.

Any suggestions on where I should start looking? Are there more debug outputs I could enable in Photon, such as buffer sizes or queue sizes? Would it be useful to override OnSendBufferEmpty/OnSendBufferFull in PeerBase and print something there? Thanks!

Kevin

Comments

  • BesoC
    This is a very interesting question for me. I have the same issue: clients are disconnected even though they have a stable internet connection, e.g. they have low ping and can listen to music, watch videos and play games online. Yet at the same time they are disconnected from the server with reason TimeoutDisconnect. The clients are Flash and the transport is TCP.

    My guess is that the channel between client and server gets 'jammed' so that heartbeat/keepalive packets cannot get through.

    What can be done to improve connection stability? Is there perhaps a way to 'clear the jam', e.g. by reconnecting at the TCP level?
  • KevinB, which client lib / protocol are you using?

    There is a flow control mechanism in Photon: if data cannot be delivered to a client "fast enough", Photon buffers the data up to a certain limit, at which point it calls the "OnSendBufferFull" method. The default behavior (in PeerBase.OnSendBufferFull) is to disconnect the client. You can change that behavior by overriding the method: delay sending until "OnSendBufferEmpty" is called, then resume sending.

    However, this is NOT your problem here: if Photon disconnects a client after a "SendBufferFull", it would result in a "ManagedDisconnect", not a "TimeoutDisconnect". TimeoutDisconnect means that Photon has not received data / ACKs from the client for a certain amount of time.

    It might help to focus on the client side for debugging. If the problem is reproducible, you could trace the traffic with Wireshark to see what's going on.

    I'm going to point a client developer to this thread as well; they might have more hints for you.
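
    To make the difference between the two disconnect causes above concrete, here is a minimal C++ sketch of a simulated peer. It is not Photon's API: `PeerModel`, `DisconnectReason`, the byte limit and the timeout value are hypothetical stand-ins. A full send buffer either disconnects the peer (the default, reported as ManagedDisconnect) or, in the overridden variant, pauses sending until the buffer drains; a TimeoutDisconnect only happens when nothing has been received from the client for longer than the timeout.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <string>

    // Hypothetical model of the behaviour described above; this is NOT Photon's API.
    enum class DisconnectReason { None, ManagedDisconnect, TimeoutDisconnect };

    class PeerModel {
    public:
        PeerModel(std::size_t sendBufferLimit, std::uint64_t timeoutMs, bool pauseOnFull)
            : limit_(sendBufferLimit), timeoutMs_(timeoutMs), pauseOnFull_(pauseOnFull) {}

        // Queue outgoing data. While paused, new data waits in a local backlog.
        void send(const std::string& payload) {
            if (reason_ != DisconnectReason::None) return;
            if (paused_) { backlog_.push_back(payload); return; }
            buffered_ += payload.size();
            if (buffered_ > limit_) onSendBufferFull();
        }

        // The client has received the buffered data, so the buffer is empty again.
        void onSendBufferDrained() {
            buffered_ = 0;
            onSendBufferEmpty();
        }

        // Any data or ACK arriving from the client resets the timeout clock.
        void onDataFromClient(std::uint64_t nowMs) { lastReceivedMs_ = nowMs; }

        // Called periodically: nothing received for longer than timeoutMs => TimeoutDisconnect.
        void tick(std::uint64_t nowMs) {
            if (reason_ == DisconnectReason::None && nowMs > lastReceivedMs_ &&
                nowMs - lastReceivedMs_ > timeoutMs_) {
                reason_ = DisconnectReason::TimeoutDisconnect;
            }
        }

        DisconnectReason reason() const { return reason_; }
        bool paused() const { return paused_; }
        std::size_t bufferedBytes() const { return buffered_; }
        std::size_t backlogSize() const { return backlog_.size(); }

    private:
        // Default behaviour: disconnect (ManagedDisconnect). Override: pause sending instead.
        void onSendBufferFull() {
            if (pauseOnFull_) paused_ = true;
            else reason_ = DisconnectReason::ManagedDisconnect;
        }

        // Resume sending and flush the backlog until it is empty or the buffer fills up again.
        void onSendBufferEmpty() {
            paused_ = false;
            while (!paused_ && !backlog_.empty() && reason_ == DisconnectReason::None) {
                std::string next = backlog_.front();
                backlog_.pop_front();
                send(next);
            }
        }

        std::size_t limit_;
        std::uint64_t timeoutMs_;
        bool pauseOnFull_;
        std::size_t buffered_ = 0;
        std::deque<std::string> backlog_;
        std::uint64_t lastReceivedMs_ = 0;
        bool paused_ = false;
        DisconnectReason reason_ = DisconnectReason::None;
    };
    ```

    Note that in the pausing variant data keeps piling up in the backlog while the client is slow, so a real implementation would still need an upper bound on that backlog.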
  • BesoC
    Thanks, Nicole. I also assumed that TimeoutDisconnect happens when the server has not received a reply from the client for some amount of time. Nice to know I was not mistaken.

    Looking forward to hints from a Flash client dev :)
  • I would set the client-side debug level to the maximum, look at the timestamps in the log, and also log the queue sizes to see if you can detect long pauses (which may indicate that the client is too busy with other tasks to ping the server regularly enough).
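
    As a rough illustration of looking for long pauses in a client log, the following C++ sketch scans periodic samples (a timestamp plus the incoming/outgoing queue sizes at that moment) and reports every gap longer than a threshold. The `Sample` and `Pause` types and the threshold are assumptions for illustration; the samples would have to be filled from whatever timestamps and queue-size getters your client SDK actually provides.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // One periodic client-side log sample. Field names are hypothetical; fill them
    // from the timestamps and queue-size getters your client SDK provides.
    struct Sample {
        std::uint64_t timeMs;
        int queuedIncoming;
        int queuedOutgoing;
    };

    // A gap between two consecutive samples that exceeded the threshold.
    struct Pause {
        std::size_t index;     // index of the sample that ended the gap
        std::uint64_t gapMs;   // length of the gap in ms
        int queuedIncoming;    // queue sizes right after the gap
        int queuedOutgoing;
    };

    // Flag every gap longer than thresholdMs between consecutive samples.
    std::vector<Pause> findLongPauses(const std::vector<Sample>& samples, std::uint64_t thresholdMs) {
        std::vector<Pause> pauses;
        for (std::size_t i = 1; i < samples.size(); ++i) {
            const Sample& prev = samples[i - 1];
            const Sample& cur = samples[i];
            if (cur.timeMs < prev.timeMs) continue;  // skip out-of-order timestamps
            std::uint64_t gap = cur.timeMs - prev.timeMs;
            if (gap > thresholdMs) {
                pauses.push_back({i, gap, cur.queuedIncoming, cur.queuedOutgoing});
            }
        }
        return pauses;
    }
    ```

    For example, with a threshold of 1000 ms, samples taken at 0, 100, 1600 and 1700 ms yield exactly one pause of 1500 ms, ending at the third sample. If a sample is logged each time service() is called, such a gap means the game loop did not get around to calling it in time.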
  • Hi,

    Can you please tell us which SDK version you are using?
  • BesoC
    Hi Vadim

    I'm using Server SDK v3-2-10-4248 and Flash SDK v3-2-1-1
  • Hi Kaiserludi,

    You mentioned that I could log the queue sizes to detect long pauses. Which queue sizes can I log? I am using iOS SDK v3.0.4.4. The PhotonPeer class has getQueuedIncomingCommands/getQueuedOutgoingCommands/getIncomingReliableCommandsCount; I have logged those values during the client timeouts and they are all zero.

    Also, the client's round-trip time has consistently been 200-400 ms before the disconnect.

    And how would I set the client-side debug level to the maximum?

    Kevin
  • Hi KevinB.

    Yes, I meant exactly those queues. If they are all 0, you are calling service() often enough to process everything incoming and outgoing in time, so we can rule that out as a cause of the timeout disconnects.
    An RTT of 200-400 ms isn't bad enough to trigger a timeout, either. However, you could still check the RTT variance, just in case.

    You can set the client-side debug level to the maximum by passing DEBUG_LEVEL_ALL to PhotonPeer::setDebugOutputLevel().
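
    One way to check the RTT variance is to collect the RTT values the client reports over a session and compute their spread. A minimal C++ sketch, assuming the samples are already gathered into a `std::vector<double>` in milliseconds (the `RttStats` type is hypothetical), using Welford's single-pass algorithm:

    ```cpp
    #include <cmath>
    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    struct RttStats {
        double mean;      // average RTT in ms
        double variance;  // population variance in ms^2
        double stddev;    // standard deviation in ms
        double maxMs;     // worst single sample in ms
    };

    // Single-pass (Welford) mean/variance over RTT samples collected on the client.
    RttStats computeRttStats(const std::vector<double>& rttMs) {
        if (rttMs.empty()) throw std::invalid_argument("no RTT samples");
        double mean = 0.0, m2 = 0.0, maxRtt = rttMs.front();
        std::size_t n = 0;
        for (double x : rttMs) {
            ++n;
            double delta = x - mean;
            mean += delta / static_cast<double>(n);
            m2 += delta * (x - mean);  // accumulates the sum of squared deviations
            if (x > maxRtt) maxRtt = x;
        }
        double variance = m2 / static_cast<double>(n);
        return {mean, variance, std::sqrt(variance), maxRtt};
    }
    ```

    For instance, samples alternating between 200 and 400 ms give a mean of 300 ms and a standard deviation of 100 ms. A large standard deviation or an occasional very large maximum can be more telling than a steady, moderately high mean.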
  • Kaiserludi, the RTT is consistently under 50 ms.

    Is there a way to redirect all of Photon's internal debug output to a file or memory buffer? We would like to enable debug output on certain production devices to track down the disconnect issue.

    KevinB
  • The client library doesn't actually print any debug info itself. It passes it to the debugReturn() callback of the listener instance that you passed to the peer constructor, precisely so that your app can decide what to do with the debug output: ignore it, print it to stderr or stdout, write it to a text file, send it to a server, and so on. You can even handle it differently depending on the debug level: print warnings, send errors to a server, and if a fatal error occurs, have your game automatically get your CEO out of bed in the middle of the night ;-)
    If you have used one of our demos as the base for your game's networking code, your current implementation of debugReturn() will probably just print all debug strings to stderr, but you are free to change that behavior.
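
    A minimal C++ sketch of such a debugReturn() implementation: it keeps the most recent lines in an in-memory ring buffer (handy on production devices), appends warnings and errors to a text file, and hands errors to an escalation callback. The signature is simplified and hypothetical; the real listener callback takes the SDK's own debug-level type and string class, so treat `DebugSink` and `DebugLevel` as stand-ins to adapt.

    ```cpp
    #include <cstddef>
    #include <deque>
    #include <fstream>
    #include <functional>
    #include <string>
    #include <utility>

    // Hypothetical levels; map your SDK's real debug-level constants onto these.
    enum class DebugLevel { Off = 0, Error = 1, Warning = 2, Info = 3, All = 4 };

    class DebugSink {
    public:
        DebugSink(std::size_t capacity, const std::string& filePath,
                  std::function<void(const std::string&)> onError)
            : capacity_(capacity), onError_(std::move(onError)) {
            if (!filePath.empty()) file_.open(filePath, std::ios::app);  // empty path: no file
        }

        // Simplified stand-in for a listener's debugReturn() callback.
        void debugReturn(DebugLevel level, const std::string& message) {
            if (capacity_ > 0) {
                if (lines_.size() == capacity_) lines_.pop_front();  // drop the oldest line
                lines_.push_back(message);
            }
            bool important = level == DebugLevel::Error || level == DebugLevel::Warning;
            if (important && file_.is_open()) {
                file_ << message << '\n';
                file_.flush();  // keep the file useful even if the app is killed
            }
            if (level == DebugLevel::Error && onError_) onError_(message);
        }

        // Most recent lines, oldest first; dump or upload these after a disconnect.
        const std::deque<std::string>& recent() const { return lines_; }

    private:
        std::size_t capacity_;
        std::deque<std::string> lines_;
        std::ofstream file_;
        std::function<void(const std::string&)> onError_;
    };
    ```

    Passing an empty file path disables file output, so on most devices only the ring buffer is filled, keeping memory use bounded by the chosen capacity.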
  • BesoC
    Looks like debugReturn is unavailable in the Flash client :(
  • Any resolution to the disconnect problem?
    I am having similar issues since we released our game on Steam. A lot of people cannot enter the lobby. We authenticate users on a login server; then that connection is closed and a new one is created to the lobby server. A good percentage of people cannot get data from the lobby; they just get disconnected after about 30 seconds.
  • Hi,

    please check this link for further information:
    http://doc.exitgames.com/en/realtime/cu ... isconnects

    It sounds like you send too much data to your clients when they enter the lobby, which leads to disconnects. This can happen if you have lots of players in your lobby and send updates/stats regularly, so that clients receive LARGE lists of data (which they cannot process in time). Can you look into that?

    Edit: I just noticed that you created a separate thread with more details. Thanks! Let's continue the discussion there. I'm closing this thread to avoid confusion. :)

    Here's the new one:
    viewtopic.php?f=5&t=5131
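
    If the lobby does send large lists, one general way to keep each message small is to split the list into pages and send them one at a time (or on request). The C++ sketch below is a generic illustration of that idea, not something Photon provides, and the page size is a tuning assumption:

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Split a large list into pages of at most pageSize entries, so that each
    // message sent to a client stays small. pageSize is a tuning assumption.
    template <typename T>
    std::vector<std::vector<T>> paginate(const std::vector<T>& items, std::size_t pageSize) {
        std::vector<std::vector<T>> pages;
        if (pageSize == 0) return pages;  // avoid an infinite loop
        using Diff = typename std::vector<T>::difference_type;
        for (std::size_t start = 0; start < items.size(); start += pageSize) {
            std::size_t end = std::min(items.size(), start + pageSize);
            pages.emplace_back(items.begin() + static_cast<Diff>(start),
                               items.begin() + static_cast<Diff>(end));
        }
        return pages;
    }
    ```

    For example, 5 entries with a page size of 2 yield pages of 2, 2 and 1 entries.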
This discussion has been closed.