Intermittent DisconnectByServer

Options
chunr
edited October 2012 in Photon Server
I have the latest Photon V3 instance deployed on my server (Windows Server 2003 R2, 64bits) and my Unity client talks to the server thru internet. The code does something like following:
    Connect to Photon server Call some custom operations to load stuff (user settings, weapon parameters, maps, etc) from server Join lobby room

It can always connect to photon server. But sometime the connection is intermittently dropped when the client code calls custom operation or join lobby room (I think this is also a pre-define operation). The client side(LitePeer) log prints following:

[code2=plain]queueIncomingCommand( NC(0|8 r/u: 73/0 st/r#/rt:0/0/0) ) - incomingReliableSequenceNumber: 38, timestamp = 17526921
assembled fragmented payload from 17 parts. Dispatching now., timestamp = 17526937
Server sent disconnect. PeerId: 12 RTT/Variance:25/8, timestamp = 17540781
PeerStatusCallback : DisconnectByServer, timestamp = 17540796
StopConnection(), timestamp = 17540812[/code2]

The server side log shows:
[code2=plain]2012-09-16 14:53:02,520 [19] DEBUG Lite.Room [(null)] - Join operation from IP: 101.80.136.95 to port: 5055
2012-09-16 14:53:02,520 [19] DEBUG Lite.Room [(null)] - Actor added: 10 to game: default_lobby
2012-09-16 14:53:02,520 [19] DEBUG Photon.SocketServer.PeerBase [(null)] - SentOpResponse: ConnID=12, opCode=255, return=0, ChannelId=0 result=Ok size=14 bytes
2012-09-16 14:53:02,520 [19] DEBUG Photon.SocketServer.PeerBase [(null)] - SentEvent: ConnID=12, evCode=252, ChannelId=0, result=Ok size=18 bytes
2012-09-16 14:53:30,650 [14] DEBUG Photon.SocketServer.ApplicationBase [(null)] - OnDisconnect - ConnID=12
2012-09-16 14:53:30,650 [15] DEBUG Photon.SocketServer.PeerBase [(null)] - Peer 12 changed state from Connected to Disconnected
2012-09-16 14:53:30,650 [15] DEBUG Lite.LitePeer [(null)] - OnDisconnect: conId=12, reason=TimeoutDisconnect, reasonDetail=
0 - 74 - 0 - 0 - 2693801437 - 2693829937 - 28500
1 - 73 - 234 - 38 - 2693815984 - 2693816187 - 203
2 - 49 - 238 - 41 - 2693815984 - 2693816187 - 203
3 - 48 - 243 - 43 - 2693815984 - 2693816187 - 203
4 - 47 - 248 - 44 - 2693815984 - 2693816187 - 203
5 - 46 - 254 - 44 - 2693815984 - 2693816187 - 203
6 - 45 - 261 - 42 - 2693815984 - 2693816187 - 203
7 - 44 - 269 - 37 - 2693815984 - 2693816187 - 203
8 - 75 - 278 - 28 - 2693808296 - 2693808593 - 297
9 - 43 - 276 - 32 - 2693808296 - 2693808593 - 297
10 - 42 - 274 - 35 - 2693808296 - 2693808593 - 297
11 - 41 - 271 - 39 - 2693808296 - 2693808593 - 297
12 - 40 - 268 - 44 - 2693808296 - 2693808593 - 297
13 - 39 - 264 - 49 - 2693808296 - 2693808593 - 297
14 - 38 - 260 - 54 - 2693808296 - 2693808593 - 297
15 - 37 - 255 - 60 - 2693808296 - 2693808593 - 297
16 - 31 - 250 - 66 - 2693808296 - 2693808593 - 297
17 - 30 - 244 - 73 - 2693808296 - 2693808593 - 297
18 - 29 - 237 - 79 - 2693803875 - 2693804046 - 171
19 - 28 - 246 - 84 - 2693803875 - 2693804046 - 171
20 - 27 - 256 - 87 - 2693803875 - 2693804046 - 171
21 - 26 - 268 - 87 - 2693803875 - 2693804046 - 171
22 - 25 - 281 - 84 - 2693803875 - 2693804046 - 171
23 - 24 - 296 - 76 - 2693803875 - 2693804046 - 171
24 - 23 - 313 - 59 - 2693803875 - 2693804046 - 171
25 - 83 - 333 - 32 - 2693801828 - 2693802171 - 343
26 - 65 - 332 - 39 - 2693801812 - 2693802171 - 359
27 - 64 - 329 - 44 - 2693801812 - 2693802171 - 359
28 - 36 - 325 - 49 - 2693801812 - 2693802171 - 359
29 - 35 - 321 - 54 - 2693801812 - 2693802171 - 359
30 - 34 - 316 - 60 - 2693801812 - 2693802171 - 359
31 - 33 - 310 - 66 - 2693801812 - 2693802171 - 359
32 - 32 - 303 - 71 - 2693801812 - 2693802171 - 359
33 - 22 - 296 - 76 - 2693801812 - 2693802171 - 359
34 - 21 - 288 - 81 - 2693801812 - 2693802171 - 359
35 - 20 - 278 - 85 - 2693801812 - 2693802171 - 359
36 - 19 - 267 - 86 - 2693801812 - 2693802171 - 359
37 - 18 - 254 - 83 - 2693801812 - 2693802171 - 359
38 - 17 - 239 - 75 - 2693801812 - 2693802171 - 359
39 - 16 - 222 - 59 - 2693801812 - 2693802171 - 359
40 - 82 - 203 - 33 - 2693801625 - 2693801796 - 171
41 - 81 - 207 - 33 - 2693801609 - 2693801796 - 187
42 - 80 - 209 - 37 - 2693801609 - 2693801796 - 187
43 - 79 - 212 - 42 - 2693801609 - 2693801796 - 187
44 - 78 - 215 - 47 - 2693801609 - 2693801796 - 187
45 - 77 - 219 - 53 - 2693801609 - 2693801796 - 187
46 - 76 - 223 - 60 - 2693801609 - 2693801796 - 187
47 - 72 - 228 - 67 - 2693801437 - 2693801609 - 172
48 - 71 - 236 - 70 - 2693801437 - 2693801609 - 172
49 - 70 - 245 - 72 - 2693801437 - 2693801609 - 172
2012-09-16 14:53:30,650 [15] DEBUG Lite.Caching.RoomCacheBase [(null)] - Removed room instance reference: roomName=default_lobby, referenceCount=13
2012-09-16 14:53:30,650 [20] DEBUG Lite.Room [(null)] - ProcessMessage 1
2012-09-16 14:53:30,650 [15] DEBUG Photon.SocketServer.PeerBase [(null)] - Peer 12 changed state from Disconnected to Disposed[/code2]

From server log, it seems the server responded to the join operation and it waited for about 18 seconds (from 2012-09-16 14:53:02,520 to 2012-09-16 14:53:30,650) then terminated the connection.

From client log, it did not receive the operation response of joining lobby, although it calls "LitePeer.Service()" 20 times per second.

My guess is the server did not receive client ACK within the Timeout limit (with default value MinimumTimeout="5000") so it thinks the client is dead. But the client did not ever stop calling "LitePeer.Service()" and the network connection is quite stable with low latency (RTT is 25).

I have no idea where to look at and how to isolate the problem. Please help me to figure it out.

Comments

  • BTW, this problem happens on local server too. I setup server and client on same box. It appears several times too. So I guess it has nothing to do with network stability. Should be something wrong in either server or client.
  • dreamora
    Options
    do your custom operations send too large message chunks or do you not call the photon service function regularly enough?
  • dreamora wrote:
    do your custom operations send too large message chunks or do you not call the photon service function regularly enough?

    The custom operations send response of 5K bytes back each. The client code calls photon peer service function 20 times per second.
  • Tobias
    Options
    5k is quite a bit. Photon is geared towards sending small packages quickly.
    That said, we will check sending bigger chunks as well. The downside in any case is, that you don't have any info about the progress of such downloads.
  • Tobias wrote:
    5k is quite a bit. Photon is geared towards sending small packages quickly.
    That said, we will check sending bigger chunks as well. The downside in any case is, that you don't have any info about the progress of such downloads.

    Tobias, thanks for your comment. I got some improvement these days. In my previous version of client code, I call 15 operations concurrently, when client connects to server. After I change the code to let it call 15 operation sequentially, the disconnection happens much less than before. No more in local environment and much less thru internet.

    However, we still have one customer who keeps complaining the disconnecting issue. From server log, it shows the client can call certain operation successfully, but always get disconnected at same operation every time. That operation will send about 30K bytes to client. I suspect this is relate to what you said about data size. So I changed the client code again to call the operation in a "pagination" way to do it in multiple batches (10 batches, 3K each). But it does not become better, client still get disconnected even at the first batch. If this is still related to data size, I am wondering what kind of data size is considered "safe" for operation?
  • Anyone had similar issue?
  • Tobias
    Options
    The server will (like the client would) fragment the 30kB you want to send before sending it. It actually adds a lot of overhead to split the 30kB into multiple responses or events, so this is worse instead of improving the situation.

    It's unclear why you get the disconnect. If the client does not send acks fast enough, the server might detect a time out for any of the many reliable packages being sent. It could also be a side effect of otherwise full buffers, so the 30kB break the server's buffer limit.

    Please check your server's logs and see if some errors are logged or if the server logs that it disconnects a client for some reason.
    I will have to ask a colleague to describe what to look for in logs when send buffers are full.

    Edit: The application's log might have entries like "Send buffer full". Let us know if you find these or not.