Photon LoadBalancing Strange Issue

Options
Hi, we are currently running a modified LoadBalancing app and we have started encountering an issue regarding the creation of rooms. We have found that on occasion when a player joins a game they are unable to send any operations to the server thus making it impossible to play in the game. Currently when the game does this we receive this notification in our server logs after a considerable delay of the disconnect of the last player.

Error:
2018-10-13 18:49:02,409 [33] DEBUG Photon.Hive.HivePeer - OnDisconnect: conId=3, reason=TimeoutDisconnect, reasonDetail=
0000000009C8DCB0 Sent: 3826629296 now: 3826656218 (diff: 26922 >= max: 30000) new rtt timeout: 3200 since last touch:27937
Disc: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826656218 Sent: 3826629296 Diff: 26922 Resends: 8
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826652890 Timeout: 3200 Resend: 8
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826648828 Timeout: 3200 Resend: 7
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826644671 Timeout: 3200 Resend: 6
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826640859 Timeout: 3200 Resend: 5
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826636796 Timeout: 3200 Resend: 4
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826632734 Timeout: 3200 Resend: 3
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826630781 Timeout: 1920 Resend: 2
Resend: Seq: 12 CH: 255 RTT: 192 VAR: 72 Now: 3826629796 Timeout: 960 Resend: 1
ACK: Seq: 11 CH: 255 RTT: 192 VAR: 72 Now: 3826628281 Sent: 3826628140 Diff: 141
ACK: Seq: 10 CH: 255 RTT: 199 VAR: 79 Now: 3826627125 Sent: 3826626859 Diff: 266
ACK: Seq: 9 CH: 255 RTT: 190 VAR: 83 Now: 3826625843 Sent: 3826625546 Diff: 297
ACK: Seq: 8 CH: 255 RTT: 175 VAR: 75 Now: 3826624531 Sent: 3826624406 Diff: 125
ACK: Seq: 7 CH: 255 RTT: 182 VAR: 84 Now: 3826623390 Sent: 3826623250 Diff: 140
ACK: Seq: 6 CH: 255 RTT: 188 VAR: 98 Now: 3826622234 Sent: 3826622109 Diff: 125
ACK: Seq: 5 CH: 255 RTT: 196 VAR: 110 Now: 3826621093 Sent: 3826620968 Diff: 125
ACK: Seq: 4 CH: 255 RTT: 206 VAR: 123 Now: 3826619953 Sent: 3826619828 Diff: 125
ACK: Seq: 3 CH: 255 RTT: 217 VAR: 137 Now: 3826618812 Sent: 3826618671 Diff: 141
ACK: Seq: 2 CH: 255 RTT: 227 VAR: 157 Now: 3826617656 Sent: 3826617531 Diff: 125
ACK: Seq: 1 CH: 255 RTT: 241 VAR: 175 Now: 3826616515 Sent: 3826616406 Diff: 109
ACK: Seq: 6 CH: 0 RTT: 259 VAR: 189 Now: 3826615390 Sent: 3826615265 Diff: 125
ACK: Seq: 5 CH: 0 RTT: 278 VAR: 207 Now: 3826615250 Sent: 3826615187 Diff: 63
ACK: Seq: 4 CH: 0 RTT: 308 VAR: 205 Now: 3826615250 Sent: 3826615187 Diff: 63
ACK: Seq: 3 CH: 0 RTT: 342 VAR: 191 Now: 3826615171 Sent: 3826615109 Diff: 62
ACK: Seq: 2 CH: 0 RTT: 381 VAR: 161 Now: 3826614859 Sent: 3826614765 Diff: 94
ACK: Seq: 1 CH: 0 RTT: 422 VAR: 120 Now: 3826614734 Sent: 3826614578 Diff: 156
ACK: Seq: 0 CH: 255 RTT: 459 VAR: 71 Now: 3826614562 Sent: 3826614390 Diff: 172
Init: RTT: 500

Here is part of our logs:
GSGame: https://pastebin.com/6Dqx7yxJ
NSMaster: https://pastebin.com/iBxryJGW


We have also started to notices that regardless of weather or not people are able to send operations in the room, when that room is destroyed we receive :

2018-10-13 18:49:02,409 [33] DEBUG Photon.Hive.Caching.RoomCacheBase - Removed room instance: roomId=asd
2018-10-13 18:49:02,409 [33] DEBUG Photon.Hive.Room - Tried to remove room: roomName=asd, removed=True
2018-10-13 18:49:02,409 [33] DEBUG Photon.Hive.HiveHostGame - we are actually beeing released - not sure this should be called.

I had initially thought it could be to do with a timer that we run server sided and when OnClose is called we stop the server, could this possibly be an issue?

I am not quite sure what this is meaning: "we are actually beeing released - not sure this should be called."

We have been having this issue quite regularly and seems to consistently happen the first time we create a room after the server has been rebooted (through photon control not a system reboot).

Any help would be greatly appreciated. Many thanks,
Leon

Comments

  • chvetsov
    Options
    Hi, @LeonPS

    Your logs look fine from the server-side point of view. Taking into account fact that you have changed something I cannot say too much. But still, there is something.

    1.
    >>> 2018-10-13 18:49:02,409 [33] DEBUG Photon.Hive.Caching.RoomCacheBase - Removed room instance: roomId=asd
    2018-10-13 18:49:02,409 [33] DEBUG Photon.Hive.Room - Tried to remove room: roomName=asd, removed=True
    2018-10-13 18:49:02,409 [33] DEBUG Photon.Hive.HiveHostGame - we are actually beeing released - not sure this should be called

    you may ignore this. these messages are 'debug' messages. we just did not remove some of our thought from code. Sorry for the inconvenience. If it will make you more relaxed, same messages will appear in our cloud logs, if we enabled debug logging. This is not your issue and not the error. so, hope this is clear.

    2. from logs I see that you are loading our TestPlugins. Why is that? Just disable them in config if you do not use them. of use WebHooks plugin which is in a separate folder next to TestPlugins.

    3. Back to your issue with timeouts. I personally think that issue could be in your client. in a place where you handle properties received from the server.
    4. you also should take a look into native logs or at least provide them. these logs are in bin_win64/logs folder

    best,
    ilya
  • LeonPS
    Options
    Hi, @chvetsov
    Thanks for your response, I am glad to hear that my second issue is not a problem, however I am still somewhat stunned and really not too sure what my main issue could be.
    You mentioned that it could be to do with where we handle our properties received from the server, could you elaborate on what you mean by this. Are there any other things that spring to mind when you see this issue, I think I would agree that it is client related but other than modifying NetworkingPeer we have not changed anything. We just have added a few operations and events. I am really not sure why this is occurring and it seems to be occurring completely randomly. Thanks again, Leon
  • chvetsov
    Options
    Hi, @LeonPS

    the thing is that I do not know what did happen with your client. I assumed that you may have issues on client with properties handling because of first time I saw timeout disconnect after CreateGame operation, Second time after SetProperties operation. in both these cases, you may get response containing Game Properties. It might be the wrong assumption.
    I do not see any exception in your managed logs. So, for sure this is not managed code bug. And managed code cannot force the client to timeout.
    So, you need to check native logs from the server at the same time. maybe they contain more info. give me a link to it

    The second thing is diagnostic on the client side. you need to add logs there. also please give me info about your clientside lib. what are you using? PUN? What version?

    best,
    ilya