Reliable message lost???

ufnv
ufnv
edited May 2021 in Native
Strange behaviour observed with the messages that exceed UDP package size. Sometimes they just get lost.

The code looks like this:
Result =  mLoadBalancingClient.opRaiseEvent(true
                                                  , data
                                                  , size
                                                  , Code
                                                  , ExitGames::LoadBalancing::RaiseEventOptions().setReceiverGroup(Lite::ReceiverGroup::OTHERS)
                                                  );

immediately followed by
    mLoadBalancingClient.serviceBasic();
    mLoadBalancingClient.sendOutgoingCommands())

To check the delivery I attach the message ID to each message and print it to trace dump.

On the receiving side I also print the ID of the received message.

Under the heavy load (several messages that exceed UDP package size in row), a couple of messages get lost.

What could be the reason and how I could debug this further?

Comments

  • To add on this - no warning or errors received from Photon.
  • Update - sometimes I get this warning:

    PHOTON: WARNING code: 1037 (PhotonPeer sent queue is filling up. Check, why the server does not acknowledge your sent reliable commands.)
  • When I send a lot of less than udp size packages, they also get lost.

    Also heavily depends on the regional server that I am connecting to - some are better, some are worse, but sending something like 60-100 packages per second, each of 1024 bytes or so eventually leads to the packets not arriving to clients. But the connection is kept, no player gets the "disconnect" callback.

    Sometimes arrives WARNING enetpeer.cpp ExitGames::Photon::Internal::EnetPeer::queueSentReliableCommand() 959 WARNING! There are 100 sent reliable messages waiting for their acknowledgement!

    followed by PHOTON: WARNING code: 1037, but not necessarily. Quite often the messages are just being lost without any warning.

    No warnings or errors at the client (receiving) side.
  • Hi @ufnv.

    I can't reproduce the missing messages. Either the messages arrive or my client gets disconnected.
    Could you provide us with a minimal self-contained reproduction project, please?

  • I will try. It is 100% reproducible for me. Maybe I'll just record the packets and the timing to be able to reproduce it.
  • ufnv
    ufnv
    edited June 2021
    @Kaiserludi

    Uploaded the code here:

    https://drive.google.com/file/d/1usIQ25APvW_dM1FPHwIUTiWw57OJVvD1/view?usp=sharing

    It's a VS 2017 project that builds the "player" of the recorded traffic.
    The record itself is PhotonLostMessages/server_dump.bin file.
    This file needs to be in the same directory as the "server" executable (for example, x64/Release/).

    You need to start the PhotonLostMessages.exe, then when it prints "Waiting for Player 0 to join...", start the client.exe.

    The server application then will be sending the recorded packages in the same order and with (approximately) the same timing as it was recorded inside the game.

    The client application will be reading the received packages and count the markers received. When some package is lost, it will notify with the "MESSAGE LOST!!!" note.

    Both applications also print the Photon output.

    I've recorded a very stressful trace just to be sure the lost package will appear, so finally the forced disconnect will arrive, but much too late - already a lot of packages will be lost and in the real game this can easily lead to crashes, as the game does not expect any package to be lost. It expects them either to be delivered in order or to get forced disconnect in the worse case scenario.

    This example is a little extreme - throwing 50 messages per second, 5-10K each for about 3-5 seconds - it is considered a too heavy load already?
  • Tobias
    Tobias admin
    edited June 2021
    Yes, this amount of load is too much.
    You want to avoid sending chunks > MTU (1200 bytes which also must contain the headers / type info). Else, the message becomes fragmented and sent reliable.
    If any of the messages fail to send entirely, that will lead to a timeout.

    Please share the Wireshark capture. We would like to look at that before running any exe.
  • What is Wireshark capture?

    The problem is with this kind of load the reliable packets start disappearing long before the timeout. Or there can be no time-out at all if the load is more moderate. Here I've used the extra load just to make sure the problem is 100% reproducible.

    Still for the real game there can be situation when the reliably sent packet is lost and there is no timeout nor any warning/error.
  • ufnv
    ufnv
    edited June 2021
    The trace output looks like this:

    Note the first lost message is nearly two hundred messages before the disconnect finally happened.
    If to stop the extra load right after the first lost message, the disconnect will never happen.

    Also, no need to run the provided exe's, just compile and run the provided code - it's the safest way.
    Connecting...done
    Joining Room...done
      received message 1
      received message 2
      received message 3
      received message 4
      received message 5
      received message 6
      received message 7
      received message 8
      received message 9
      received message 10
      received message 11
      received message 12
      received message 13
      received message 14
      received message 15
      received message 16
      received message 17
      received message 18
      received message 19
      received message 20
      received message 21
      received message 22
      received message 23
      received message 24
      received message 25
      received message 26
      received message 27
      received message 28
      received message 29
      received message 30
      received message 31
      received message 32
      received message 33
      received message 34
      received message 35
      received message 36
      received message 37
      received message 38
      received message 39
      received message 40
      received message 41
      received message 42
      received message 43
      received message 44
      received message 45
      received message 46
      received message 47
      received message 48
      received message 49
      received message 50
      received message 51
      received message 52
      received message 53
      received message 54
      received message 55
      received message 56
      received message 57
      received message 58
      received message 59
      received message 60
      received message 61
      received message 62
      received message 63
      received message 64
      received message 65
      received message 66
      received message 67
      received message 68
      received message 69
      received message 70
      received message 71
      received message 72
      received message 73
      received message 74
      received message 75
      received message 76
      received message 77
      received message 78
      received message 79
      received message 80
      received message 81
      received message 82
      received message 83
      received message 84
      received message 85
      received message 86
      received message 87
      received message 88
      received message 89
      received message 90
      received message 91
      received message 92
      received message 93
      received message 94
      received message 95
      received message 96
      received message 97
      received message 98
      received message 99
      received message 100
      received message 101
      received message 102
      received message 103
      received message 104
      received message 105
      received message 106
      received message 107
      received message 108
      received message 109
      received message 110
      received message 111
      received message 112
      received message 113
      received message 114
      received message 115
      received message 116
      received message 117
      received message 118
      received message 119
      received message 120
      received message 121
      received message 122
      received message 123
      received message 124
      received message 125
      received message 126
      received message 127
      received message 128
      received message 129
      received message 130
      received message 131
      received message 132
      received message 133
      received message 134
      received message 135
      received message 136
      received message 137
      received message 138
      received message 139
      received message 140
      received message 141
      received message 142
      received message 143
      received message 144
      received message 145
      received message 146
      received message 147
      received message 148
      received message 149
      received message 150
      received message 151
      received message 152
      received message 153
      received message 154
      received message 155
      received message 156
      received message 157
      received message 158
      received message 159
      received message 160
      received message 161
      received message 162
      received message 163
      received message 164
      received message 165
      received message 166
      received message 167
      received message 168
      received message 169
      received message 170
      received message 171
      received message 172
      received message 173
      received message 174
      received message 175
      received message 176
      received message 177
      received message 178
      received message 179
      received message 180
      received message 181
      received message 182
      received message 183
      received message 184
      received message 185
      received message 186
      received message 187
      received message 188
      received message 189
      received message 190
      received message 191
      received message 192
      received message 193
      received message 194
      received message 195
      received message 196
      received message 197
      received message 198
      received message 199
      received message 200
      received message 201
      received message 202
      received message 203
      received message 204
      received message 205
      received message 206
      received message 207
      received message 208
      received message 209
      received message 210
      received message 211
      received message 212
      received message 213
      received message 214
      received message 215
      received message 216
      received message 217
      received message 218
      received message 219
      received message 220
      received message 221
      received message 222
      received message 223
      received message 224
      received message 225
      received message 226
      received message 227
      received message 228
      received message 229
      received message 230
      received message 231
      received message 232
      received message 233
      received message 234
      received message 235
      received message 236
      received message 237
      received message 238
      received message 239
      received message 240
      received message 241
      received message 242
      received message 243
      received message 244
      received message 245
      received message 246
      received message 247
      received message 248
      received message 249
      received message 250
      received message 251
      received message 252
      received message 253
      received message 254
      received message 255
      received message 256
      received message 257
      received message 258
      received message 259
      received message 260
      received message 261
      received message 262
      received message 263
      received message 264
      received message 265
      received message 266
      received message 267
      received message 268
      received message 269
      received message 270
      received message 271
      received message 272
      received message 273
      received message 274
      received message 275
      received message 276
      received message 277
      received message 278
      received message 279
      received message 280
      received message 281
      received message 282
      received message 283
      received message 284
      received message 285
      received message 287
    MESSAGE LOST 286!!!
      received message 288
      received message 289
      received message 290
      received message 291
      received message 292
      received message 304
    MESSAGE LOST 293!!!
    MESSAGE LOST 294!!!
    MESSAGE LOST 295!!!
    MESSAGE LOST 296!!!
    MESSAGE LOST 297!!!
    MESSAGE LOST 298!!!
    MESSAGE LOST 299!!!
    MESSAGE LOST 300!!!
    MESSAGE LOST 301!!!
    MESSAGE LOST 302!!!
    MESSAGE LOST 303!!!
      received message 305
      received message 306
      received message 307
      received message 308
      received message 318
    MESSAGE LOST 309!!!
    MESSAGE LOST 310!!!
    MESSAGE LOST 311!!!
    MESSAGE LOST 312!!!
    MESSAGE LOST 313!!!
    MESSAGE LOST 314!!!
    MESSAGE LOST 315!!!
    MESSAGE LOST 316!!!
    MESSAGE LOST 317!!!
      received message 319
      received message 320
      received message 321
      received message 322
      received message 323
      received message 332
    MESSAGE LOST 324!!!
    MESSAGE LOST 325!!!
    MESSAGE LOST 326!!!
    MESSAGE LOST 327!!!
    MESSAGE LOST 328!!!
    MESSAGE LOST 329!!!
    MESSAGE LOST 330!!!
    MESSAGE LOST 331!!!
      received message 333
      received message 334
      received message 335
      received message 336
      received message 337
      received message 344
    MESSAGE LOST 338!!!
    MESSAGE LOST 339!!!
    MESSAGE LOST 340!!!
    MESSAGE LOST 341!!!
    MESSAGE LOST 342!!!
    MESSAGE LOST 343!!!
      received message 345
      received message 346
      received message 347
      received message 348
      received message 349
      received message 358
    MESSAGE LOST 350!!!
    MESSAGE LOST 351!!!
    MESSAGE LOST 352!!!
    MESSAGE LOST 353!!!
    MESSAGE LOST 354!!!
    MESSAGE LOST 355!!!
    MESSAGE LOST 356!!!
    MESSAGE LOST 357!!!
      received message 359
      received message 360
      received message 361
      received message 362
      received message 363
      received message 371
    MESSAGE LOST 364!!!
    MESSAGE LOST 365!!!
    MESSAGE LOST 366!!!
    MESSAGE LOST 367!!!
    MESSAGE LOST 368!!!
    MESSAGE LOST 369!!!
    MESSAGE LOST 370!!!
      received message 372
      received message 373
      received message 374
      received message 375
      received message 384
    MESSAGE LOST 376!!!
    MESSAGE LOST 377!!!
    MESSAGE LOST 378!!!
    MESSAGE LOST 379!!!
    MESSAGE LOST 380!!!
    MESSAGE LOST 381!!!
    MESSAGE LOST 382!!!
    MESSAGE LOST 383!!!
      received message 385
      received message 386
      received message 387
      received message 389
    MESSAGE LOST 388!!!
      received message 391
    MESSAGE LOST 390!!!
      received message 396
    MESSAGE LOST 392!!!
    MESSAGE LOST 393!!!
    MESSAGE LOST 394!!!
    MESSAGE LOST 395!!!
      received message 397
      received message 398
      received message 399
      received message 400
      received message 402
    MESSAGE LOST 401!!!
      received message 405
    MESSAGE LOST 403!!!
    MESSAGE LOST 404!!!
      received message 407
    MESSAGE LOST 406!!!
      received message 409
    MESSAGE LOST 408!!!
      received message 410
      received message 411
      received message 412
      received message 413
      received message 414
      received message 415
      received message 416
      received message 417
      received message 418
      received message 419
      received message 420
      received message 421
      received message 422
      received message 423
      received message 424
      received message 425
      received message 426
      received message 427
      received message 428
      received message 429
      received message 430
      received message 431
      received message 432
      received message 433
      received message 434
      received message 435
    
  • Hi @ufnv.
    What is Wireshark capture?
    See
    https://doc.photonengine.com/en-us/realtime/current/troubleshooting/analyzing-disconnects/#wireshark
    and
    https://www.wireshark.org/#download

    I have had a look into your repro-project and I can reproduce the issue with it. However I have not yet found the time to debug why this is happening.

    I will continue looking into this next week.
  • If you can provide the wireshark captures, I can have a look asap and analyze if there is anything odd with it.
    Ludi is our expert for everything "native", so he has to look into the lib itself.
  • @Tobias Thanks!

    I believe the traffic itself is fine just big at peaks, it's something happening inside, when the traffic overloads the server.

    As far as I understand Ludi was able to reproduce the problem, so I hope the issue will be found soon.
  • Yes, it's a good first step to repro!

    Ludi has a few days off but we'll continue when he's back early next week.

    As said: The capture can help analyze what happens. But if you don't have the time for this, we can do one next week, too.
  • Hi!

    Any news on the issue? @Kaiserludi ?
  • Tobias
    Tobias admin
    edited June 2021
    Ludi reproduced this.
    We see that the server disconnects the clients due to internal buffering limits. You exceed the limits quite frequently with your appid.
    This may look like loss but actually, it is just the precursor for a disconnect. This is correct behaviour, as either side becomes unable to push everything you enqueued.

    You need to reduce the amount of data you send.
  • ufnv
    ufnv
    edited June 2021
    @Tobias , I disagree, this cannot be a correct behaviour. The whole idea of the reliable message is that it is either delivered or "disconnect" happens. In the correct paradigm I could rely on the message is delivered and the only exception I need to handle is the "disconnect".

    However with the existing behaviour we have the messages not delivered and the disconnect happens only after a while. All this time between the message being lost and the occasional disconnect is the "grey zone", where the game can get any kind of exception because of the missing data from the lost message.
    Like, for example, in the message N there is the info on a new unit being created, message N+1 contains the orders for that newly created unit. If the message N is lost, the delivered message N+1 will be referencing the non-existent unit. In the best case this will flag the "resync needed", further increasing the network load, in the worst case this will be an instant crash.

    Also I can easily make it so that the level of load will be "on the edge", where disconnect never happens but the messages are lost.

    So in the current implementation I just cannot rely on the Photon "reliable" messages and need to duplicate the reliability mechanics.

    Would it be fixed, or should I implement my own really reliable messaging?
  • Ludi told me I missed the part where some messages pass but not all. You are right, that should not be possible and we will look into this with high prio. Ludi will update this thread.
  • Thanks, looking forward to the update!
  • Hi @ufnv.

    After doing some debugging of this issue on the client side, to me it looks like a bug in the server code.

    I have supplied our server-devs with a slightly modified version of your repro-product (changed the server URL to our dev-servers, fixed the x64 Debug config project settings and added additional logging) and am now waiting for them to look into the issue.