SIGSEGV: egaes256_init

Hi,

recently I've got some crashes on Crashlytics:
Crashed: Thread: SIGSEGV  0x0000000000000000
       at egaes256_init + 305(aes256.c:305)
       at ExitGames::Photon::Internal::Encryption::decrypt(unsigned char const*, int, unsigned char const*, unsigned char**, int*) + 74(Encryption.cpp:74)
       at ExitGames::Photon::Internal::Encryption::decrypt(unsigned char const*, int, unsigned char const*, unsigned char**, int*) + 74(Encryption.cpp:74)
       at ExitGames::Photon::Internal::PeerBase::deserializeOperationResponse(unsigned char*, bool, int, unsigned char) + 569(PeerBase.cpp:569)
       at ExitGames::Photon::Internal::PeerBase::deserializeOperation(unsigned char*, int) + 544(PeerBase.cpp:544)
       at ExitGames::Photon::Internal::TPeer::dispatchIncomingCommands() + 158(TPeer.cpp:158)
       at ExitGames::Photon::Internal::PeerBase::service(bool) + 206(PeerBase.cpp:206)
       at ExitGames::Photon::PhotonPeer::service(bool) + 169(PhotonPeer.cpp:169)
       at ExitGames::LoadBalancing::Client::service(bool) + 700(Client.cpp:700)

Right before each and every crash there are some failed authentication operations (we have a custom token-based authentication). I wasn't being able to reproduce the issue. It happens on Android, Realtime SDK v4.1.15.0. Hope you guys might have some ideas.

Thank you,
Alex

Comments

  • Hi @abelthefirst.

    "Right before each and every crash there are some failed authentication operations"
    I assume you know this because you have log files that indicate this?
    Can you provide the Photon log output?
  • Hi @Kaiserludi ,

    thank you for your quick response.
    I assume you know this because you have log files that indicate this?
    Can you provide the Photon log output?
    That's not entirely the case. Every failed authentication operation is reported to the Crashlytics, as a non-fatal error, like this:
    Non-fatal Exception: ******.PhotonManagerImpl$PhotonManagerException: Photon connection error: CUSTOM_AUTHENTICATION_FAILED The token has expired
    
    Thus, for a given user I can see a list of events, but not the complete Photon log output. That would only be possible, if I reproduce the issue locally, which as I mentioned is not the case unfortunately.

    Thank you,
    Alex
  • Hi @abelthefirst.
    CUSTOM_AUTHENTICATION_FAILED The token has expired

    That might actually be an extremely helpful piece of information on how to reproduce this issue locally.

    We are looking into this. I will update you when I have more information on this issue.
  • Hi @abelthefirst.

    It looks like for some unknown reason the client receives an encrypted message from the server in a state in which encryption is not initialized on the client yet. This should not actually be possible as the server should not send encrypted messages until encryption has been established.
    Unfortunately the current version of the client relies on this no being possible and hence when it receives an encrypted message from the server, it assumes that encryption has been established and blindly attempts to decrypt the messages, without verifying first that encryption actually has been established. This leads to the client attempting to de-reference a NULL pointer. We will add a check in the client that encryption has actually been established before attempting to decrypt in the next release.

    However this does not explain WHY the server is sending an encrypted messages in this state.

    I have checked the Photon Cloud admin panel for the company that I think you belong to and could see that you are running custom server side code via the Photon Enterprise Cloud Plugin interface.

    Could it be the case that your custom server side code sends encrypted messages to the client?
    Maybe it somehow gets into a state in which it thinks that encryption has been established, but either encryption has not been established yet, or the connection is already getting cleaned up and for an unknown reason there is still an encrypted message getting sent, when the client is no in a state anymore to decrypt it.
    Both should not actually been possible, but I don't see another way how this could happen.

    I think you need to find a way to reproduce this issue for you/us to be able to track down the cause.
  • abelthefirst
    edited January 2020
    Hi @Kaiserludi ,

    I work closely with the backend team right now, but we didn't find anything particularly weird just yet.

    In the meantime I was doing a little bit more debugging on the client side, while my authentication token is expired. As far as I can tell (please correct me if I'm wrong), it's a disconnect call which destroys an established encryption. The only encrypted incoming message, that I encountered, was an "GET_REGIONS"" operation result. Since disconnect might be initiated by the server, is it a possibility, that the "incoming queue" contains both "server disconnect" AND "get regions" messages? In that case first message might cause encryption destruction and second message will actually crash the client.

    Again, that is not something I was able to reproduce, just some assumptions.

    Thank you,
    Alex
  • Hi @abelthefirst.
    As far as I can tell (please correct me if I'm wrong), it's a disconnect call which destroys an established encryption.
    Note that under the hood the client disconnects from the name server when it connects to a regions master server, disconnects from a master server, when it enters a room and disconnects from a game server when it leaves a room. Whenever the client connects to a different server under the hood, it needs to reestablish the encryption, as that new server does not have a matching key for the encryption that the client has established to the server to which it was previously connected.
    Since disconnect might be initiated by the server, is it a possibility, that the "incoming queue" contains both "server disconnect" AND "get regions" messages? In that case first message might cause encryption destruction and second message will actually crash the client.
    In that case the servers message will have been sent before the disconnect and hence get processed by the client before the disconnect, when encryption still exists.
    Neither will the server send any more messages to client after having sent it a disconnect, nor will the client process any more incoming messages on that connection once it has received the disconnect.
  • Hi @Kaiserludi ,

    it's been a while and unfortunately we were still unable to identify the exact cause of the issue. However I've added some extra logging to our Java layer, which interacts with the Photon client, and obtained some interesting results. I'll post last parts of some of the logs below:

    455
    09:07:07.532 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 36065556, Thread name - main
    456
    09:07:07.532 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 36065556, Thread name - main
    457
    09:07:07.533 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectReturn with error = CUSTOM_AUTHENTICATION_FAILED, error text = The user did not give consent for live opponents, PhotonWrapper instance - 36065556, Thread name - main
    458
    09:07:07.533 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 36065556, Thread name - main
    459
    09:07:07.536 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 36065556, Thread name - main
    460
    09:07:08.150 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 36065556, Thread name - main
    461
    09:07:08.150 p.m.
    D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 36065556, Thread name - main
    462
    09:07:08.150 p.m.
    D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 36065556, Thread name - main
    463
    09:07:08.150 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 36065556, Thread name - main
    464
    09:07:08.155 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 36065556, Thread name - main
    

    407
    07:13:49.799 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main
    408
    07:13:49.808 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main
    409
    07:13:50.651 p.m.
    D/CrashlyticsCore PhotonManagerImpl: onAvailableRegions, PhotonWrapper instance - 166427881, Thread name - main
    410
    07:13:51.487 p.m.
    D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 166427881, Thread name - main
    411
    07:13:51.488 p.m.
    D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 166427881, Thread name - main
    412
    07:13:51.488 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main
    413
    07:13:51.551 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main
    414
    07:13:51.553 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectReturn with error = CUSTOM_AUTHENTICATION_FAILED, error text = The token has expired, PhotonWrapper instance - 166427881, Thread name - main
    415
    07:13:51.553 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main
    416
    07:13:51.568 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main
    417
    07:14:29.271 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 166427881, Thread name - main
    418
    07:14:29.278 p.m.
    D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 166427881, Thread name - main
    419
    07:14:29.278 p.m.
    D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 166427881, Thread name - main
    420
    07:14:29.278 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main
    421
    07:14:29.288 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main
    

    20
    07:55:51.033 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 170633409, Thread name - main
    21
    07:55:51.038 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 170633409, Thread name - main
    .....
    32
    07:55:58.643 p.m.
    D/CrashlyticsCore PhotonManagerImpl: onAvailableRegions, PhotonWrapper instance - 170633409, Thread name - main
    33
    07:55:59.538 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 170633409, Thread name - main
    34
    07:55:59.538 p.m.
    D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 170633409, Thread name - main
    35
    07:55:59.538 p.m.
    D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 170633409, Thread name - main
    36
    07:55:59.538 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 170633409, Thread name - main
    37
    07:55:59.539 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 170633409, Thread name - main
    38
    07:56:07.139 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 170633409, Thread name - main
    39
    07:56:07.139 p.m.
    D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 170633409, Thread name - main
    40
    07:56:07.140 p.m.
    D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 170633409, Thread name - main
    41
    07:56:07.140 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 170633409, Thread name - main
    42
    07:56:07.141 p.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 170633409, Thread name - main
    


    As you might see, it is always INTERNAL_RECEIVE_EXCEPTION, that causes the crash. The weird fact is that sometimes a client handles it just fine, but sometimes it fails, which really looks like a race condition.
  • Hi @abelthefirst.

    In all your logs disconnectReturn() gets called after the internal receive connection got logged and only after that has happened, the log ends shortly after.
    I could imagine that there is a problem with any cleanup code that you might run once you know about the disconnect.
  • Hi @Kaiserludi ,

    I've added even more logging to the latest release and need your help to understand the data I get. As of now there is only single log, I'll add more when it becomes available:
    1
    08:32:23.051 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,661569 DEBUG   TPeer.cpp                      onReceiveData()                                              195   length = 50, error = 0
    2
    08:32:23.051 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,661704 DEBUG   TPeer.cpp                      onReceiveData()                                              195   length = 0, error = -1
    3
    08:32:23.051 a.m.
    D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION
    4
    08:32:23.051 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,661837 DEBUG   PeerBase.cpp                   onDisconnected()                                             449
    5
    08:32:23.051 a.m.
    D/CrashlyticsCore PhotonManagerImpl: disconnectReturn
    6
    08:32:23.051 a.m.
    D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED
    7
    08:32:23.051 a.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded
    8
    08:32:23.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD
    9
    08:32:23.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674308 INFO    PeerBase.cpp                   connect()                                                    128   address: ns.exitgamescloud.com:4533
    10
    08:32:23.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674387 DEBUG   PeerBase.cpp                   reset()                                                      111
    11
    08:32:23.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674650 DEBUG   PeerBase.cpp                   service()                                                    203   dispatch == true
    12
    08:32:23.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674718 DEBUG   TPeer.cpp                      dispatchIncomingCommands()                                   151
    13
    08:32:23.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674756 DEBUG   PeerBase.cpp                   deserializeOperation()                                       513
    14
    08:32:23.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674790 DEBUG   PeerBase.cpp                   deserializeOperation()                                       532   bodyBuff: 50, msgType: 3 (event = 4)
    
  • Hi @abelthefirst.

    1
    08:32:23.051 a.m.
    The client receives 50 bytes of data.
    2
    08:32:23.051 a.m.
    The sockets recv() function returns an error.
    3
    08:32:23.051 a.m.
    The client reports 'INTERNAL_RECEIVE_EXCEPTION' to connectionErrorReturn().
    4
    08:32:23.051 a.m.
    The client sets itself into the disconnected state...
    5
    08:32:23.051 a.m.
    ...and reports a disconnect to disconnectReturn().
    6
    08:32:23.051 a.m.
    The app sets it's PhotonState to 'DISCONNECTED'.
    7
    08:32:23.051 a.m.
    The app calls connectIfNeeded().
    8
    08:32:23.064 a.m.
    connectIfNeeded() gets called with hosting type = CLOUD
    9
    08:32:23.064 a.m.
    The client connects to Photons nameserver over TCP.
    10
    08:32:23.064 a.m.
    The client resets its Peer (which handles the socket connection) to a clean state for the fresh connection ( this is normal).
    11
    08:32:23.064 a.m.
    service() is called.
    12
    08:32:23.064 a.m.
    dispatchIncomingCommands() is called by the service() call.
    13
    08:32:23.064 a.m.
    deserializeOperation() is called by the dispatchIncomingCommands() call.
    14
    08:32:23.064 a.m.
    deserializeOperation() deserializes a 50 bytes messages of type 'operationResponse'. The first operation response that you will receive after calling connect() is the 'getRegions' response.

    At that point the code should call Client::onOperationResponse(), but apparently it never reaches that point as the EGLOG() call from the first line of that function does not show up in your log.

  • Well, thank you for such a detailed explanation, but that is not what I meant :) Client::onOperationResponse() is not called, since egaes256_init crash occurred. As far, as I understand,
    bodyBuff: 50, msgType: 3 (event = 4)
    
    is an encrypted onAvailableRegions response, but it seems it should not be obtained yet, since the connection to the nameserver is not yet established. So the question is, how that might happen?
  • Which RegionSelectionMode are you using? In RegionSelectionMode::SELCECT your onRegionsAvailable() callback will be called. Could it be possible that it crashes in there?

    Also it could potentially be related to your custom authentication, as sending the authentication operation request would be the next step after the getRegions operation response has been received. Maybe there is a problem with your custom auth parameters that only happens when on reconnects? i.e. a dangling pointer to no longer valid memory or something like that?

    Finally it might be worth to check if the problem still occurs if you turn off encryption for the opGetRegions() call (search for mpPeer->opGetRegions(true, mAppID); in Client.cpp and replace the 'true' with 'false'). If that makes the issue disappear entirely, that would indicate that the crash would happen somewhere inside the decryption code inside Photon.

    Any progress in reproducing the crash locally? I am afraid that you really need to figure out what the users do that provokes this crash, so that you can figure out a way to reproduce and debug it.
  • Hi @abelthefirst.

    Wait, do you mean that you have not shortened the log and you get that op response directly after the connect with any other outgoing or incoming data showing up in between?
    1. Unfortunately I still cannot reproduce the issue locally.
    2. Yes, that is a complete piece of log, op response happens right after connect request.
  • Are you by any change accessing the same Client instance from multiple threads in parallel? Doing that is not supported and causes undefined behavior.
  • That was my first assumption, but no, interaction with the client happens on a single thread.

  • Another log, that I obtained just today. Again, this is a complete log.
    1
    01:46:14.861 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,625187 DEBUG   TPeer.cpp                      onReceiveData()                                              195   length = 110, error = 0
    2
    01:46:14.862 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,625888 DEBUG   PeerBase.cpp                   service()                                                    203   dispatch == true
    3
    01:46:14.863 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,626459 DEBUG   TPeer.cpp                      dispatchIncomingCommands()                                   151
    4
    01:46:14.863 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,627081 DEBUG   PeerBase.cpp                   deserializeOperation()                                       513
    5
    01:46:14.864 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,627575 DEBUG   PeerBase.cpp                   deserializeOperation()                                       532   bodyBuff: 110, msgType: 7 (event = 4)
    6
    01:46:14.867 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,631275 DEBUG   PeerBase.cpp                   serializeOperation()                                         461
    7
    01:46:14.877 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,641355 DEBUG   TPeer.cpp                      send()                                                       253   cType: 6 payloadSize: 57
    8
    01:46:14.878 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,642139 DEBUG   TPeer.cpp                      dispatchIncomingCommands()                                   151
    9
    01:46:14.879 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,642638 DEBUG   TPeer.cpp                      sendOutgoingCommands()                                       83
    10
    01:46:14.960 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,724297 DEBUG   PeerBase.cpp                   onConnect()                                                  419
    11
    01:46:14.960 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,724585 ERROR   PeerBase.cpp                   onConnect()                                                  422   4294967295
    12
    01:46:14.960 a.m.
    D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = EXCEPTION_ON_CONNECT
    13
    01:46:14.961 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,724890 DEBUG   PeerBase.cpp                   onDisconnected()                                             449
    14
    01:46:14.961 a.m.
    D/CrashlyticsCore PhotonManagerImpl: disconnectReturn
    15
    01:46:14.961 a.m.
    D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED
    16
    01:46:14.961 a.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded
    17
    01:46:14.962 a.m.
    D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD
    18
    01:46:14.962 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,726650 INFO    PeerBase.cpp                   connect()                                                    128   address: ns.exitgamescloud.com:4533
    19
    01:46:14.963 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,726890 DEBUG   PeerBase.cpp                   reset()                                                      111
    20
    01:46:14.963 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,727288 DEBUG   PeerBase.cpp                   service()                                                    203   dispatch == true
    21
    01:46:14.963 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,727667 DEBUG   TPeer.cpp                      dispatchIncomingCommands()                                   151
    22
    01:46:14.963 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,727828 DEBUG   TPeer.cpp                      sendOutgoingCommands()                                       83
    23
    01:46:15.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,828327 DEBUG   PeerBase.cpp                   service()                                                    203   dispatch == true
    24
    01:46:15.064 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,828697 DEBUG   TPeer.cpp                      dispatchIncomingCommands()                                   151
    25
    01:46:15.065 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,828935 DEBUG   TPeer.cpp                      sendOutgoingCommands()                                       83
    26
    01:46:15.163 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,927513 DEBUG   PeerBase.cpp                   onConnect()                                                  419
    27
    01:46:15.164 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,927880 DEBUG   TPeer.cpp                      send()                                                       253   cType: 1 payloadSize: 48
    28
    01:46:15.164 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928128 DEBUG   TPeer.cpp                      sendOutgoingCommands()                                       83
    29
    01:46:15.164 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928363 DEBUG   TPeer.cpp                      onReceiveData()                                              195   length = 50, error = 0
    30
    01:46:15.164 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928597 DEBUG   PeerBase.cpp                   service()                                                    203   dispatch == true
    31
    01:46:15.165 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928897 DEBUG   TPeer.cpp                      dispatchIncomingCommands()                                   151
    32
    01:46:15.165 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,929216 DEBUG   PeerBase.cpp                   deserializeOperation()                                       513
    33
    01:46:15.165 a.m.
    D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,929485 DEBUG   PeerBase.cpp                   deserializeOperation()                                       532   bodyBuff: 50, msgType: 3 (event = 4)
    
  • Hi @abelthefirst.
    That was my first assumption, but no, interaction with the client happens on a single thread.
    The most likely cause then probably is a memory corruption. Maybe your code somewhere has a double-deallocation, an allocator vs deallocator mismatch (i.e. mixing new with delete[] or new[] with free) or does an out of boundaries write-access.
    This type of bug can often cause really unpredictable crashes in entirely unrelated code.
    Hence the cause of the bug might not even be in any code that is even loosely related to Photon.

    All dynamic memory allocations by Photon happen through allocate(), allocateArray(), reallocate(), reallocateArray(), EG_MALLOC(), EG_REALLOC() and EG_CALLOC() and all deallocations happen through deallocate(), deallocateArray() and EG_FREE().
    These all sit on top of class AllocatorInterface

    Hence the actual allocator can be switched out against a custom one i.e. for performance or debugging needs. The API reference for AllocatorInterface::get() contains some example implementations, which can also be found in the comments in AllocatorInterface.h, from which this part of the API-reference gets generated.
    The default allocator uses a pool-based memory management, which gives better performance for than direct calls to new or malloc.

    However it has the side-effect that identifying memory issues like leaks or memory corruption can be more difficult (i.e. some code allocates 12 bytes, but under the hood 16 bytes are allocated, so an out-of bounds access at indices 12-15 might stay undetected and it only crashes in the case that you access even further out of bounds, which might be so rare that they are hard to reproduce; another example: some memory gets deallocated, but still accessed through a dangling pointer afterwards, but the memory has not been given out again to the next user, so it might still hold valid data, but this data might suddenly become overwritten at some completely unrelated place in the code).

    For this reason when facing potential memory issues it can make sense to provide your own custom allocator that just adds some debug-logging and then directly forwards the allocation/deallocation to malloc()/free(). That way such issues may become easier reproducible. Be aware though that logging every single dynamic memory allocation might degrade your apps performance a lot.

    Please see https://doc-api.photonengine.com/en/cpp/current/a04792.html for the ExitGames::Common::MemoryManagement Namespace Reference and https://doc-api.photonengine.com/en/cpp/current/a05378.html for the AllocatorInterface Class Reference.
  • Hi @Kaiserludi ,

    I agree with your point in general, but I don't quite see the scenario, when incorrect memory management from my side can trigger rare, but stable specific encrypted message arrival in an incoming queue.
    I'll try to disable encryption in the next release and see, how it goes.

  • Hi @abelthefirst.

    I have compared your to logs with each other to see what they have in common and how they differ and tried to simulate the situation.
    When doing so I stumbled upon a bug in the clients TCP implementation that is likely the culprit of the issue:
    When the client experienced a TCP connection error and in consequence sets itself into the disconnected state, then it does not clear its queues and after a reconnect it can potentially still hold entries from before the disconnect in its queues.
    After the reconnect the client then tries to process the incoming queue entries which are leftover from the previous connection, assuming that they have arrived on the new connection. When such an entry happens to be an encrypted message, then the client crashes as it does not expect to receive encrypted messages before encryption has been established.
    I have fixed this and the fix will be part of the 4.1.15.4 release which should become available until end of this week or early next week.
    Please update to that version once it becomes available and observe if this issue still gets reported for that version.
  • @Kaiserludi ,

    hallelujah! Thank you.
  • Hi @abelthefirst.

    4.1.16.0, which contains the mentioned fix, has just become available in our download section.
  • Hi @Kaiserludi ,

    we released a new version around a week ago. The issue is gone. Thank you!