SIGSEGV: egaes256_init
in Native
Hi,
recently I've got some crashes on Crashlytics:
Right before each and every crash there are some failed authentication operations (we have a custom token-based authentication). I wasn't being able to reproduce the issue. It happens on Android, Realtime SDK v4.1.15.0. Hope you guys might have some ideas.
Thank you,
Alex
recently I've got some crashes on Crashlytics:
Crashed: Thread: SIGSEGV 0x0000000000000000 at egaes256_init + 305(aes256.c:305) at ExitGames::Photon::Internal::Encryption::decrypt(unsigned char const*, int, unsigned char const*, unsigned char**, int*) + 74(Encryption.cpp:74) at ExitGames::Photon::Internal::Encryption::decrypt(unsigned char const*, int, unsigned char const*, unsigned char**, int*) + 74(Encryption.cpp:74) at ExitGames::Photon::Internal::PeerBase::deserializeOperationResponse(unsigned char*, bool, int, unsigned char) + 569(PeerBase.cpp:569) at ExitGames::Photon::Internal::PeerBase::deserializeOperation(unsigned char*, int) + 544(PeerBase.cpp:544) at ExitGames::Photon::Internal::TPeer::dispatchIncomingCommands() + 158(TPeer.cpp:158) at ExitGames::Photon::Internal::PeerBase::service(bool) + 206(PeerBase.cpp:206) at ExitGames::Photon::PhotonPeer::service(bool) + 169(PhotonPeer.cpp:169) at ExitGames::LoadBalancing::Client::service(bool) + 700(Client.cpp:700)
Right before each and every crash there are some failed authentication operations (we have a custom token-based authentication). I wasn't being able to reproduce the issue. It happens on Android, Realtime SDK v4.1.15.0. Hope you guys might have some ideas.
Thank you,
Alex
0
Comments
-
Hi @abelthefirst.
"Right before each and every crash there are some failed authentication operations"
I assume you know this because you have log files that indicate this?
Can you provide the Photon log output?0 -
Hi @Kaiserludi ,
thank you for your quick response.I assume you know this because you have log files that indicate this?
Can you provide the Photon log output?Non-fatal Exception: ******.PhotonManagerImpl$PhotonManagerException: Photon connection error: CUSTOM_AUTHENTICATION_FAILED The token has expired
Thus, for a given user I can see a list of events, but not the complete Photon log output. That would only be possible, if I reproduce the issue locally, which as I mentioned is not the case unfortunately.
Thank you,
Alex0 -
Hi @abelthefirst.CUSTOM_AUTHENTICATION_FAILED The token has expired
That might actually be an extremely helpful piece of information on how to reproduce this issue locally.
We are looking into this. I will update you when I have more information on this issue.0 -
Hi @abelthefirst.
It looks like for some unknown reason the client receives an encrypted message from the server in a state in which encryption is not initialized on the client yet. This should not actually be possible as the server should not send encrypted messages until encryption has been established.
Unfortunately the current version of the client relies on this no being possible and hence when it receives an encrypted message from the server, it assumes that encryption has been established and blindly attempts to decrypt the messages, without verifying first that encryption actually has been established. This leads to the client attempting to de-reference a NULL pointer. We will add a check in the client that encryption has actually been established before attempting to decrypt in the next release.
However this does not explain WHY the server is sending an encrypted messages in this state.
I have checked the Photon Cloud admin panel for the company that I think you belong to and could see that you are running custom server side code via the Photon Enterprise Cloud Plugin interface.
Could it be the case that your custom server side code sends encrypted messages to the client?
Maybe it somehow gets into a state in which it thinks that encryption has been established, but either encryption has not been established yet, or the connection is already getting cleaned up and for an unknown reason there is still an encrypted message getting sent, when the client is no in a state anymore to decrypt it.
Both should not actually been possible, but I don't see another way how this could happen.
I think you need to find a way to reproduce this issue for you/us to be able to track down the cause.0 -
Hi @Kaiserludi ,
I work closely with the backend team right now, but we didn't find anything particularly weird just yet.
In the meantime I was doing a little bit more debugging on the client side, while my authentication token is expired. As far as I can tell (please correct me if I'm wrong), it's a disconnect call which destroys an established encryption. The only encrypted incoming message, that I encountered, was an "GET_REGIONS"" operation result. Since disconnect might be initiated by the server, is it a possibility, that the "incoming queue" contains both "server disconnect" AND "get regions" messages? In that case first message might cause encryption destruction and second message will actually crash the client.
Again, that is not something I was able to reproduce, just some assumptions.
Thank you,
Alex0 -
Hi @abelthefirst.As far as I can tell (please correct me if I'm wrong), it's a disconnect call which destroys an established encryption.Since disconnect might be initiated by the server, is it a possibility, that the "incoming queue" contains both "server disconnect" AND "get regions" messages? In that case first message might cause encryption destruction and second message will actually crash the client.
Neither will the server send any more messages to client after having sent it a disconnect, nor will the client process any more incoming messages on that connection once it has received the disconnect.0 -
Hi @Kaiserludi ,
it's been a while and unfortunately we were still unable to identify the exact cause of the issue. However I've added some extra logging to our Java layer, which interacts with the Photon client, and obtained some interesting results. I'll post last parts of some of the logs below:455 09:07:07.532 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 36065556, Thread name - main 456 09:07:07.532 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 36065556, Thread name - main 457 09:07:07.533 p.m. D/CrashlyticsCore PhotonManagerImpl: connectReturn with error = CUSTOM_AUTHENTICATION_FAILED, error text = The user did not give consent for live opponents, PhotonWrapper instance - 36065556, Thread name - main 458 09:07:07.533 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 36065556, Thread name - main 459 09:07:07.536 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 36065556, Thread name - main 460 09:07:08.150 p.m. D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 36065556, Thread name - main 461 09:07:08.150 p.m. D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 36065556, Thread name - main 462 09:07:08.150 p.m. D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 36065556, Thread name - main 463 09:07:08.150 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 36065556, Thread name - main 464 09:07:08.155 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 36065556, Thread name - main
407 07:13:49.799 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main 408 07:13:49.808 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main 409 07:13:50.651 p.m. D/CrashlyticsCore PhotonManagerImpl: onAvailableRegions, PhotonWrapper instance - 166427881, Thread name - main 410 07:13:51.487 p.m. D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 166427881, Thread name - main 411 07:13:51.488 p.m. D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 166427881, Thread name - main 412 07:13:51.488 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main 413 07:13:51.551 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main 414 07:13:51.553 p.m. D/CrashlyticsCore PhotonManagerImpl: connectReturn with error = CUSTOM_AUTHENTICATION_FAILED, error text = The token has expired, PhotonWrapper instance - 166427881, Thread name - main 415 07:13:51.553 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main 416 07:13:51.568 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main 417 07:14:29.271 p.m. D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 166427881, Thread name - main 418 07:14:29.278 p.m. D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 166427881, Thread name - main 419 07:14:29.278 p.m. D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 166427881, Thread name - main 420 07:14:29.278 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 166427881, Thread name - main 421 07:14:29.288 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 166427881, Thread name - main
20 07:55:51.033 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 170633409, Thread name - main 21 07:55:51.038 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 170633409, Thread name - main ..... 32 07:55:58.643 p.m. D/CrashlyticsCore PhotonManagerImpl: onAvailableRegions, PhotonWrapper instance - 170633409, Thread name - main 33 07:55:59.538 p.m. D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 170633409, Thread name - main 34 07:55:59.538 p.m. D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 170633409, Thread name - main 35 07:55:59.538 p.m. D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 170633409, Thread name - main 36 07:55:59.538 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 170633409, Thread name - main 37 07:55:59.539 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 170633409, Thread name - main 38 07:56:07.139 p.m. D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION, PhotonWrapper instance - 170633409, Thread name - main 39 07:56:07.139 p.m. D/CrashlyticsCore PhotonManagerImpl: disconnectReturn, PhotonWrapper instance - 170633409, Thread name - main 40 07:56:07.140 p.m. D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED, PhotonWrapper instance - 170633409, Thread name - main 41 07:56:07.140 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, PhotonWrapper instance - 170633409, Thread name - main 42 07:56:07.141 p.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD, PhotonWrapper instance - 170633409, Thread name - main
As you might see, it is always INTERNAL_RECEIVE_EXCEPTION, that causes the crash. The weird fact is that sometimes a client handles it just fine, but sometimes it fails, which really looks like a race condition.0 -
Hi @abelthefirst.
In all your logs disconnectReturn() gets called after the internal receive connection got logged and only after that has happened, the log ends shortly after.
I could imagine that there is a problem with any cleanup code that you might run once you know about the disconnect.0 -
Hi @Kaiserludi ,
I've added even more logging to the latest release and need your help to understand the data I get. As of now there is only single log, I'll add more when it becomes available:1 08:32:23.051 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,661569 DEBUG TPeer.cpp onReceiveData() 195 length = 50, error = 0 2 08:32:23.051 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,661704 DEBUG TPeer.cpp onReceiveData() 195 length = 0, error = -1 3 08:32:23.051 a.m. D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = INTERNAL_RECEIVE_EXCEPTION 4 08:32:23.051 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,661837 DEBUG PeerBase.cpp onDisconnected() 449 5 08:32:23.051 a.m. D/CrashlyticsCore PhotonManagerImpl: disconnectReturn 6 08:32:23.051 a.m. D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED 7 08:32:23.051 a.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded 8 08:32:23.064 a.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD 9 08:32:23.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674308 INFO PeerBase.cpp connect() 128 address: ns.exitgamescloud.com:4533 10 08:32:23.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674387 DEBUG PeerBase.cpp reset() 111 11 08:32:23.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674650 DEBUG PeerBase.cpp service() 203 dispatch == true 12 08:32:23.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674718 DEBUG TPeer.cpp dispatchIncomingCommands() 151 13 08:32:23.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674756 DEBUG PeerBase.cpp deserializeOperation() 513 14 08:32:23.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 09:32:23,674790 DEBUG PeerBase.cpp deserializeOperation() 532 bodyBuff: 50, msgType: 3 (event = 4)
0 -
Hi @abelthefirst.
1
08:32:23.051 a.m.
The client receives 50 bytes of data.
2
08:32:23.051 a.m.
The sockets recv() function returns an error.
3
08:32:23.051 a.m.
The client reports 'INTERNAL_RECEIVE_EXCEPTION' to connectionErrorReturn().
4
08:32:23.051 a.m.
The client sets itself into the disconnected state...
5
08:32:23.051 a.m.
...and reports a disconnect to disconnectReturn().
6
08:32:23.051 a.m.
The app sets it's PhotonState to 'DISCONNECTED'.
7
08:32:23.051 a.m.
The app calls connectIfNeeded().
8
08:32:23.064 a.m.
connectIfNeeded() gets called with hosting type = CLOUD
9
08:32:23.064 a.m.
The client connects to Photons nameserver over TCP.
10
08:32:23.064 a.m.
The client resets its Peer (which handles the socket connection) to a clean state for the fresh connection ( this is normal).
11
08:32:23.064 a.m.
service() is called.
12
08:32:23.064 a.m.
dispatchIncomingCommands() is called by the service() call.
13
08:32:23.064 a.m.
deserializeOperation() is called by the dispatchIncomingCommands() call.
14
08:32:23.064 a.m.
deserializeOperation() deserializes a 50 bytes messages of type 'operationResponse'. The first operation response that you will receive after calling connect() is the 'getRegions' response.
At that point the code should call Client::onOperationResponse(), but apparently it never reaches that point as the EGLOG() call from the first line of that function does not show up in your log.
0 -
Well, thank you for such a detailed explanation, but that is not what I meant Client::onOperationResponse() is not called, since egaes256_init crash occurred. As far, as I understand,
bodyBuff: 50, msgType: 3 (event = 4)
is an encrypted onAvailableRegions response, but it seems it should not be obtained yet, since the connection to the nameserver is not yet established. So the question is, how that might happen?0 -
Which RegionSelectionMode are you using? In RegionSelectionMode::SELCECT your onRegionsAvailable() callback will be called. Could it be possible that it crashes in there?
Also it could potentially be related to your custom authentication, as sending the authentication operation request would be the next step after the getRegions operation response has been received. Maybe there is a problem with your custom auth parameters that only happens when on reconnects? i.e. a dangling pointer to no longer valid memory or something like that?
Finally it might be worth to check if the problem still occurs if you turn off encryption for the opGetRegions() call (search for mpPeer->opGetRegions(true, mAppID); in Client.cpp and replace the 'true' with 'false'). If that makes the issue disappear entirely, that would indicate that the crash would happen somewhere inside the decryption code inside Photon.
Any progress in reproducing the crash locally? I am afraid that you really need to figure out what the users do that provokes this crash, so that you can figure out a way to reproduce and debug it.0 -
Hi @abelthefirst.
Wait, do you mean that you have not shortened the log and you get that op response directly after the connect with any other outgoing or incoming data showing up in between?0 -
- Unfortunately I still cannot reproduce the issue locally.
- Yes, that is a complete piece of log, op response happens right after connect request.
0 -
Are you by any change accessing the same Client instance from multiple threads in parallel? Doing that is not supported and causes undefined behavior.0
-
That was my first assumption, but no, interaction with the client happens on a single thread.
0 -
Another log, that I obtained just today. Again, this is a complete log.
1 01:46:14.861 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,625187 DEBUG TPeer.cpp onReceiveData() 195 length = 110, error = 0 2 01:46:14.862 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,625888 DEBUG PeerBase.cpp service() 203 dispatch == true 3 01:46:14.863 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,626459 DEBUG TPeer.cpp dispatchIncomingCommands() 151 4 01:46:14.863 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,627081 DEBUG PeerBase.cpp deserializeOperation() 513 5 01:46:14.864 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,627575 DEBUG PeerBase.cpp deserializeOperation() 532 bodyBuff: 110, msgType: 7 (event = 4) 6 01:46:14.867 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,631275 DEBUG PeerBase.cpp serializeOperation() 461 7 01:46:14.877 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,641355 DEBUG TPeer.cpp send() 253 cType: 6 payloadSize: 57 8 01:46:14.878 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,642139 DEBUG TPeer.cpp dispatchIncomingCommands() 151 9 01:46:14.879 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,642638 DEBUG TPeer.cpp sendOutgoingCommands() 83 10 01:46:14.960 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,724297 DEBUG PeerBase.cpp onConnect() 419 11 01:46:14.960 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,724585 ERROR PeerBase.cpp onConnect() 422 4294967295 12 01:46:14.960 a.m. D/CrashlyticsCore PhotonManagerImpl: connectionErrorReturn, statusCode = EXCEPTION_ON_CONNECT 13 01:46:14.961 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,724890 DEBUG PeerBase.cpp onDisconnected() 449 14 01:46:14.961 a.m. D/CrashlyticsCore PhotonManagerImpl: disconnectReturn 15 01:46:14.961 a.m. D/CrashlyticsCore PhotonManagerImpl: setPhotonState, state = DISCONNECTED 16 01:46:14.961 a.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded 17 01:46:14.962 a.m. D/CrashlyticsCore PhotonManagerImpl: connectIfNeeded, hosting type = CLOUD 18 01:46:14.962 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,726650 INFO PeerBase.cpp connect() 128 address: ns.exitgamescloud.com:4533 19 01:46:14.963 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,726890 DEBUG PeerBase.cpp reset() 111 20 01:46:14.963 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,727288 DEBUG PeerBase.cpp service() 203 dispatch == true 21 01:46:14.963 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,727667 DEBUG TPeer.cpp dispatchIncomingCommands() 151 22 01:46:14.963 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,727828 DEBUG TPeer.cpp sendOutgoingCommands() 83 23 01:46:15.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,828327 DEBUG PeerBase.cpp service() 203 dispatch == true 24 01:46:15.064 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,828697 DEBUG TPeer.cpp dispatchIncomingCommands() 151 25 01:46:15.065 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,828935 DEBUG TPeer.cpp sendOutgoingCommands() 83 26 01:46:15.163 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,927513 DEBUG PeerBase.cpp onConnect() 419 27 01:46:15.164 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,927880 DEBUG TPeer.cpp send() 253 cType: 1 payloadSize: 48 28 01:46:15.164 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928128 DEBUG TPeer.cpp sendOutgoingCommands() 83 29 01:46:15.164 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928363 DEBUG TPeer.cpp onReceiveData() 195 length = 50, error = 0 30 01:46:15.164 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928597 DEBUG PeerBase.cpp service() 203 dispatch == true 31 01:46:15.165 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,928897 DEBUG TPeer.cpp dispatchIncomingCommands() 151 32 01:46:15.165 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,929216 DEBUG PeerBase.cpp deserializeOperation() 513 33 01:46:15.165 a.m. D/CrashlyticsCore PhotonManagerImpl: 2020-02-27 16:46:15,929485 DEBUG PeerBase.cpp deserializeOperation() 532 bodyBuff: 50, msgType: 3 (event = 4)
0 -
Hi @abelthefirst.That was my first assumption, but no, interaction with the client happens on a single thread.
This type of bug can often cause really unpredictable crashes in entirely unrelated code.
Hence the cause of the bug might not even be in any code that is even loosely related to Photon.
All dynamic memory allocations by Photon happen through allocate(), allocateArray(), reallocate(), reallocateArray(), EG_MALLOC(), EG_REALLOC() and EG_CALLOC() and all deallocations happen through deallocate(), deallocateArray() and EG_FREE().
These all sit on top of class AllocatorInterface
Hence the actual allocator can be switched out against a custom one i.e. for performance or debugging needs. The API reference for AllocatorInterface::get() contains some example implementations, which can also be found in the comments in AllocatorInterface.h, from which this part of the API-reference gets generated.
The default allocator uses a pool-based memory management, which gives better performance for than direct calls to new or malloc.
However it has the side-effect that identifying memory issues like leaks or memory corruption can be more difficult (i.e. some code allocates 12 bytes, but under the hood 16 bytes are allocated, so an out-of bounds access at indices 12-15 might stay undetected and it only crashes in the case that you access even further out of bounds, which might be so rare that they are hard to reproduce; another example: some memory gets deallocated, but still accessed through a dangling pointer afterwards, but the memory has not been given out again to the next user, so it might still hold valid data, but this data might suddenly become overwritten at some completely unrelated place in the code).
For this reason when facing potential memory issues it can make sense to provide your own custom allocator that just adds some debug-logging and then directly forwards the allocation/deallocation to malloc()/free(). That way such issues may become easier reproducible. Be aware though that logging every single dynamic memory allocation might degrade your apps performance a lot.
Please see https://doc-api.photonengine.com/en/cpp/current/a04792.html for the ExitGames::Common::MemoryManagement Namespace Reference and https://doc-api.photonengine.com/en/cpp/current/a05378.html for the AllocatorInterface Class Reference.0 -
Hi @Kaiserludi ,
I agree with your point in general, but I don't quite see the scenario, when incorrect memory management from my side can trigger rare, but stable specific encrypted message arrival in an incoming queue.
I'll try to disable encryption in the next release and see, how it goes.0 -
Hi @abelthefirst.
I have compared your to logs with each other to see what they have in common and how they differ and tried to simulate the situation.
When doing so I stumbled upon a bug in the clients TCP implementation that is likely the culprit of the issue:
When the client experienced a TCP connection error and in consequence sets itself into the disconnected state, then it does not clear its queues and after a reconnect it can potentially still hold entries from before the disconnect in its queues.
After the reconnect the client then tries to process the incoming queue entries which are leftover from the previous connection, assuming that they have arrived on the new connection. When such an entry happens to be an encrypted message, then the client crashes as it does not expect to receive encrypted messages before encryption has been established.
I have fixed this and the fix will be part of the 4.1.15.4 release which should become available until end of this week or early next week.
Please update to that version once it becomes available and observe if this issue still gets reported for that version.0 -
0
-
Hi @abelthefirst.
4.1.16.0, which contains the mentioned fix, has just become available in our download section.1 -
0