Android c++ ndk SDK crash

JoH
edited October 2015 in Native
Hi

I got my cocos2d-x game online, but received this crash report.
SDK : Photon-AndroidNDK-Sdk_v4-0-4-1

So far no such crash has happened on iOS.

------
0 libcocos2dcpp.so 0x7f994baa28 EG_vswprintf(wchar_t*, unsigned long, wchar_t const*, std::__va_list) (platformLayer-unix.cpp:224)
1 libcocos2dcpp.so 0x7f994bae3c EG_swprintf(wchar_t*, unsigned long, wchar_t const*, ...) (platformLayer-unix.cpp:305)
2 libcocos2dcpp.so 0x7f994b682c ExitGames::Common::JString::operator=(unsigned char) (JString.cpp:432)
3 libcocos2dcpp.so 0x7f99470088 ExitGames::Photon::OperationResponse::toString(bool, bool, bool) const (JString.h:152)
4 libcocos2dcpp.so 0x7f99465820 ExitGames::LoadBalancing::Client::onOperationResponse(ExitGames::Photon::OperationResponse const&) (Client.cpp:625)
5 libcocos2dcpp.so 0x7f9947770c ExitGames::Photon::Internal::PeerBase::deserializeOperationResponse(unsigned char*, bool, int, unsigned char) (PeerBase.cpp:518)
6 libcocos2dcpp.so 0x7f994761a8 ExitGames::Photon::Internal::PeerBase::deserializeOperation(unsigned char*, int) (PeerBase.cpp:464)
7 libcocos2dcpp.so 0x7f99472138 ExitGames::Photon::Internal::EnetPeer::dispatchIncomingCommands() (EnetPeer.cpp:477)
8 libcocos2dcpp.so 0x7f99475c14 ExitGames::Photon::Internal::PeerBase::service(bool) (PeerBase.cpp:191)
9 libcocos2dcpp.so 0x7f99470538 ExitGames::Photon::PhotonPeer::service(bool) (PhotonPeer.cpp:113)
10 libcocos2dcpp.so 0x7f99353e10 PhotonConnector::tick(float) (PhotonConnector.cpp:106)
11 libcocos2dcpp.so 0x7f99564184 cocos2d::TimerTargetSelector::trigger() (CCScheduler.cpp:170)
12 libcocos2dcpp.so 0x7f99564584 cocos2d::Timer::update(float) (CCScheduler.cpp:109)
13 libcocos2dcpp.so 0x7f995675d4 cocos2d::Scheduler::update(float) (CCScheduler.cpp:886)
14 libcocos2dcpp.so 0x7f99558a78 cocos2d::Director::drawScene() (CCDirector.cpp:269)
15 libcocos2dcpp.so 0x7f99558ac4 cocos2d::DisplayLinkDirector::mainLoop() (CCDirector.cpp:1342)
16 libcocos2dcpp.so 0x7f9945bef0 Java_org_cocos2dx_lib_Cocos2dxRenderer_nativeRender (Java_org_cocos2dx_lib_Cocos2dxRenderer.cpp:17)
------

more details:
http://crashes.to/s/f4e196717bc

Could you please give me some direction?
Thanks,

Jo

Comments

  • Hi JoH.

    From the callstack I can see that the Client receives an operationResponse from the server and then crashes while trying to log information about that response. Unfortunately that is about all that I can say about this crash with the provided information. It looks like it gets a segmentation fault when processing the format string if that string contains an unsigned char, which is the type of the operationCode. This does not make too much sense, as it would mean that it crashes on each and every operationResponse, because all of them contain an operationCode.

    I am afraid that we need you to provide us with a self-contained minimal reproduction case for this crash so that we can debug it.
  • Hi, Kaiserludi

    Sorry, I cannot reproduce it myself right now. I only received the reports from the crash report service.
    However, it has raised our crash rate, and I found that all affected devices run Android 5.

    The Android NDK is not good at dealing with some emojis, see http://stackoverflow.com/questions/12127817/android-ics-4-0-ndk-newstringutf-is-crashing-down-the-app
    although I think the SDK is pure C++. Just in case, FYI.

    Could any misuse of the LoadBalancing API cause the crash?

    Is there anything I should log to help debug this? I could try to add it to the next version...
    Any advice would be helpful.

    Thanks a lot.
    Jo
  • Hi @JoH.

    Well, the crash is happening when you receive the server's response for a certain operation. It might be helpful to know for which operation it happens.
    You could add a line that logs that op code at the very top of Client::onOperationResponse() (You need to recompile LoadBalancing-cpp after that change, of course).

    I don't think that we can solve this by wild guessing. We definitely need more info so that we can reproduce it.

    So you might just set the logging level of the Client instance to INFO by calling Client::setDebugOutputLevel(ExitGames::Common::DebugLevel::INFO) on your Client instance right after its construction.

    Additionally you might want to turn on more details for the operation response and events logging in the respective callbacks.

    To do this, you would explicitly pass true for all parameters to operationResponse.toString() and eventData.toString() in the call to EGLOG() in Client::onOperationResponse() and Client::onEvent().

    Use this option with care, and only when you are sure that none of the incoming operation responses or events might become several kb in size. The cost of toString() increases roughly with the square of the length of the output string, so while it is relatively cheap for stringifying small amounts of data, it may get way too expensive when you ask it to stringify the complete payload of a huge operationResponse or event instance ('huge' in this context means multiple kb). You may want to activate this only for certain operation and event codes, not for all codes.
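One way to restrict the detailed output to certain codes is a small allow-list that the callbacks consult before deciding which flags to pass to toString(). The following is only a minimal sketch: `VerboseCodeFilter` and its methods are hypothetical names and not part of the Photon SDK.

```cpp
#include <set>

// Hypothetical helper, not part of the Photon SDK: an allow-list of the
// operation or event codes for which detailed logging is worth its cost.
class VerboseCodeFilter
{
public:
    // Mark a code as one whose payload should be logged in full.
    void enable(unsigned char code) { mCodes.insert(code); }

    // True only for codes that were explicitly enabled.
    bool isVerbose(unsigned char code) const { return mCodes.count(code) != 0; }

private:
    std::set<unsigned char> mCodes;
};
```

Inside Client::onOperationResponse() or Client::onEvent(), the result of isVerbose() could then be passed for each of the toString() parameters, so that only the selected codes pay the higher stringification cost. How the code is read from the response or event is left to the SDK headers.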
  • Hi

    1. Do you mean I should turn on DebugLevel::INFO in the released app? Or just for a beta version?

    2. What does the EGLOG function actually do? Does it print something to the console or send something back to the server? How can I get the logs of a released app? Should I disable EGLOG by defining EG_NOLOGGING when I release the app?

    3. How about just commenting out the line
    EGLOG(operationResponse.getReturnCode()?DebugLevel::ERRORS:DebugLevel::INFO, operationResponse.toString(true));
    to prevent toString() from being called?

    Thanks a lot!
    Jo
  • Kaiserludi admin
    edited October 2015
    Hi @JoH.

    1.
    Well, that depends. On the one hand, the INFO level will probably not hurt performance too badly (be sure not to turn on logging level ALL in release, however, as that would have a rather heavy impact), so you could turn it on in release. On the other hand, it's already enough if the crash can be reproduced once with the additional logging turned on. Having the crash log a dozen times won't really be of any more help than having it once, so only turning it on in a beta release might be enough.

    2.
    EGLOG() prepends the specified message with date, time, logging level, file, function and line of the call and passes the resulting string to Common::BaseListener::debugReturn() (LoadBalancing::Listener inherits from PhotonBaseListener, so the debugReturn() from your implementation of the LoadBalancing::Listener interface will get called for EGLOG calls inside LoadBalancing::Client and its underlying implementation classes). It is up to your app where to output the log lines that get passed to debugReturn(). Your implementation could just pass them to stdout or stderr, to logcat (in the Android case), to a text file, print them on screen with in-game graphics (something like a game-UI error message box), or send them to a server, whatever you want.
    For the logs to be helpful, the developer of course needs access to them, so as you don't have access to the devices of most users, I propose the following:
    Send the logs to a file, then when you send the crash logs to http://crashes.to, also send the corresponding log files (the ones to which you have written the Photon logs) to some server, maybe even to crashes.to as well (I have no idea if they support sending and displaying additional files).
    "Should I disable EGLOG by defining EG_NOLOGGING when I release the app?"
    You can do that, but it is not recommended. It usually makes more sense to keep logging active and just set the log level to a value at which you can't notice a performance impact. If you set it to level ERRORS, there will be no logging unless an error actually happens. It practically always makes sense to log errors, even in release. Even WARNINGS should happen rarely enough during normal usage of a well-tested app that I would recommend logging them in release mode. With INFO it depends. In this case all operations get logged, even if nothing goes wrong, so this may give you a small performance penalty. Still, the additional info for bug hunting might outweigh this. You could test with INFO logging, and if performance is still absolutely fine for your game, then you may want to turn it on even in release mode. Level ALL, however, should only be used when debugging and when INFO does not give enough information. I would not even activate ALL for normal development, but usually just use INFO there, as with ALL it gets quite noisy (every service() call results in several log lines).

    3.
    As a quick fix this might do (assuming that line is really causing the error and not just triggering an error that is caused by some other code and that would, without this line, just get triggered elsewhere), but we should definitely find out the real problem and fix that, so that you can keep this line in the code for releases, as it might prove quite useful in the future when you try to track down other bugs.
  • JoH
    edited October 2015
    Hi

    Here are some logs, though not many.

    You can get more info here:
    http://crashes.to/s/f4e196717bc
    Almost all devices crash when connecting,
    and all of them run Android 5.0 or above.

    -------
    0 | 04:47:24:909 | D/CrashlyticsCore connectToServer while status Uninitialized,dt = 0.000000
    1 | 04:47:24:909 | D/CrashlyticsCore 2015-10-26 12:47:21,198621 INFO PeerBase.cpp connect() line: 146 - address: ns.exitgamescloud.com:5058
    2 | 04:47:25:411 | D/CrashlyticsCore 2015-10-26 12:47:21,700009 INFO Client.cpp onStatusChanged() line: 930 - connected to nameserver
    ---

    Thanks a lot!
    Jo
  • Hi @JoH

    I am afraid that that log and callstack don't provide any new information compared to what we already knew after the previous callstack.

    However I have got my hands on an Android 5 device today and tested demo_loadBalancing on it with rather interesting results:
    - the 4.0.4.1 SDK works fine from Android 2.2 up to Android 4.4
    - the 4.0.4.1 SDK crashes on Android 5.0
    - the latest state of development works fine on Android 5.0

    The callstack of the crash differs from yours, but it might still be related (while your app crashes inside logging, the demo crashes when it attempts to print a string on the device screen through JNI).

    This is rather interesting because we did not make any changes since 4.0.4.1 that should cause any difference in this matter.

    Still, it might be the best shot that we currently have if you could update to https://dl.dropboxusercontent.com/u/4296291/Photon-AndroidNDK-Sdk_v4-0-5-0-Prerelease3.zip and check if the crash still appears with that version. Please don't forget to update your app version so that we can tell from the crash log whether it is for the new or the old version.

    If updating to those new libs does not help, then I am afraid that we definitely need a repro case to track down this issue.
  • JoH
    edited October 2015
    Hi @Kaiserludi

    1. I did update the SDK, but unfortunately the same crash still happens.
    If the logs don't help, do you have any other advice?

    2. I found that the crash mostly occurs on devices with an arm64-v8a CPU
    (Samsung S6, LG G4, etc.).
    Did you try the SDK on this kind of device?

    3. I would like to confirm again:
    Do you use the Android NDK functions "NewStringUTF" or "GetStringUTFChars" in your SDK?
    http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-June/010403.html
    I ran into related string encoding/decoding crashes.
    After I stopped using those functions (the cocos2d-x team fixed this recently), things went back to normal.
    FYI.

    Thanks a lot!
    Jo
  • Hi @JoH.

    1. Any chance that you could get your hands on an Android 5.0 device to reproduce the crash?
    2. No - we don't currently have such a device available for testing.
    3. We only use the first one of those two functions, and even that one exclusively inside demo code, not inside library code at all.
  • JoH
    edited October 2015
    Hi @Kaiserludi

    We don't currently have that kind of device either. Do you have any plans to test on one?
    And one more thing to confirm: what NDK version do you use? We are using r10e, by the way.

    Thanks for helping
    Jo
  • Hi @JoH.

    No, there are not any concrete plans, yet.

    Yes, we are also on r10e.
  • Hi @Kaiserludi

    We received some user feedback that the "Samsung S6 Edge" keeps crashing when connecting to Photon Cloud.
    And according to the crash reports we got, the crash occurs almost exclusively on the arm64-v8a arch.
    So could you do some tests to confirm that the Android NDK SDK works on the arm64-v8a arch?

    Thanks a lot.

    Jo
  • Hi @JoH.

    We have just bought a brand-new Samsung Galaxy S6 development device today, but our demos work fine even on that 64bit Android 5 device (at least with the prerelease SDK that I gave you).

    Should this bug appear on every connect or only rarely?
    If it should happen regularly, then I am afraid that we can't reproduce it with the stock version of the demo. It must be something special that your game code or your users are doing.
    You have mentioned that the Android NDK has trouble dealing with some emojis, so maybe it is related to some special characters in the user names.

    Could you log the usernames and see if there is anything special to the usernames of those users who crash on connect?

    As it might not happen at all with the Photon demos but might be related to your app code, and as you can't provide a reproduction case, as a last resort we might need you to provide access to a buildable and runnable version of your app project including its source code, so that we can debug the issue with a project for which we know for sure that the crash happens. Is that possible?
  • Hi @Kaiserludi

    First, thank you very much for testing this for us.

    I checked again the Loadbalancing demo from the SDK you provided before, and I found in Application.mk file:
    APP_ABI := armeabi armeabi-v7a x86
    which seems the android demo does not really be built for arm64-v8a arch.
    I guess the S6 might run the app in some kind of compatibility mode.

    My environment is cocos2d-x v3 with APP_ABI := armeabi armeabi-v7a x86 arm64-v8a,
    so maybe that is the difference.
    I couldn't get the app to run correctly on an arm64-v8a CPU without building it for arm64-v8a, so I had to add the arm64-v8a arch to my Application.mk file. If you know how to avoid that, please give me a hint.

    I'm not actually an expert on the Android NDK, so please correct me if I have misunderstood something.

    As for a reproduction case, we will try to put one together, but it might take a while.

    Thanks a lot
    Jo
  • JoH
    edited November 2015
    Hi @Kaiserludi

    I borrowed a Samsung S6, and my app crashes every time I try to connect to Photon Cloud.

    I also confirmed that the Samsung S6 crashes in the "Loadbalancing demo" once I add the arm64-v8a arch to Application.mk. The steps are as follows:

    1. Change to my application-id.
    2. Build and run: works normally (no arm64-v8a).
    3. Modify the Application.mk file to APP_ABI := armeabi armeabi-v7a x86 arm64-v8a
    4. Build and run: crashes immediately, and the crash stack is the same as the one I provided to you.

    Please try.

    Thanks a lot!
    Jo
  • Hi @JoH.

    which seems the android demo does not really be built for arm64-v8a arch.
    Good catch!

    I think I can reproduce it with the demo now, but unfortunately I could not get VisualGDB, which we normally use for debugging C++ code on Android, to work with 64bit binaries.

    What method do you use to access the stack trace?

    I guess you do not know a method to step through Android C++ code, that works with 64bit binaries?
  • Hi @Kaiserludi

    I used this, but I don't know how to step through the code on Android. Sorry about that.
    http://stackoverflow.com/questions/18436383/how-to-check-crash-log-using-android-ndk-in-cocos2d-x
  • Hi @JoH.

    I have found and fixed the bug that was causing this crash :-)

    Please update to https://dl.dropboxusercontent.com/u/4296291/Photon-AndroidNDK-Sdk_v4-0-5-0-Prerelease4.zip
  • Hi @Kaiserludi

    I tried it and it works now.
    Could you share some info about the crash?
    I might also need to watch out for the same thing on the arm64 arch.

    Thanks!
    Jo
  • Hi @JoH.

    I don't think that you have to care about this in your code, as it was very specific to the Photon Client code.

    Photon uses wide strings (wchar_t*) as the internal representation of its string class JString. Older Android versions do not have an implementation of swprintf(), so we needed to write our own implementation of it for Android: EG_swprintf().

    toString() generates a string representation of the instance on which it has been called. To achieve this it uses JString's operator+() overloads, which internally call SWPRINTF(), which on Android resolves to EG_swprintf().

    For example OperationResponse().toString() among other things adds the operation code to the returned string by calling the JString::operator+() overload for parameter type unsigned char. This overload then specifies "%hhu" as format specifier for the operationCode.

    EG_swprintf() calls EG_vswprintf(), which reads the passed-in format string to find out how many optional parameters it got called with and how it has to interpret those optional parameters (basically what the implementations of all the C standard library printf() variants do). This gets done with the help of pointer arithmetic on the format string.

    That pointer arithmetic code had a bug that made it use an unsigned int for the offset in bytes between the locations of two different pointers in that format string. When the format string specifies "%hhu" for unsigned char, the code at one point needs to access the character at ptr[1-offset], with offset having the value of 2. This should obviously resolve to ptr[-1], which may look strange, but is actually well defined, as ptr at that line points to str+2, so that ptr[-1] is the same as str[1]. But as offset was unsigned, 1-2 did not actually result in -1, but in UINT_MAX. ptr[UINT_MAX] resolves to *(ptr+UINT_MAX), and on 32bit platforms, where a pointer can at most hold a 32bit unsigned value for the memory address, ptr+UINT_MAX overflows, and the result of that overflowed addition is exactly the same address as the one that we would have got for ptr[-1], which resolves to *(ptr-1).

    Therefore we got exactly the intended behavior on 32bit despite the bug.
    As this code only gets executed on Android, and as Android did not support 64bit before version 5.0, this bug went unnoticed until now.

    With 64bit a pointer can actually hold the address at ptr+UINT_MAX without overflowing, so on 64bit that code tries to access memory that is 4GB away from the intended location, which unsurprisingly results in an access violation crash.
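The arithmetic described above can be replayed with plain integers standing in for addresses. This is only an illustrative sketch, not the actual EG_vswprintf() code: all function names are made up, and fixed-width integers simulate 32bit and 64bit pointers so that no invalid memory is ever touched.

```cpp
#include <cstddef>
#include <cstdint>

// Simulated 32bit platform: address reached by ptr[1 - offset] when the
// offset is unsigned. The index wraps to 0xFFFFFFFF, and the addition wraps
// again modulo 2^32, so the result happens to equal ptr - 1.
std::uint32_t buggyTarget32(std::uint32_t ptr, std::uint32_t offset)
{
    std::uint32_t index = 1u - offset; // 1 - 2 becomes 0xFFFFFFFF, not -1
    return ptr + index;
}

// Simulated 64bit platform: the index is still a 32bit unsigned value, but
// the address has room for the sum, so nothing wraps and the result lies
// almost 4GB above ptr.
std::uint64_t buggyTarget64(std::uint64_t ptr, std::uint32_t offset)
{
    std::uint32_t index = 1u - offset; // still 0xFFFFFFFF
    return ptr + index;
}

// One possible fix (an assumption, the actual SDK change is not shown in
// this thread): keep the offset signed, so that 1 - offset is really -1.
char fixedLookup(const char* ptr, std::ptrdiff_t offset)
{
    return ptr[1 - offset];
}
```

With ptr = 0x1000 and offset = 2, the 32bit variant yields 0x0FFF (the intended ptr - 1), while the 64bit variant yields 0x1000 + 0xFFFFFFFF, which is the out-of-range access behind the crash.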
  • Hi @Kaiserludi

    I see. You did the overflow trick to gain the performance, but overflow doesn't happen in 64 bit world.

    Thanks for the great help
    Jo
  • Hi @JoH.
    JoH said:


    You did the overflow trick to gain the performance

    I don't think so. I am sure this overflow was unintended.
  • Ok. I misunderstood this part. :P
  • jdaniels
    edited November 2019
    Sorry to resurrect an old thread, but we've just run into this issue (we are using 4.0.3). However, the dropbox link with the fixed version is no longer valid. Would it be possible to re-upload it?

    (We'd rather use that than update to the very latest version as we support cross platform play and would like to avoid updating all clients - I'm guessing this won't work if v4.1.15 is used on some devices and v4.0.3 on others?)
  • Hi @jdaniels.

    I have just made the official release of 4.0.5.0 available at https://www.dropbox.com/s/agfn7sllim4da3d/Photon-AndroidNDK-Sdk_v4-0-5-0.zip?dl=1 for you, which is the oldest release that contains the fix that is mentioned in this thread.
    The package that I have linked in my 2015 post has been a prerelease package and we don't archive pre-release packages, as they are only intended to be used until the next official release is available.


    (We'd rather use that than update to the very latest version as we support cross platform play and would like to avoid updating all clients - I'm guessing this won't work if v4.1.15 is used on some devices and v4.0.3 on others?)

    Actually as long as you don't use any features that are exclusive to the later version, those clients are cross-platform compatible (in general as long as the first two digits of the version numbers match, the clients are guaranteed to be compatible to each other and in case of Photon server also to the server with that version, but even if those numbers do not match, most of the time the clients are still compatible to each other, it is just not guaranteed to always be the case).
    For cross platform play on Photon Cloud the appID, appVersion and cloud region must match each other, but the Client SDK version does not need to match.
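The rule of thumb about the first two version digits can be written down as a small check. This is a hypothetical helper for illustration only, nothing like it is claimed to exist in the SDK, and it assumes dot-separated version strings such as "4.0.5.0".

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads the first two dot-separated numbers of a
// version string such as "4.0.5.0". Returns false if they cannot be parsed.
bool parseMajorMinor(const std::string& version, int& major, int& minor)
{
    std::istringstream in(version);
    char dot = 0;
    return static_cast<bool>(in >> major >> dot >> minor) && dot == '.';
}

// True if both versions share their first two numbers, which is the case in
// which the post above describes compatibility as guaranteed.
bool compatibilityGuaranteed(const std::string& a, const std::string& b)
{
    int aMajor = 0, aMinor = 0, bMajor = 0, bMinor = 0;
    if(!parseMajorMinor(a, aMajor, aMinor) || !parseMajorMinor(b, bMajor, bMinor))
        return false;
    return aMajor == bMajor && aMinor == bMinor;
}
```

A false result only means that compatibility is not guaranteed. As noted above, clients whose first two digits differ are still compatible most of the time.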

    As you are talking about cross-platform play and do support Android, I assume that you also support iOS.
    In that case please note that Photon Client versions prior to 4.1.0.0 do not support IPv6, and that Apple has required IPv6 support for any iOS App Store update for several years now, so you will likely need to update the iOS version of your app to a more recent Photon version the next time that you need to update anything in it.
  • Thanks for the link, and also for the compatibility info.