Photon Server - Thread was being aborted.

MRB
edited January 2014 in DotNet
I have an online multiplayer board game that has been running for more than a year, but recently it has started reloading unexpectedly at random times.
When I checked the Photon log, I saw this:
6160: 18:02:44.040 - CManagedHost::OnDefaultAction() - OPR_ThreadAbort - eAbortThread
6160: 18:02:44.041 - CManagedHost::OnTimeout() - OPR_FinalizerRun - eUnloadAppDomain
6160: 18:02:44.041 - CManagedHost::OnDefaultAction() - OPR_AppDomainUnload - eUnloadAppDomain
9128: 18:02:44.049 - Unloading Domain: ApplicationName = 'Domino2', DomainId='50'
6160: 18:02:44.093 - CManagedHost::OnDomainUnload() - 50
6160: 18:02:44.093 - Releasing reference on default app domain
6160: 18:02:44.093 - Restarting application: "Domino2" due to unexpected unload
6160: 18:02:44.093 - Application: "Domino2" restart request (abort existing connections)
6160: 18:02:44.162 - Taking reference on default app domain
6160: 18:02:44.203 - Application: "Domino2" started in app domain: 51
6160: 18:02:44.224 - StopApplication - Exception: CManagedHost::GetManagedHost() - AppDomain Manager not available for id: 50
6160: 18:02:44.238 - Application: "Domino2" stopped in app domain: 50 (abort existing connections)
6160: 18:02:44.238 - Application: "Domino2" restart complete.
6252: 18:27:53.256 - CTCPPolicySocketServer: connection timeout

In my custom logs I can see that a ThreadAbortException is thrown in the OnOperationRequest event. I have also tried calling Thread.ResetAbort(), but it does not help.

What can cause this exception, and why does it cause the whole application to restart?

Please help.

Comments

  • chvetsov
    Well, it looks like a Photon bug.
    I'm sorry, but everyone is on vacation now; we will take a look at the problem after the New Year.

    Please check your log folder; you may have a dump file from Photon. Please upload it somewhere and share the link with us, and we will check it.
  • Thanks for the reply.

    Is it possible that the ThreadAbortException is not the cause, but a consequence of Photon trying to unload the application? As far as I know, if you want to unload an AppDomain, you have to call Thread.Abort().
  • In Photon, each "application" runs in its own AppDomain. If you want to unload an application, you can call AppDomain.Unload. This causes all threads in the application's AppDomain to be aborted.

    On the other hand, an AppDomain can also be unloaded because of an unhandled exception. That behavior can be configured in PhotonServer.config, as described here: http://doc.exitgames.com/en/photon-serv ... onHandling

    So the ThreadAbortException can either be the reason for or a consequence of the unloaded app domain.

    For further debugging, please check the following:

    1. Which exception handling policy is set for your app domains? Please post the <Runtime> element of your PhotonServer.config (or, preferably, send me the whole PhotonServer.config file).

    2. Please provide the following log files:
    - Photon-[InstanceName]-... (the complete log file from the day when Photon was last started, and the last occurrence of the restart)
    - PhotonCLR.log
    - /deploy/log/Domino2.log (or whatever the log file name from the affected application is)

    Please attach the log files to this thread or send me a download link via private message so that I can have a look.

    You might also want to check out DebugDiag: it is a great tool to get more information in case of unexpected exceptions in production environments: http://blogs.msdn.com/b/debugdiag/archi ... w-rtw.aspx
  • I checked the config file, and UnhandledExceptionPolicy is set to Ignore. So why is it restarting?

    I have attached my config file here. Domino1 and Domino2 are my applications.
  • Can anyone help, please?
  • Thanks. We'll need to check whether, and why, the UnhandledExceptionPolicy does not handle the ThreadAbortException in background threads correctly (and whether we can do anything about it at all).

    Have you had any success debugging the actual exception?
  • Philip
    First, not all thread aborts can be caught by the UnhandledExceptionHandler.
    The ones that can't be caught are usually triggered by the CLR itself
    in extreme cases (see the two sample cases described below).
    These cases also trigger the restart of the AppDomain.

    Note: From the logs you sent, I'd say a finalizer in your code is taking too long (case #2).

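    #1 a finally block that gets stuck
    [code2=csharp]try
    {
        Thread.CurrentThread.Abort(); // in my test, another thread aborted this one
    }
    finally
    {
        // The abort cannot complete while this finally block is still running,
        // and this loop never exits, so the abort never finishes.
        while (true)
        {
            log.Debug("The timer callback finally.");
            Thread.Sleep(1000);
        }
    }[/code2]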

    This is what you get in Photon-Default.XXXX.log:
    6512: 13:38:56.228 - CManagedHost::OnDefaultAction() - OPR_ThreadAbort - eAbortThread
    6512: 13:39:11.240 - CManagedHost::OnTimeout() - OPR_ThreadAbort - eRudeUnloadAppDomain

    #2 a finalizer taking too long:
    [code2=csharp]~ExampleClass()
    {
        // Never returns, so the finalizer thread is blocked indefinitely.
        while (true)
        {
            log.Debug("~ExampleClass()");
            Thread.Sleep(1000);
        }
    }[/code2]

    This is what you get in application.log:
    2012-08-02 16:59:36,997 [2] DEBUG HelloWorld3.Server.ExampleClass [(null)] - ~ExampleClass()
    2012-08-02 16:59:37,998 [2] DEBUG HelloWorld3.Server.ExampleClass [(null)] - ~ExampleClass()
    ...
    2012-08-02 16:59:50,010 [2] DEBUG HelloWorld3.Server.ExampleClass [(null)] - ~ExampleClass()
    2012-08-02 16:59:51,014 [2] DEBUG HelloWorld3.Server.ExampleClass [(null)] - ~ExampleClass()

    This is what you get in PhotonClr.log:
    2012-08-02 16:59:52,012 [ 2] ERROR PhotonHostRuntime.PhotonDomainManager - UnhandledException:
    System.Threading.ThreadAbortException: Thread was being aborted.

    This is what you get in Photon-Default.XXXX.log:
    13112: 16:59:52.005 - CManagedHost::OnDefaultAction() - OPR_ThreadAbort - eAbortThread
    13112: 16:59:52.005 - CManagedHost::OnTimeout() - OPR_FinalizerRun - eUnloadAppDomain
    13112: 16:59:52.005 - CManagedHost::OnDefaultAction() - OPR_AppDomainUnload - eUnloadAppDomain
    12312: 16:59:52.019 - Unloading Domain: ApplicationName = 'Lite', DomainId='2'
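    Since case #2 matches the original logs (the host's OnTimeout() fires on OPR_FinalizerRun and escalates to eUnloadAppDomain), one common remedy is to keep finalizers short and move slow or blocking cleanup into Dispose(), following the standard .NET dispose pattern. The sketch below is only an illustration under assumptions: GameResource and ReleaseResources are hypothetical names standing in for whatever class in the game server currently does slow work in its finalizer; nothing here is Photon API.

```csharp
using System;
using System.Threading;

// Hypothetical type standing in for whichever class in the game server
// currently does slow work in its finalizer (like ~ExampleClass in case #2).
public sealed class GameResource : IDisposable
{
    // 0 = still alive, 1 = cleanup already done; changed only via Interlocked.
    private int _released;

    public bool IsDisposed
    {
        get { return Volatile.Read(ref _released) == 1; }
    }

    // Deterministic cleanup on the caller's thread. Anything slow or blocking
    // (logging, I/O, waiting on locks) belongs here, not in the finalizer.
    public void Dispose()
    {
        if (Interlocked.Exchange(ref _released, 1) == 0)
        {
            ReleaseResources();
        }
        // The finalizer no longer needs to run for this object.
        GC.SuppressFinalize(this);
    }

    // Safety net only: runs on the finalizer thread if Dispose() was never
    // called. Keep it short: no loops, no Thread.Sleep, no waiting on other threads.
    ~GameResource()
    {
        if (Interlocked.Exchange(ref _released, 1) == 0)
        {
            ReleaseResources();
        }
    }

    private void ReleaseResources()
    {
        // Hypothetical placeholder: release unmanaged handles quickly here.
    }
}
```

    Callers would then wrap each instance in a using block, e.g. using (var r = new GameResource()) { ... }, so the slow path runs on the caller's thread and the finalizer is suppressed instead of blocking the finalizer thread during an unload.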
    ---

    Finding out what is causing this might be tricky, unless you can reproduce it locally while debugging.
    You might have to resort to a tool that gathers a stack trace or dump in production.
    I'd recommend the Debug Diagnostic Tool, as described here:
    https://www.dropbox.com/s/2ux686brdkm6e ... verflow.md

    The only difference for you would be to select a different trigger (ThreadAbortException instead of StackOverflow).

    Hope that helps.

    -Philip