Weird client disconnect issues in Steam (PUN) clients

boudinov
edited August 2014 in Photon Server
We are having connection problems with a good number of players since we released our Unity3D game on Steam. We authenticate users on a Login Photon server, then that connection is closed and a new one is established to the Photon Lobby server. A good percentage of people cannot get data from the lobby; they just get disconnected after about 30 seconds, with their incoming queue showing 5 to 10 commands queued.

Photon servers are hosted on Amazon USA.

I guarantee I never cease to call DispatchIncomingCommands and SendOutgoingCommands. But at some point the command count in the incoming queue increases, and after around 30 seconds or more I get DisconnectByServer. Some bzipped data should be incoming from the server, as command responses with a maximum size of 8 KB.

This happens to approximately 5% of players, and only to them. They have been asked to turn off any antivirus or firewall programs, but the problems persist.
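For reference, our service loop follows the standard PhotonPeer pattern. This is a simplified sketch: the `peer` field and the Unity `Update` hook stand in for our actual game loop, and PUN normally drives these calls internally via PhotonNetwork.

```csharp
using ExitGames.Client.Photon;
using UnityEngine;

// Simplified sketch of our client service loop; `peer` stands in for
// our actual PhotonPeer instance.
public class PhotonService : MonoBehaviour
{
    PhotonPeer peer;

    void Update()
    {
        // Drain everything queued so far, then flush outgoing commands
        // (which includes the ACKs needed for reliable UDP).
        while (peer.DispatchIncomingCommands()) { }
        peer.SendOutgoingCommands();
    }
}
```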

There are a couple of types of warnings in the Photon-Instance logs, but these come in bursts rather than being logged all the time. They look like this:
1. 2456: 16:57:47.404 - 0000000012FA9260: Peer: 15926 challenge mismatch: 2710347067 and 841888560
2. CENetThreadPool::Process - Exception - CENetHost::HandleIncomingCommands() - Unexpected: Received less data than protocol header size from: 179.211.1.102:47651 on: 10.2.0.233:5059

PhotonServer.config SETTINGS:

EnablePerformanceCounters = "true"
DataSendingDelayMilliseconds="5"
AckSendingDelayMilliseconds="5"
PerPeerMaxReliableDataInTransit="32768"
PerPeerTransmitRateLimitKBSec="128"
MaxQueuedDataPerPeer="98304"
MinimumTimeout="20000"
MaximumTimeout="22000">

<ThreadPool
InitialThreads="4"
MinThreads="4"
MaxThreads="4">
</ThreadPool>

<!-- Using the same value for initial, min and max makes the pool fixed size, which allows optimizations. -->
<ENetThreadPool
InitialThreads="2"
MinThreads="2"
MaxThreads="2">
</ENetThreadPool>


Our Photon versions:

== Core ==
PhotonSocketServer.2010
BuildDate: 2012-02-03 13:14:31
Version: 3.0.11.1074
SVN repository: photon-socketserver
SVN revisions of...
... this project: 1074
... bin-tools: 84
== SDK ==
BuildDate: 2012-02-06 12:50:35
Version: 3.0.19.2868
SVN repository: photon-socketserver-sdk
SVN revisions of...
... this project: 2868
... bin-tools: 2868



Help, please

Comments

  • Thanks for the detailed description.

    As mentioned before, here is a quick guide for disconnect analyzation: http://doc.exitgames.com/en/realtime/cu ... isconnects

    That being said - your settings look a bit restrictive. I'll review them later today and update you with some advice.

    You are also using a Photon version that is quite old. Is it possible for you to update to the latest release v. 3.4?
  • Hello and thanks for the answer,

    Should we be alarmed by the increasing length of the incoming command queue before the disconnect? Even if network or protocol problems arise, the queue length should be kept at zero as long as DispatchIncomingCommands is regularly invoked, right?

    We set PhotonNetwork.isMessageQueueRunning when necessary, and run service calls without a hiccup.
    We have around 1000 players online. People with this problem have it all the time, even when the player count is at its minimum of around 500 online.
    Resource monitor shows low utilization of traffic and CPU on the Photon server.
    It will be a hard task to upgrade our Photon version at the moment, but we might consider it if nothing else helps.

    We should try collecting some info with PhotonPeer.TrafficStatsEnabled, but I guess a Wireshark capture would reveal the problem better. It will be tricky, though, to run one on a would-be player from the Steam community.
    Do you know by any chance if an ENet/Photon Wireshark dissector is available?

    Could it be that an older version of ENet just does not get along well with some network adapters (and their managers) or routers?

    Thanks
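    If we do go the TrafficStats route, here is a minimal sketch of what we would log, assuming the stats properties on PhotonPeer from the client SDK (the exact members and their ToString() output may differ by SDK version):

    ```csharp
    // Enable stats collection once, right after creating the peer.
    peer.TrafficStatsEnabled = true;

    // ... later, e.g. once per second, dump the counters:
    Debug.Log("IN:  " + peer.TrafficStatsIncoming);
    Debug.Log("OUT: " + peer.TrafficStatsOutgoing);
    // This is the queue we see growing before the disconnect:
    Debug.Log("Queued incoming commands: " + peer.QueuedIncomingCommands);
    ```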
  • Just found out that MaximumTransferUnit was set to 2400 on our clients (1200 is the default), which, it turns out, sets the same MTU for that client's connection on the server too.
    Could this lead to failures with Path MTU Discovery or something like that?
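    For anyone hitting the same thing, the fix on our side is a one-liner before connecting. 1200 is the library default; values much above 1400 risk IP fragmentation on a typical 1500-byte path MTU:

    ```csharp
    // Keep Photon's UDP packets under the typical 1500-byte path MTU.
    // Our previous value of 2400 forced IP fragmentation, which some
    // routers and NATs silently drop. Set this before Connect().
    peer.MaximumTransferUnit = 1200;  // the default
    ```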
  • That indeed was the problem.
  • Hey,

    sorry for the late response here and thanks for sharing your findings.

    For your questions:
    Should we be alarmed by the increasing length of the incoming command queue before the disconnect? Even if network or protocol problems arise, the queue length should be kept at zero as long as DispatchIncomingCommands is regularly invoked, right?

    If it is constantly increasing, it means that the client cannot "catch up" with processing the incoming commands, so something might be wrong. This might be related to network issues, too: if you use reliable UDP, the client needs to acknowledge the received packets. If the ACKs get lost and don't reach the server, the server re-sends the packets, so you might get more incoming data than you expected. (The MTU setting might be an explanation for this.)

    We have a Wireshark dissector for Photon /ENet for internal use; if you assume that something is (still) wrong on the network level, please get in touch with us - we are responsible for the network layer, so we help to analyze and fix these issues, of course. We don't expect our customers to do our work for us ;-)

    Glad that you found it out anyway - we'll have a look at if / how we can make some improvements in that area to better prevent issues like these in the future. Thanks again for your updates!