QUEUE_OUTGOING_ACKS_WARNING

ufnv · December 2019

Hi!

I am switching from the Steam networking to Photon and have encountered a strange problem.

The game networking logic works perfectly when using Steam.

The simplest scenario - two players connected to one "Room". One is "Server", another is "Client".

"Server" sends to the "Client" a number of small "update" events, up to about 50 events per second.

About 20 seconds everything goes fine, then the "Client" starts receiving the events with the increasing lags, up to the couple of seconds. At some point of time it receives the QUEUE_OUTGOING_ACKS_WARNING, then QUEUE_INCOMING_RELIABLE_WARNING, then disconnects.

The manual suggests to increase the number of the "service" calls, but there are already a lot of these calls.

The statistics at the moment when QUEUE_OUTGOING_ACKS_WARNING is received:

Events sent from "Client" to "Server" = 71

Events sent from "Server" to "Client" and handled by the client = 1265

"Service" call on the "Client" = 4631 (each frame at least)

Any suggestions on what could be the cause?

ufnv · December 2019

As a general observation - quite the lag is quite big - more than one second turn-around easily!

Is it a normal behaviour for Photon Realtime?

That's not my network - no apparent lag when using Steam networking. What amount of data Photon should be able to handle? I am sending about 0.1-4 Kb per second - could this amount of data cause these lags?

Kaiserludi · December 2019

Hi @ufnv.

That warning gets triggered when for more than 100 incoming reliable messages the acknowledgements for having received them have not been sent out yet.

When a client or the server sends a message as 'reliable', it will repeat it when it does not receive an ack for it in a certain amount of time (the exact amount depends on the latency, on the clients resents options and on the question, for how many times that message already has been repeated).

If still no ack arrives after several repeats or a certain amount of time, then the sender considers the receivers connection as lost.

Hence it is important to send out acks in a timely manner.

The client sends out acks automatically, so you don't have to deal with this directly yourself. However it does so inside of service(), so if you get this warning, this often indicates that service() has not been called for quite a while.

Another scenario in which this warning can happen is when a client receives a really big messages.

Messages that don't fit into one UDP packet are split into multiple fragments and those fragments are always sent reliably, even when the sender specified that they should be sent as 'unrelliable' (because if only one of them gets lost, all other fragments for that message would become useless).

A lot of these fragments might arrive at about the same time between two service() calls of the receiving client. So if the message is big enough to result in 100 or more fragments (which happens at a message size of roughly about 100-120kb), then this single message can trigger the QUEUE_OUTGOING_ACKS_WARNING, as 100 or more fragments for it might arrive between two service() calls and the receiving client needs to send out an ack for each of them. In that situation you can safely ignore the warning.

1265 messages in 20 seconds are more than 60 messages per second on average, so definitely above 50 and not just up to about 50.

You might want to log a timestamp for each service call and one when you receive that warning and then have look at the timestamps to see when the last service() call has been made. maybe something else is blocking the thread for several seconds?

"As a general observation - quite the lag is quite big - more than one second turn-around easily!

Is it a normal behavior for Photon Realtime?"

Definitely not.

Please check which Photon Cloud region you are connecting to. Maybe it's one far away from the location of your client. Note that one default the client will connect to region EU. So if your client is located on a different continent, this will impact the latency quite a bit, especially if that other continent is not North America.

In that case using RegionSelectionMode::BEST and one being connected storing the return value of getRegionWithBestPing() in a file and using it for future connects (to avoid having to ping all regions again on every connect, as it can take multiple seconds) will certainly improve your ping.

Also please call getRoundTripTime() after you have been connected for a few seconds and have sent out a few reliable messages (so that the client has gathered enough information for an accurate value) to see what your actual ping is. If the return value of that function is way lower than what your own code measures for your messages, then there is some problem inside your code that causes messages to stay inside some queue for too long either on the sending or on the receiving side.

"What amount of data Photon should be able to handle? I am sending about 0.1-4 Kb per second - could this amount of data cause these lags?"

No, this amount should not be a problem at all.

However there is no way that you can get as low as 100 bytes per second when you send more than 60 messages per second. Keep in mind that UDP and Photon itself both need to add some overhead in addition to your raw payload data and even if that 100 byte of raw payload data for 60 messages would mean on average less than 2 bytes of payload per message.

You can see the exact amount of data that Photon has sent and received since entering a room by call getBytesOut() and getBytesIn(). Just divide those values through the amount of seconds that have passed since you have entered the room to get an accurate number of how many bytes on average you sent and received per second.

ufnv · December 2019

Thanks!

I am in Italy, so the "eu" region.

I did some further testing, using the Client.vitalStatsToString() trace, manually calculating the number of messages and also getBytesOut() and getBytesIn().

It turned out to be the combination of two issues.

First, one of my test machines (the one running the "server") was from time to time getting awful Rtt (>1000), while others have a stable Rtt of ~60. That was leading to the messages on a "client" machine not getting Acks for a prolonged period of time.

Second issue seems to be the peaks of sending messages. At some moments it actually exceeded 500, while usually keeping around 1-4 per second, if I calculate them correctly.

Does invoking opRaiseEvent with the "Lite::ReceiverGroup::OTHERS" ReceiverGroup count as N-1 messages or as 1 message (I suspect former)?

If the message is too long for UDP and is to be split in two, does it account for one message or two messages?

So the main differences that make Steam networking much more stable seems to be the restriction to the number of messages plus some strange issues with Rtt that are ignored by Steam (as it keeps the connection much longer), but lead to losing the connection with Photon.

Kaiserludi · December 2019

Hi @ufnv.

When you say that you exceed 500 messages per second, do you mean overall in the room or on one client. If your master client (the one that you have called the "server") is processing over 500 messages per second, that may simply be too much for it to handle.

The 500 message per room "limit" is not enforced, so going above it will NOT cause any direct effects. It's more of a recommendation, as if you send around i.e. 2000 messages per second you are much more likely to cause more traffic than what is included in the price than with 500 messages per second and also it is much more likely that single clients may just end up having to process more messages per second than their hardware is able to handle.

"Does invoking opRaiseEvent with the "Lite::ReceiverGroup::OTHERS" ReceiverGroup count as N-1 messages or as 1 message (I suspect former)?"

Actually it counts as N messages with N being equal to the number of clients inside the room. In that scenario you have N-1 receivers for the event + 1 sender of the raiseEvent operation request, so it's 1 message from client to server + N-1 messages from server to client, resulting in N messages for the sum of messages in both directions.

"If the message is too long for UDP and is to be split in two, does it account for one message or two messages?"

In that case it still counts as just one messages. The fragments do not count against the message limit individually.

So a "restriction to the number of messages" definitely can't cause the observed issues as we don't enforce such a restriction.

ufnv · December 2019

That's very helpful, thanks!

So, packing a number of small messages into one bigger is considered more "healthy" behaviour, even if the resulting packet some times growth big enough to require splitting?

So not only 1 message of 1KB is better than 100 messages of 10 bytes each, but even 1 message of 10KB is better than 100 messages of 100 bytes each?

Kaiserludi · December 2019

Hi @ufnv.

It depends.

There is a small per message overhead as well as a per packet-overhead, so one 1000 byte message is better than 100 10bytes messages. However obviously if a few frames more or less latency are critical for a message, then it obviously makes no sense to artificially hold it back just to aggregate more data to send.

The Photon Client automatically puts multiple messages into one UDP packet, if they fit, but it won't add artificial waiting time to gather more messages. If there are multiple messages in the queues waiting to go out that fit into one message when you call service(), then service() will aggregate them, but if thy are not, then it will just send out what's there and not postpone anything to the next service() call in the hopes that you will queue more messages until then. It makes sense to call service() not after each message, but at seem frequency like i.e. 10, 20 or even 60 times a second to give it a chance to aggregate small messages if your code queues multiple small messages for sending within this timespan.

So you can rest assured that when you call opRaiseEvent() 10 times with tiny payloads and then call service(), this will just generate one UDP packet and not 10 of them.
Just one message would still be a little bit better, if you use the exact same options anyway, as that would mean having to send the per-message overload like the eventCode, the receiver list, etc., only once instead of 10 times and that difference may decide if the overall data suit into 1 packet or if 2 packets have to be sent out.

If a single message does not fit into one packet and Photon has to split it up into multiple fragments, then all those fragments are sent reliable.

If a message of a certain type is sent very frequently and the data in newer messages of that type replaces the one from older messages anyway, then its better to sent it unreliably, as repeats of old messages are pointless when newer data would arrive before the repeated data anyway and the receiving client would have to hold the newer data back just to assure that you get your messages in the correct order in case that their order is important.

Hence for this kind of data it makes a lot of sense to keep the messages small enough that they don't need to be fragmented. The max packet size is 1200 bytes (as that is a safe bet for all parts of the internet infrastructure that a message will pass to be able to handle) including overhead. This can equal up to about 1150 bytes of payload data depending on the type of your data (simple data needs less structuring overhead than deeply structured data). As a rule of thumb a payload of 1000 bytes or less it pretty much guaranteed to fit into one packet, no matter how complex your data is structured.

Keep in mind that photon is not written for use a file server, but for many small messages sent very frequently, so while you can send 10kb or 50kb messages, doing so should be rare and most of your messages should be small enough to fit into one packet.

QUEUE_OUTGOING_ACKS_WARNING

Comments