Continued Photon Disconnection Issues (Past Week)

Seems like the mass disconnections by Photon have returned (lasted about 30 hours last week).

Any word on stability improvements? It never used to be this unstable. I've mostly had good experiences, minus a couple of days' worth. I'm posting because it's strange to see an entire 2018's worth of instability in such a short period of time.

We run a match-based game, which makes disconnections all the more frustrating.

(It's not a user connection issue: it always happens at the same time for groups of people, and nothing has changed for ages.)

Comments

  • JohnTube
    edited January 2019
    Hi @xblade724,

    Sorry to hear you are having issues with our service.
    While we work towards maximum availability and stability, the worst can't always be avoided.
    The public cloud does not come with SLAs; the private Enterprise cloud does, however.

    What you could do is add protection to your own game. It might require some work, but that's how one of our other customers is handling this.

    You could add a telemetry system that collects occurrences of unexpected disconnects or connection failures from clients, with all required details (region, server IP, disconnect reasons/causes, etc.).
    Of course, you need to tell your users about this stats collection, and if it's not anonymous (client IP, UserId), give them a way to opt out.
    Have a dashboard that displays these and an alerting system so you can take action if needed.
    So if lots of disconnects happen in the "US" region within a short period of time, you could switch your clients to "USW" via Photon's "regions whitelist" feature.
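The spike detection described above could look roughly like the following. This is a minimal sketch, not Photon's API: `DisconnectMonitor`, the window length, the threshold, and the `"US"` to `"USW"` fallback table are all hypothetical names and values chosen for illustration, and it assumes reports arrive roughly in time order. Actually applying the switch (for example via the regions whitelist) is left out.

```python
from collections import defaultdict, deque


class DisconnectMonitor:
    """Counts disconnect reports per region over a sliding time window.

    Hypothetical helper: thresholds and fallback regions are assumptions.
    """

    def __init__(self, window_seconds=300.0, threshold=50, fallbacks=None):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # Assumed mapping from a troubled region to the region to use instead.
        self.fallbacks = fallbacks or {"US": "USW"}
        self._events = defaultdict(deque)

    def report(self, region, timestamp):
        """Record one unexpected disconnect reported by a client."""
        self._events[region].append(timestamp)

    def recent_count(self, region, now):
        """Return how many reports for `region` fall inside the window."""
        events = self._events[region]
        # Drop reports that are older than the window.
        while events and events[0] <= now - self.window_seconds:
            events.popleft()
        return len(events)

    def recommended_region(self, region, now):
        """Return the fallback region during a spike, else the region itself."""
        if self.recent_count(region, now) >= self.threshold:
            return self.fallbacks.get(region, region)
        return region
```

Because old reports are pruned on every check, the recommendation reverts to the original region once the spike has passed, which matters when outages are short.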
  • xblade724
    edited January 2019


    We do region switching when servers go down -- however, players who switch can no longer queue on the main server that other people may still be on.

    You can see the instability comes in spikes: this means that someone may transfer to a different region while everyone else is still on the main region.

    Region changing only works when the entire server is down for a long period of time. Even then, it's spikes: the servers come back up shortly after, then the playerbase is suddenly split until they relog. This has already resulted in negative reviews along the lines of "was queued for 45 mins and didn't get a game!" when the wait should have averaged 15 seconds.

    There has been more instability in the past 10 days than in all of 2017/2018 combined :( It's extremely demoralizing. In reply to my first email, I was immediately offered Enterprise (about 10~12x more than what I pay now) and told that I don't have an SLA, which seemed like a low blow, as even non-Enterprise has been more or less stable for 2017 and 2018, except for 1 giant outage and maybe a few tiny ones that went away fast. It's only in 2019 that it's been this wild, within such a short period of time, and it doesn't seem to go away.
  • We understand your frustration and we're sorry about it. We are not happy with the situation, of course, and are working on a solution. As usual with networking and software, it's more complex than expected, even if you factor that in.

    Region changing only works when the entire server is down for a long period of time. Even then, it's spikes: the servers come back up shortly after, then the playerbase is suddenly split until they relog.

    Yes, it's extremely annoying to switch regions and split the user base.
    You could possibly mitigate this with your own service, where players report their region and disconnects (via a REST API). If everyone uses this after a match, it could guide your players to active regions with more flexibility.

    Also, there are statistics values coming from the Master Server, which tell a user the count of players and rooms in their region. Checking these for very low values and showing a (temporary) warning might help set expectations for the user.

    We hope to have good news on this ASAP.
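The two mitigations suggested in this reply (a low-population warning and steering players toward active regions) could be sketched as below. Everything here is an assumption for illustration: `RegionStats`, both function names, and the thresholds are hypothetical, and the counts stand in for whatever the Master Server statistics or your own reporting service provide.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Snapshot of one region (hypothetical shape, not a Photon type)."""
    region: str
    player_count: int
    room_count: int
    recent_disconnects: int = 0


def low_population_warning(stats, min_players=20):
    """Return a temporary warning message if the region looks nearly empty."""
    if stats.player_count < min_players:
        return (
            f"Only {stats.player_count} players are online in {stats.region} "
            "right now; matchmaking may take longer than usual."
        )
    return None


def busiest_healthy_region(all_stats, max_disconnects=25):
    """Pick the most populated region whose disconnect count looks normal."""
    healthy = [s for s in all_stats if s.recent_disconnects <= max_disconnects]
    if not healthy:
        return None  # No region looks healthy; let the caller decide.
    return max(healthy, key=lambda s: s.player_count).region
```

Preferring the busiest healthy region, rather than merely any healthy one, is meant to reduce the playerbase split described earlier in the thread.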
  • xblade724
    edited January 2019
    Thanks - I have a few workarounds in mind, but unfortunately they don't help *now*, as they'll take some time to implement.

    Yeah, networking -- it's always complex, indeed.

    I wish you luck on getting it working soon. Is any action being taken now, or is it just a matter of waiting? What is the likely ETA of the fix? Even a ballpark would help, since it's been so long. Sadly, we cannot afford Enterprise - we only have a few hundred players active at a time. We're quite small, but consistent. Earlier today it brought us down from 200 something to
  • We wouldn't say we're working on it if we were just waiting. However, we're not running our very own machines, so we also depend on the hosting provider to help us. It's our top priority at the moment.

  • Gotcha, thanks for the clarification.
  • Yesterday had issues, but today seemed fine from a quick glance -- can't easily see disconnection chatter.
  • It should have improved already, yes. But we're still not happy or done.
  • xblade724
    edited January 2019
    Heads up, there was another outage today in "US" -- not just the EU chat servers -- for the same duration (~2 hours).
  • Confirmed. We were attacked again in the past few hours. This time the Master Servers took more fire. We're on the move.