Post Mortem: Photon Cloud EU Outage on 02/22

edited February 2013 in Announcements
There was an outage of Photon Cloud EU this morning, 02/22, beginning around 4 AM UTC. The service was fully restored by 7:30 AM UTC.

Here is what happened:

Incident description:
In an effort to extend the capacity of Photon Cloud (and fix some occasional performance issues along the way), we switched the Cloud master server to a larger machine on the evening of 02/21. The new server had been tested beforehand, and the switch was completed successfully.

When user traffic began to increase, network performance suddenly degraded, causing client disconnects from around 4 AM. By 4:30 AM, all client connections to Photon Cloud were failing. During our internal investigation, we found that it appeared to be a network-level problem (we saw immense amounts of packet loss) and escalated it to our hosting provider. We hoped to resolve the issue quickly, but it turned out that we could not identify the root cause in time.

Resolution:
We finally decided to switch back to the previous server, which resolved the issue quickly. By 7:30 AM, full functionality had been restored.

Next actions:

- Working on failover and recovery mechanisms for Photon Cloud will be our top priority in the coming weeks. Although Photon Cloud already runs quite stably, we still need to improve how quickly and reliably we recover from unforeseen events. We plan to build a solution with better failover capabilities, load spreading for the master server, and the ability to switch between servers without any downtime and, above all, without impact on clients.
- An in-depth analysis of the network issue, together with our hosting provider, will of course take place as well.
- Afterwards, we'll also review our current network layout and adjust the network structure where necessary.

I hope I could shed some light on today's incident. We apologize for the issues and for the impact this had on your games.

Let us know if you have any questions.