[Fault Tolerance] Any suggestions for when a game server crashes?

garric
edited March 2012 in Photon Server
Hi,
I haven't found any documentation on fault tolerance with Photon. Are there any suggestions?
Imagine one of our game servers crashes for some reason and we have to recover it instantly.
Should we use a hot backup system? I mean, when a server crashes, just redirect clients to a mirror of that game server.
Or should we write the game's properties to a database? I mean, when a server crashes, just restore the game server from the data in the database.

Or is there existing middleware that solves this problem?

:)

Comments

  • Philip
    Hi garric,

    Fault tolerance is a complex issue and depends a lot on your game, your architecture, and how much you are willing to invest. Anyway, I can tell you about the strategy we follow on our cloud service, which is a highly available service. It is built from a set of loosely coupled systems:
    a. the master or lobby service
    - game servers report their games and health/load level
    - players connect to create/join games
    - the master "sends" players to game servers according to their load
    b. the authentication service
    - used by the master to authenticate applications
    c. the set of game servers
    - clients connect to "play"

    Each of them can fail without disrupting the others, and they are "self-healing" when they recover.
    If the master crashes, a hardware load balancer fails over to a standby master.
    The game servers then "reconnect" to the master and send their list of active games.
    If a game server fails, the master takes it out of the list of games and stops sending
    clients to it.
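
The reporting and removal behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not Photon's actual implementation: `MasterRegistry`, `GameServerEntry`, the 10-second timeout, and the idea of detecting a failed game server through missed reports are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class GameServerEntry:
    """What the (hypothetical) master remembers about one game server."""
    address: str
    load: float                      # 0.0 (idle) to 1.0 (full), as reported
    games: set = field(default_factory=set)
    last_seen: float = 0.0


class MasterRegistry:
    """Hypothetical in-memory view a master keeps of its game servers."""

    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # assumed value
        self.clock = clock
        self.servers = {}

    def report(self, address, load, games):
        # Every report carries the full list of active games, so the same
        # call also rebuilds the state of a fresh standby master.
        self.servers[address] = GameServerEntry(
            address, load, set(games), self.clock()
        )

    def prune(self):
        # Drop servers that have not reported within the timeout.
        now = self.clock()
        dead = [
            address
            for address, entry in self.servers.items()
            if now - entry.last_seen > self.timeout_seconds
        ]
        for address in dead:
            del self.servers[address]
        return dead

    def pick_server(self):
        # Send a new player to the least-loaded server that is still alive.
        self.prune()
        if not self.servers:
            return None
        return min(self.servers.values(), key=lambda e: e.load).address

    def find_game(self, game_id):
        # Locate the server hosting an existing game, if it is still listed.
        self.prune()
        for entry in self.servers.values():
            if game_id in entry.games:
                return entry.address
        return None
```

Because the registry is rebuilt purely from reports, a standby master in this sketch needs no replicated state: it starts empty and fills up as the game servers reconnect.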
    If the authentication service fails (it is only called for application IDs not found in the cache),
    we assume the authentication is OK. This is a risk we decided we could take because we
    made sure that the service, a failed-over web service with a failed-over database, won't fail.
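
The "assume OK when the authentication service is down" policy can be illustrated with a small fail-open cache. This is only a sketch under assumptions: `AppAuthenticator`, `AuthServiceUnavailable`, and the `lookup` callable are hypothetical names, and the choice not to cache a fail-open answer is a design decision of this example, not something stated in the thread.

```python
class AuthServiceUnavailable(Exception):
    """Raised by the (hypothetical) lookup when the auth service is down."""


class AppAuthenticator:
    """Caches application ID checks and fails open on service outages."""

    def __init__(self, lookup):
        # lookup: callable taking an application ID and returning True/False,
        # or raising AuthServiceUnavailable. Stands in for the real service.
        self.lookup = lookup
        self.cache = {}

    def is_allowed(self, app_id):
        # Cached application IDs never touch the authentication service.
        if app_id in self.cache:
            return self.cache[app_id]
        try:
            result = self.lookup(app_id)
        except AuthServiceUnavailable:
            # Fail open: accept the application, but do not cache the
            # answer, so it is verified once the service is back.
            return True
        self.cache[app_id] = result
        return result
```

The trade-off is the one described above: an unknown application ID gets through during an outage, which is only acceptable if outages of the authentication service are made very unlikely.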
  • garric
    @Philip
    Err, I'm more confused now :?
    "The master server 'sends' clients to the game server with the minimum workload", so is the game server hosted by us, or is it also hosted in the cloud?

    If the game server is hosted in the cloud, what about game-specific logic, say for a turn-based card game? In the load balancing project, the GameClientPeer will treat every custom operation as an unknown operation and respond directly to the client with the message "Unknown operation code...". So it sounds like the game server has to be hosted by us, is that right?
    :)
  • Philip
    garric wrote:
    @Philip
    Err, I'm more confused now :?
    "The master server 'sends' clients to the game server with the minimum workload", so is the game server hosted by us, or is it also hosted in the cloud?

    I described the architecture of the cloud: http://cloud.exitgames.com.
    garric wrote:
    @Philip
    If the game server is hosted in the cloud, what about game-specific logic, say for a turn-based card game?

    I was talking about the game servers that are part of the cloud; they don't support custom logic.
    garric wrote:
    @Philip
    In the load balancing project, the GameClientPeer will treat every custom operation as an unknown operation and respond directly to the client with the message "Unknown operation code...".

    Correct
    garric wrote:
    @Philip
    So it sounds like the game server has to be hosted by us, is that right?

    No. We currently don't support a mixed model (master in the cloud and game server self-hosted).
    If you need custom game logic, you have to host your own master too.

    In this thread I explain the model we recommend for integrating custom logic such as persisting profiles:
    viewtopic.php?f=5&t=1514

    As for the turn-based card game you mention: the cloud is targeted rather at a design where this logic lives in one of the clients, a "master client" (for instance, the creator of the room). Of course, you then have to deal with "master clients" leaving or disconnecting.
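
One way to deal with a master client leaving is to agree on a deterministic rule that every remaining client can apply on its own. The sketch below assumes the rule "the client with the lowest actor number is the master client"; the `Room` class and this rule are illustrative assumptions for the example, not a description of what the cloud does.

```python
class Room:
    """Hypothetical room that tracks actors and derives the master client."""

    def __init__(self):
        self.actors = set()
        self.next_actor_number = 1

    def join(self):
        # Actor numbers only ever grow, so the creator gets the lowest one.
        actor = self.next_actor_number
        self.next_actor_number += 1
        self.actors.add(actor)
        return actor

    def leave(self, actor):
        # Covers both a clean leave and a detected disconnect.
        self.actors.discard(actor)

    @property
    def master_client(self):
        # Assumed rule: lowest actor number still in the room runs the logic.
        return min(self.actors) if self.actors else None


room = Room()
creator = room.join()
second = room.join()
third = room.join()
room.leave(creator)
# With the creator gone, the next-lowest actor takes over the game logic.
new_master = room.master_client
```

Electing a new master client is only half of the problem: the new one also needs the current game state, so in a design like this the state would have to be shared with the other clients (or stored in room properties) before the old master client disappears.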
  • Philip
    @garric: please let's keep the discussion of the cloud's supported models/workflows in your other thread: viewtopic.php?f=5&t=1514