photon statistics to external system/graphite

shashi
edited September 2014 in Photon Server
Hi,

We run a lot of Photon game servers, and the number is going to grow very soon. We want to send all the game statistics to an external system like Graphite, so that we get realtime multi-server statistics, can keep them for a long time, and can cross-graph them against other metrics (number of sessions on the web server, database load, etc.).

I set up a Python socket to listen on port 40001, but the data I receive is not ASCII; it is in some other encoding.

So how can I get and parse those statistics?

Thanks,
Shashi

Comments

  • Hey,

    nice idea. :)

    For a bit of background information, I would recommend reading these articles (but I guess you have done that already):
    http://doc.exitgames.com/en/onpremise/c ... -dashboard
    http://doc.exitgames.com/en/onpremise/c ... ustomizing

    So you are basically replacing the Photon Dashboard service with your own implementation?

    The messages are sent in a binary format; below is a (slightly abbreviated) .NET code sample to parse the data.

    [code2=csharp]// Requires: using System; using System.IO;
    private void OnReceive(object sender, SocketReceiveEventArgs e)
    {
        DateTime timeStamp = DateTime.UtcNow; // time of receipt

        using (var memoryStream = new MemoryStream(e.Buffer, e.Offset, e.BytesReceived))
        {
            var reader = new BinaryReader(memoryStream);
            reader.ReadUInt16(); // int magicNum = 0xffee
            long binaryServerTime = reader.ReadInt64();
            DateTime serverTime = DateTime.FromBinary(binaryServerTime);
            int count = reader.ReadInt32(); // number of counters in this message
            string senderId = reader.ReadString();

            for (int i = 0; i < count; i++)
            {
                var counterSample = Deserialize(reader);
                // do something with your counter sample here
            }
        }
    }

    public static CounterSampleCollection Deserialize(BinaryReader binaryReader)
    {
        string name = binaryReader.ReadString(); // counter name
        short valueCount = binaryReader.ReadInt16(); // number of counter values

        var sampleCollection = new CounterSampleCollection(name);
        for (int i = 0; i < valueCount; i++)
        {
            // each counter value consists of a timestamp + the actual value
            long binaryTimeStamp = binaryReader.ReadInt64();
            DateTime timeStamp = DateTime.FromBinary(binaryTimeStamp);
            float value = binaryReader.ReadSingle();

            sampleCollection.Add(new CounterSample(timeStamp, value));
        }
        return sampleCollection;
    }[/code2]

    CounterSample and CounterSampleCollection are simple structs/classes; they are part of ExitGamesLibs.dll, which can be found in the lib folder of the Photon Server SDK.
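
Since the original listener was written in Python, the same layout can be sketched there as well. This is a rough translation of the C# sample above, not an official parser. It assumes the sender writes with .NET's default BinaryWriter conventions (little-endian numbers, strings as a 7-bit encoded byte length followed by UTF-8 bytes) and, as a simplification, it drops the DateTimeKind bits of DateTime.FromBinary and treats every timestamp as UTC. The names PacketReader and parse_packet are made up for this example.

```python
import struct
from datetime import datetime, timedelta

DOTNET_EPOCH = datetime(1, 1, 1)   # .NET ticks count from 0001-01-01
TICKS_MASK = 0x3FFFFFFFFFFFFFFF    # low 62 bits: ticks; top 2 bits: DateTimeKind


class PacketReader:
    """Minimal stand-in for .NET's BinaryReader (little-endian)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def unpack(self, fmt):
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def read_string(self):
        # Assumed layout: 7-bit encoded byte length, then UTF-8 bytes.
        length, shift = 0, 0
        while True:
            byte = self.data[self.pos]
            self.pos += 1
            length |= (byte & 0x7F) << shift
            if (byte & 0x80) == 0:
                break
            shift += 7
        raw = self.data[self.pos:self.pos + length]
        self.pos += length
        return raw.decode("utf-8")

    def read_datetime(self):
        # Simplification: ignore the kind bits and treat the ticks as UTC.
        ticks = self.unpack("<q") & TICKS_MASK
        return DOTNET_EPOCH + timedelta(microseconds=ticks // 10)


def parse_packet(data):
    """Return (server_time, sender_id, {counter_name: [(time, value), ...]})."""
    reader = PacketReader(data)
    reader.unpack("<H")  # magic number, skipped as in the C# sample
    server_time = reader.read_datetime()
    count = reader.unpack("<i")
    sender_id = reader.read_string()
    counters = {}
    for _ in range(count):
        name = reader.read_string()
        value_count = reader.unpack("<h")
        samples = counters.setdefault(name, [])
        for _ in range(value_count):
            timestamp = reader.read_datetime()
            samples.append((timestamp, reader.unpack("<f")))
    return server_time, sender_id, counters
```

Calling parse_packet on the bytes returned by the listening socket should give one list of (timestamp, value) pairs per counter name, provided the assumptions above hold for the Photon version in use.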

    --

    Another idea:
    Photon writes lots of statistics to Windows Performance Counters, so you could also build a service that reads the Windows Performance Counters and sends them to Graphite, maybe by using this lib:
    https://github.com/peschuster/graphite-client

    Either way might be a bit "tricky" and require some effort, but I hope it helps. I'd be glad to hear about your final solution, and please let me know if we can help with anything else. We are aware that statistics & counters need some improvements and are always happy to hear your feedback. :)
  • Hi Nicole,

    Thank you for the quick reply. Based on your answer,


    #1
    One way to approach this is to create a small C# application that will:

    a. listen on a UDP port for the counter messages
    b. when a message arrives, use the code you provided to decode it from binary to plain text
    c. send the plain-text message to some external IP/port.

    The downside is that it requires one extra process running on every server that publishes counters.

    #2
    A better way, as I see it, would be to tell the CounterPublisher to send the messages in plain text instead of binary.
    Is this possible, for example via a patch, a config option, or a beta build?


    I would be more than happy to write a detailed how-to for everyone on setting up Graphite and doing what I am trying to do, if that is a possibility.
    All I need for now is to get the counters in text format; I can do the parsing into Graphite format myself. Once that works, in your next release you could simply add config options for graphite_host and graphite_port and send the data as text in the format that Graphite needs (hostname.counter_name, counter, timestamp).

    This solution would really be a big help to those who run, or plan to run, dozens of servers and want to understand what is going on and keep a closer eye on monitoring and scalability.
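
To make the target format concrete: Graphite's plaintext protocol expects one line per data point, in the form "metric.path value unix_timestamp", and Carbon listens for it on TCP port 2003 by default. Below is a small sketch of that last step. The function names are made up for this example, the timestamp is assumed to be a naive UTC datetime, and replacing the dots in the host name is only one possible choice to keep it a single node in the metric path.

```python
import calendar
import socket
from datetime import datetime


def format_graphite_line(hostname, counter_name, value, timestamp):
    """Build one data point line for Graphite's plaintext protocol."""
    # Dots separate path levels in Graphite, so keep the host name as one node.
    host_node = hostname.replace(".", "_")
    epoch = calendar.timegm(timestamp.timetuple())  # timestamp is taken as UTC
    return f"{host_node}.{counter_name} {value} {epoch}\n"


def send_to_graphite(lines, graphite_host, graphite_port=2003):
    """Send already formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((graphite_host, graphite_port)) as conn:
        conn.sendall(payload)
```

Batching several lines into one send_to_graphite call keeps the number of TCP connections low when many counters arrive at once.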

    Waiting to hear your thoughts.

    Cheers,
    Shashi
  • Hi,

    thanks for your feedback.

    #1 Yes - that's how it could be done and what I would suggest for now. And I agree on the downside as well.

    #2 Currently no, sorry. We are discussing this and plan to work on statistics soon, and we MIGHT remove the binary format and replace it with a plain-text format, but we have not decided on this, and there is no ETA in the near future, so if you need a solution now, we can't provide much more help, sorry.

    I understand your requirements and we are facing similar challenges (and are not that happy with the current implementation either), but I don't want to promise something yet. Hope you can understand that.
  • Hi Nicole,

    I have been able to get the stats, and I have some follow-up questions.

    So far, I see these metrics:

    MVGameServer.AvrgOpExecTime
    MVGameServer.EventsSentCount
    MVGameServer.EventsSentPerSec
    MVGameServer.InitPerSec
    MVGameServer.OperationsFast
    MVGameServer.OperationsMaxTime
    MVGameServer.OperationsMiddle
    MVGameServer.OperationsSlow
    MVGameServer.OpReceiveCount
    MVGameServer.OpReceivePerSec
    MVGameServer.OpResponseCount
    MVGameServer.OpResponsePerSec
    MVGameServer.SessionCount
    SocketServer.BusinessLogicQueue
    SocketServer.BusinessLogicThreads
    SocketServer.BytesInPerSecond
    SocketServer.BytesOutPerSecond
    SocketServer.CommandsResentPerSecond
    SocketServer.Connections
    SocketServer.DatagramValidationFailuresPerSecond
    SocketServer.EnetQueue
    SocketServer.EnetThreads
    SocketServer.IOThreads
    SocketServer.MessagesInPerSecond
    SocketServer.MessagesOutPerSecond
    SocketServer.Peers
    SocketServer.PolicyFailedRequestsPerSecond
    SocketServer.PolicyTimeoutDisconnectsPerSecond
    SocketServer.ReliableCommandsInPerSecond
    SocketServer.ReliableCommandsOutPerSecond
    SocketServer.TcpDisconnectedPeersByClientPerSecond
    SocketServer.TcpDisconnectedPeersByManagedPerSecond
    SocketServer.TcpDisconnectedPeersByServerPerSecond
    SocketServer.TcpDisconnectedPeersByTimeoutPerSecond
    SocketServer.TcpDisconnectedPeersPerSecond
    SocketServer.TcpPeers
    SocketServer.TimeoutDisconnectPerSecond
    SocketServer.UdpDisconnectedPeersByClientPerSecond
    SocketServer.UdpDisconnectedPeersByManagedPerSecond
    SocketServer.UdpDisconnectedPeersByServerPerSecond
    SocketServer.UdpDisconnectedPeersByTimeoutPerSecond
    SocketServer.UdpDisconnectedPeersPerSecond
    SocketServer.UdpPeers
    SocketServer.UnreliableCommandsInPerSecond
    SocketServer.UnreliableCommandsOutPerSecond
    System.BytesReceivedPerSecond
    System.BytesSentPerSecond
    System.BytesTotalPerSecond
    System.Cpu
    System.CpuTotal
    System.Memory

    Questions:

    I see multiple values being sent per line, and sometimes they repeat.
    For example:

    System.CpuTotal 9/18/2014 12:12:49 PM 59.67742 9/18/2014 12:12:50 PM 62.50016 9/18/2014 12:12:51 PM 60.71428 9/18/2014 12:12:52 PM 58.98438 9/18/2014 12:12:53 PM 57.69231 9/18/2014 12:12:53 PM 56.45161 9/18/2014 12:12:54 PM 50 9/18/2014 12:12:55 PM 62.10937 9/18/2014 12:12:56 PM 57.42188 9/18/2014 12:12:57 PM 49.60937 9/18/2014 12:12:58 PM 61.90476 9/18/2014 12:12:59 PM 53.84615

    #1
    Is it per CPU, or an average? I also see the same values repeated in the next line.
    What I am trying to do is decide which one to take, so that I push only one value per metric to Graphite. So my question is: how are the metrics being sent?

    #2
    I was not able to find the number of rooms and the number of players per server. Is it Peers or Connections, or is it not in the list? Many metrics are self-explanatory, but I would like to know the difference between the less obvious ones, like Peers and Connections.

    #3
    Since I see a lot of repeated data: if I want, say, one data point per second for one metric, is it a good idea to take only the first value in the line and discard the rest, or would I lose data that way?
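
For reference, the example line above carries twelve values with different timestamps (and two of them fall into the same second, 12:12:53 PM), so keeping only the first value of each line would drop the others. One possible alternative, sketched below, is to collapse the samples to one value per whole second. Letting the last value in each second win is an assumption made for this example, not something Photon prescribes, and the function name is made up.

```python
from datetime import datetime


def one_sample_per_second(samples):
    """Collapse (timestamp, value) pairs to one value per whole second.

    When several samples share a second, the one that appears last wins.
    """
    by_second = {}
    for timestamp, value in samples:
        by_second[timestamp.replace(microsecond=0)] = value
    return sorted(by_second.items())
```

Feeding the result of a repeated line through the same function again yields the same data points, so samples that show up twice do no harm.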


    Thanks,
    Shashi
  • I am now able to insert all stats into Graphite with my custom parser. Finally, we can build a dashboard that shows, for example, sessions from all servers or CPU usage from all servers :) and compare, say, CPU usage against sessions. Lots and lots of possibilities.

    For example, this shows the session count from 3 selected servers.

    Screen_Shot_2014_09_19_at_09_51_58.png

    High Quality Image: http://s2.postimg.org/vyjstipyh/Screen_ ... _51_58.png


    I will write a detailed copy-paste how-to on how to implement what I have done, so that others facing issues with statistics and monitoring will benefit too.

    I do have some questions, as a few more bugs remain. This one, for instance: I see SocketServer stats from some servers but not from others. Is there any special setting that might cause a game server to send only MVGameServer stats and custom stats, but NOT the SocketServer stats?

    Cheers,
    Shashi
  • Hey,

    sorry for the late response here. Awesome project!

    For the SocketServer stats: have you verified that the counters show up in the Windows Performance Counters on the affected machines? They should be in the "Photon: Socket Server" category. If there is no data written to the Windows Performance Counters, there is nothing to publish.

    Just make sure that you have installed the Photon Performance Counters before you start Photon (either from Photon Control -> Perfmon Counters -> Install Counters, or via the command line: photonsocketserver.exe /installCounters).
    Also make sure that you install the counters from the correct bin_xxx folder. Is it possible that you installed x86 counters on an x64 machine?

    If the values show up in the Windows Performance Counters and not in your custom service, I would look at:
    - firewall settings
    - correct counter publisher config (correct address set for your service?)
    - any errors in your log files?
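
To tell a network or config problem apart from a parsing problem, it can help to count the raw packets per sending server before any decoding. The sketch below assumes the publisher sends UDP datagrams to port 40001 (the port from the first post) and that each message starts with the magic number 0xffee named in the comment of the C# sample, written little-endian. The function names are made up for this example.

```python
import socket
import struct

MAGIC = 0xFFEE  # value named in the comment of the C# sample in this thread


def record_packet(seen, sender_ip, data):
    """Count packets per sender; return True if the packet starts with MAGIC."""
    looks_valid = len(data) >= 2 and struct.unpack_from("<H", data)[0] == MAGIC
    stats = seen.setdefault(sender_ip, {"valid": 0, "other": 0})
    stats["valid" if looks_valid else "other"] += 1
    return looks_valid


def listen(port=40001):
    """Print a running per-server packet count; stop with Ctrl+C."""
    seen = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            record_packet(seen, sender_ip, data)
            print(sender_ip, seen[sender_ip])
```

A server that never appears in the output points to the firewall or the publisher address; a server that appears but lacks certain counters points back to the Performance Counters installation on that machine.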

    It is hard to tell what's going on without further details, but I hope this gives you a hint about where to look. Let me know if we can assist with anything else, and I would love to see more of your implementation :)