AirVM Slow Connect - EC2 Alternative

[Deleted User]
edited February 2011 in Photon Server
I've been using Amazon EC2 but wanted to look for alternatives. One that looks promising is:

AirVM
http://www.airvm.com/configure.php

They even have a free trial, so it is very easy to evaluate. The customization options are nice, it is simpler than EC2, and the price is potentially better.

With all the settings at their lowest on a Windows 2008 server, I think it could handle 100 players, maybe more with some optimization of my code. So that is less than half the price of an EC2 small instance (EC2 micro instances are not suitable for Photon: there is not enough RAM and the CPU is unreliable).

Now for the problem...

LitePeer.Connect with an AirVM server blocks the game for a good 4-5 seconds; everything appears to be frozen. Then it finally connects and works wonderfully after that.

I would guess it is a DNS issue, but they give an actual public IPv4 address. I opened port 5055 for UDP on the Windows Firewall. I know it took effect because I couldn't ping the IP before, but after turning off the firewall I could ping the server (<60ms stable ping).
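One quick way to test the DNS guess is to confirm that the address needs no lookup at all. The minimal Python sketch below (an illustration only; it has nothing to do with Photon's client library) times `socket.getaddrinfo` with the `AI_NUMERICHOST` flag, which refuses to perform a name lookup and fails if the host is not a numeric IP literal. If this returns almost instantly while `LitePeer.Connect` still stalls, DNS is an unlikely culprit. The address used is a placeholder.

```python
import socket
import time


def time_numeric_resolve(host: str, port: int) -> float:
    """Resolve host as a numeric address only (no DNS) and return seconds taken.

    Raises socket.gaierror if host is not a numeric IP literal, which shows
    that no DNS query is ever attempted for it.
    """
    start = time.perf_counter()
    socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM,
                       0, socket.AI_NUMERICHOST)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder address; substitute the server's public IPv4 address.
    elapsed = time_numeric_resolve("127.0.0.1", 5055)
    print(f"numeric resolve took {elapsed * 1000:.3f} ms")
```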

I have tried from several different networks and had friends try from locations around the world, all with the same result.

The one interesting thing: the connection lag did not happen when I tested it from my MacBook Pro (Leopard). I can't think of why it would be any different.

I currently have the AirVM server running as "US West":
http://www.nplay.com/BeGone

I contacted AirVM support last night, and they somehow responded within minutes, asking about the protocol, application and firewall settings. So they are quite responsive with customer service, but I am not really sure how to debug a problem like this.

Any ideas?

Comments

  • AirVM sounds definitely interesting. Cheap and responsive support could be a winning combination :)

    The 4-5 second block sounds bad. Currently, I don't have any idea what could cause this. A firewall or misconfiguration usually blocks connections entirely.

    We will try to reproduce and solve this issue but it might take a few days.

    Which setup did you choose at AirVM?
  • Windows Server 2008
    32-bit (x86)
    CPU = 1
    MHz = 500
    RAM = 1
    Production Storage = 25
    Backup Storage = 0
    IP Addresses = 4 (1 Usable)
    Bandwidth = 250 GB/mo


    With 20 people connected, CPU was low and there was free RAM. Fire alarm at work... guess I gotta go out in the snow now!
  • Fire alarm? I hope it's nothing serious!
    Considering the alarms in our office, it's a 100% false- or test-alarm rate.
  • False alarm :P

    I'll probably try Windows Server 2003 & 64-bit variations later this evening, and re-test on the Mac; it seems odd that the problem didn't happen on the Mac.
  • 4-5 s sounds horribly long. I can't imagine even a bad interface taking that long; not even Amazon EC2 micro instances take that long during "overload times".

    You didn't configure the VM (assuming they allow it) to sleep when there is no connection, did you?
    If you did, that might be the time it takes to wake the VM and bring it back online upon a connection request.
  • Does it block all existing connections as well or just the new connection?
  • I don't think the VM is sleeping. I am connected to it via Remote Desktop watching Perfmon as I am trying to connect (so I know the CPU is staying low).

    After that 4-5 second lag time when connecting, it runs smoothly. Players that are connected to the server do not experience any lag while other people are trying to connect to the lobby.

    I didn't have much luck with the Windows 2003 versions; 2008 seems better anyway. The 64-bit Windows 2008 seems promising, but I can't quite get it working yet, so I'm going to try more testing tomorrow. Right now it is complaining about not being able to contact the license monitor, but I updated my server code to the 2.4.0 SDK tonight, so I may have done something wrong, although it works locally. Also, the firewall settings look different and I'm not even sure where port forwarding is anymore, so I have to hunt that down.

    I also upped the CPU to 1.0GHz and the remote desktop seems more responsive. There just needs to be more hours in the day :)
  • Still, 4-5 s is massive, especially if it's just startup.

    The 500 MHz on its own shouldn't cause such a delay if the server is idle with no players and the like.

    As for 64-bit: did you make sure to copy the config from the 32-bit folder to the 64-bit folder before you started it, so the IPs are right? Also, with the move, your OS firewall settings were likely reset.
  • Thanks for sharing your AirVM experience, Doug. As soon as I get the time I want to check out AirVM, too.
    I think I would go for 2 or more cores instead of just one, probably even 2x500 over 1x1000 if I had to choose; 2x1000 would be ideal.
    I don't really believe, though, that this has anything to do with the 4-5 sec delay, since the other connected players are not getting it at the same time. The weird thing is that after such a long time I'd expect Photon to drop the connection(s)... so maybe a longer wait time somewhere inside the managed code?
  • I wouldn't think so.
    But that reminds me of something else: perhaps Photon tries to ping the server or the like on the initial connection (or vice versa), and the actual login handling isn't executed until that's done or has failed.
    In that case the ~5 s would make sense, depending on firewall settings.
  • Thanks for fixing the connection bug.

    How you ask? Well... I haven't updated Photon since June 2010 (under the philosophy of if it ain't broke don't fix it :P).

    So I dropped in the latest dll, fixed a few references and bam, the connection lag is gone!

    Thanks for the multi-core tip Boris, I'm looking forward to narrowing down what my bottleneck is. I'll follow up with my findings in case anyone is curious.
  • Hehe. You're welcome!
    And we are curious to hear about your bottleneck analysis, too.
  • I've started to collect some stats using the dashboard, and they are pretty interesting.

    The AirVM configuration for this is:
    64-bit Windows 2008
    CPUs = 2
    MHz = 1000
    RAM = 2

    I'm really impressed with how stable the CPU usage is. I am not sure why there are peaks in the 'CPU Total Usage %'; perhaps it has something to do with having 2 cores.

    I'm very surprised that the 'Bytes Out/Sec' is so low. With 12 players per room, I expected outgoing traffic to be about 10 times the incoming traffic. Maybe there is some kind of Photon magic going on.
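The expectation can be made explicit: in a room of N players, each event a client sends in is broadcast to the other N - 1 players, so outgoing payload should be roughly (N - 1) times incoming for broadcast operations. A back-of-the-envelope Python sketch; it assumes non-broadcast operations produce no outgoing event traffic and ignores protocol headers, acknowledgements and resends.

```python
def expected_bytes_out(bytes_in_per_sec: float, room_size: int,
                       broadcast_fraction: float = 1.0) -> float:
    """Estimate outgoing bytes/sec from incoming bytes/sec.

    Assumes every broadcast operation is relayed to the other
    (room_size - 1) players and non-broadcast operations send nothing out.
    Headers, acks and resends are ignored.
    """
    if room_size < 1:
        raise ValueError("room_size must be at least 1")
    fan_out = room_size - 1
    return bytes_in_per_sec * broadcast_fraction * fan_out


# 12 players per room, every operation broadcast: out is 11x in.
print(expected_bytes_out(1000.0, 12))       # 11000.0
# If only half of the operations broadcast events, the ratio halves.
print(expected_bytes_out(1000.0, 12, 0.5))  # 5500.0
```

A measured ratio far below N - 1 therefore hints that many of the incoming bytes are not operations that fan out to the room.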

    The one thing that concerns me is that 'Commands resent/sec' seems a little high. I do use reliable packets pretty heavily. In fact I just checked and all of them are reliable, so I think I could find some to change to unreliable. The other thing is that I am only using a single channel for everything, which is probably a problem.

    Anyway, this dashboard is pretty sweet :D

  • After optimizing which events need to be reliable and separating events into a few different channels:



    Much better :). And I believe the difference can be felt in-game (fewer people popping around).
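The split described above can be expressed as a small lookup table that decides, per event type, whether it is sent reliably and on which channel. In this Python sketch the event names, channel numbers and the `SendOptions` type are all hypothetical (not Photon's API and not Doug's actual events). The reasoning behind it: frequent state updates that the next update supersedes can go unreliable, one-off game events stay reliable, and unrelated streams get separate channels so that, assuming ordering is enforced per channel, a resend on one channel does not hold up the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendOptions:
    reliable: bool
    channel: int


# Hypothetical event names and channel numbers, for illustration only.
SEND_POLICY = {
    "position_update": SendOptions(reliable=False, channel=0),  # superseded by the next update
    "animation_state": SendOptions(reliable=False, channel=0),
    "chat_message": SendOptions(reliable=True, channel=1),      # must arrive, ordered within chat
    "player_hit": SendOptions(reliable=True, channel=2),        # gameplay-critical
    "player_joined": SendOptions(reliable=True, channel=2),
}


def options_for(event_name: str) -> SendOptions:
    """Look up how an event should be sent; unknown events default to reliable on channel 2."""
    return SEND_POLICY.get(event_name, SendOptions(reliable=True, channel=2))
```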
  • The Dashboard service generates updated pictures every few seconds. This is a CPU-intensive operation, so if you run the dashboard on the same machine, it might explain the CPU spikes.
    That the outgoing bytes are not as high might indicate that your clients resend a lot or that they call many operations that don't broadcast events.
    Since you already see many resends from the server it could be that your clients have trouble receiving the acknowledgements as well which would cause resends from the client.
  • I just tested AirVM and I am thrilled about how well they perform. Big thumbs up!
    The performance is actually almost as good as my non-virtual 4-core at home.

    For just 37.1 cents per hour you get 4x the performance of an Amazon medium high-CPU instance, which runs at 31 cents per hour. You can forget the extra-large high-CPU instances: they only deliver twice the performance of a medium instance (4 times in theory, but CPU and network just freeze every few seconds) and cost 4 times as much.
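The comparison boils down to price per unit of performance. A quick Python check using the figures above, with the medium high-CPU instance as the 1x baseline; the extra-large price is taken as 4 x 31 cents per the "cost 4 times as much" remark, which is an approximation rather than a quoted rate.

```python
def cost_per_perf_unit(cents_per_hour: float, relative_performance: float) -> float:
    """Cents per hour for one 'medium high-CPU instance' worth of performance."""
    return cents_per_hour / relative_performance


airvm = cost_per_perf_unit(37.1, 4.0)            # 4x the medium's performance
ec2_medium = cost_per_perf_unit(31.0, 1.0)       # baseline
ec2_xlarge = cost_per_perf_unit(4 * 31.0, 2.0)   # ~4x the cost, ~2x the performance

print(f"AirVM:      {airvm:.2f} cents per unit-hour")
print(f"EC2 medium: {ec2_medium:.2f} cents per unit-hour")
print(f"EC2 xlarge: {ec2_xlarge:.2f} cents per unit-hour")
```

On these numbers AirVM costs under a third of the medium instance per unit of performance, and the extra-large costs twice as much per unit as the medium.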

    Having said that, I had to open a support ticket because I initially experienced very unstable connections. It turns out that they currently have a mitigation appliance in front of their network due to a massive DDoS they were targeted with last week. After they adjusted the policies for my server, everything worked as expected. They were really fast in answering the ticket and solving the issue, by the way.

    So if you do host photon with AirVM (which at this point I would recommend over Amazon EC2) and you notice that you can't serve more than 47 stable connections you might want to ask their support team about it.

    You are probably interested in hearing some numbers as well...
    The test I did was the following: all clients connect to Lite, 2 peers per game. Then every client calls raise event every 60 ms (unreliable), plus another reliable "flush" raise event every 5 seconds to measure the time an event needs to arrive at the other client without the configured server send delay. I hosted the clients on 10 Amazon medium high-CPU instances. At the peak of the test, every instance had a CPU load of more than 50%.

    So on AirVM (4-core, Win2008, 64-bit, 2 GHz) with 1400 peers, Photon was at 87% CPU and network throughput was 5.6 MB/s. 1600 peers worked too, but the event RTT increased by 40 ms.
    The same settings on an Amazon EC2 medium high-CPU instance allow 360 peers. I did not test again on extra-large, but I assume no more than maybe 750.
    On my home server (2.4 GHz, 4-core) it's 1800 peers, so 1400 on AirVM is really good.

    Again, this test was with 18 raise event operations per second per client, a pretty high frequency and therefore much more CPU-intensive than most games will need. 10 raise events per second allowed over 2700 peers on my home machine a while back, so I assume AirVM can do at least 2100. I hope I find more time for testing soon.
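The "at least 2100" figure is a proportional scaling: at the high event rate AirVM handled 1400 peers where the home server handled 1800, and the same ratio is applied to the 2700 peers the home server handled at 10 events per second. A small Python sketch of that estimate; like the post, it assumes the capacity ratio between the two machines stays the same at both load levels.

```python
def estimate_peers(ref_peers_low_load: float, ref_peers_high_load: float,
                   target_peers_high_load: float) -> float:
    """Estimate the target machine's capacity at the lower per-client load.

    Assumes the target/reference capacity ratio measured at the high load
    also holds at the low load. Multiplying before dividing keeps the
    result exact for these integer inputs.
    """
    return ref_peers_low_load * target_peers_high_load / ref_peers_high_load


# Home server: 2700 peers at 10 events/s and 1800 peers at 18 events/s.
# AirVM: 1400 peers at 18 events/s.
print(estimate_peers(2700, 1800, 1400))  # 2100.0
```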
  • Thanks for the stats there :)

    And yeah, Amazon EC2 has some serious problems with their craptastic networking, from outside but also cloud internal.
    I had my fight with it in 2010 for a customer that used Photon on his projects. The problems were due to unreliable pings (jumping from 100 to 400 ms in-game while direct ping requests showed 40 ms and the like), and it's a real hell to compensate for such stuff if you can't do any curve or physics extrapolation.
    Aside from that, they cost just too much: you pay for instances that don't deliver what you pay for, and for traffic that goes out at far too high and unstable pings for the price and promises.

    I ended up going the opposite way, introducing artificial delays so it was "a bit behind" all the time, so that latency jumps would end in "nothing happening" instead of "the wrong thing happening".
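Deliberately running "a bit behind" is commonly implemented as an interpolation (playout) delay: the client renders remote objects at the current time minus a fixed delay, interpolating between the two buffered snapshots around that moment, so a late packet usually means the object just holds still instead of jumping somewhere wrong. A minimal Python sketch under assumed parameters (1-D positions, timestamps in milliseconds, a 100 ms delay); it illustrates the general technique and is not the code used in the post.

```python
from bisect import bisect_right


class InterpolationBuffer:
    """Buffers (timestamp_ms, position) snapshots and samples them with a fixed delay."""

    def __init__(self, delay_ms: float = 100.0):
        self.delay_ms = delay_ms
        self.times = []      # snapshot timestamps, ascending
        self.positions = []  # positions matching self.times

    def add(self, timestamp_ms: float, position: float) -> None:
        # Drop out-of-order or duplicate snapshots to keep timestamps ascending.
        if self.times and timestamp_ms <= self.times[-1]:
            return
        self.times.append(timestamp_ms)
        self.positions.append(position)

    def sample(self, now_ms: float):
        """Return the position to render at now_ms, or None if nothing is buffered."""
        if not self.times:
            return None
        render_time = now_ms - self.delay_ms
        i = bisect_right(self.times, render_time)
        if i == 0:
            return self.positions[0]   # before the first snapshot: hold it
        if i == len(self.times):
            return self.positions[-1]  # data is late: hold the last known state
        t0, t1 = self.times[i - 1], self.times[i]
        p0, p1 = self.positions[i - 1], self.positions[i]
        alpha = (render_time - t0) / (t1 - t0)
        return p0 + (p1 - p0) * alpha


buf = InterpolationBuffer(delay_ms=100.0)
for t, x in [(0, 0.0), (50, 5.0), (100, 10.0)]:
    buf.add(t, x)
print(buf.sample(175))  # 7.5, halfway between the snapshots at 50 ms and 100 ms
print(buf.sample(300))  # 10.0, no newer data yet, so the last state is held
```

The larger the delay, the bigger the latency spike the buffer can absorb, at the cost of everything on screen being that much older.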

    So it's good to hear that AirVM doesn't seem to be like this (or not yet, at least).

    For something like Photon, which does not directly scale across a cloud but scales with the CPUs, I would actually visit http://g-portal.de/ and their "mega bang per buck" machines. Sure, they don't bill you per used second, but on the other hand, when it comes to power, they offer more than Amazon and AirVM offer in any form, at a much lower price when you really use it (Phenom X6 with 8 GB RAM at 100 EUR a month with gameflat traffic).
    I've used it with a customer in the past, and their pings, hardware power, price and latency are hard to beat without co-hosting something yourself.
  • Wow, thanks for the link, dreamora. A good choice if you are OK with hosting in Europe, don't need a new instance in a few minutes and plan to keep it for a few months. On the downside, I don't see that they offer Win 2008, just 2003.

    I do see one problem with AirVM now: if my calculations are correct, a server that averages 6 MB/s will use 16,000 GB of bandwidth per month. That gives us just about a week for the $490...
    Usually you will have load peaks during certain times; let's assume a 3 MB/s average, which is still just 2 weeks. I guess you will end up using a smaller instance (e.g. 2x2000 MHz) to stay within bandwidth limits.
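The bandwidth arithmetic, reading the rate as megabytes per second (which is what makes the 16,000 GB figure work out), with a 30-day month and decimal units (1 GB = 1000 MB). The 4 TB allowance is taken from the support reply in the next comment and is assumed to be what the plan includes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_traffic_gb(avg_mb_per_sec: float, days: int = 30) -> float:
    """Total traffic in GB (decimal units) at a constant average rate in MB/s."""
    return avg_mb_per_sec * SECONDS_PER_DAY * days / 1000.0


def days_until_quota(avg_mb_per_sec: float, quota_gb: float) -> float:
    """Days until a bandwidth quota is used up at a constant average rate in MB/s."""
    gb_per_day = avg_mb_per_sec * SECONDS_PER_DAY / 1000.0
    return quota_gb / gb_per_day


print(monthly_traffic_gb(6))      # 15552.0, i.e. roughly 16 TB
print(days_until_quota(6, 4000))  # ~7.7 days
print(days_until_quota(3, 4000))  # ~15.4 days
```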
  • Well, good news on that too. I chatted with their support team.
    If you happen to use more than 4TB you can upgrade for $15 / 250GB.
    They contact you when you have reached 90% of the allocated bandwidth.
    Overages are billed at $0.25/GB.

    By the way: at Amazon, 16 TB costs 10 to 15 cents per GB.
    So 16 TB at AirVM costs $960; at Amazon, almost $2000.
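The cost figures follow from per-GB rates: $15 per 250 GB is $0.06/GB, versus Amazon's 10 to 15 cents. A Python sketch that, like the post, prices all 16 TB (16,000 GB) in $15 upgrade blocks and ignores whatever the base plan already includes; Amazon is shown at both ends of the quoted range because its exact tiering isn't given here.

```python
def block_cost(total_gb: float, block_gb: float = 250.0,
               block_price: float = 15.0) -> float:
    """Dollar cost when traffic is bought in fixed-size blocks, rounded up to whole blocks."""
    blocks = -(-total_gb // block_gb)  # ceiling division
    return blocks * block_price


def per_gb_cost(total_gb: float, dollars_per_gb: float) -> float:
    """Dollar cost at a flat per-GB rate."""
    return total_gb * dollars_per_gb


TOTAL_GB = 16_000
print(block_cost(TOTAL_GB))         # 960.0
print(per_gb_cost(TOTAL_GB, 0.10))  # low end of the quoted Amazon range
print(per_gb_cost(TOTAL_GB, 0.15))  # high end of the quoted Amazon range
print(per_gb_cost(TOTAL_GB, 0.25))  # AirVM overage rate, if you don't upgrade
```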
  • Yeah, bandwidth is extremely expensive at Amazon. But it's obvious why they need it: if they didn't force you to optimize it and use as little as possible, their situation would be even worse than it already is...

    So thanks for the update on the matter, definitely good to know I guess :)
  • Great information Boris. I especially thank you for that info on the DDoS mitigation appliance. They increased the thresholds for my servers and users have reported an easier time connecting.

    @Dreamora: g-portal.de sounds interesting. Thanks for the tip.