Briggs (AU) Server Issues Resolved

Discussion in 'Official News and Announcements' started by Veratu, Nov 23, 2012.

  Veratu (SOE)

    First, I want to thank you on behalf of the entire tech, dev, and CS team at SOE for being patient with us tonight, and over the last two days while we worked through the Australian server problems.

    I want to lay out what transpired in one place so it's clear what occurred here. Before I do, let me restate that every region PlanetSide 2 runs in is configured, designed, and built the same way from a hardware and infrastructure perspective. No region has a "better" setup than another; the only thing that differentiates them is their physical location in the world.

    That being said, this deployment into Australia was definitely a unique experience. You guys have some very unusual laws and limitations that we simply don't see anywhere else in the world. (ISPs cap your bandwidth and download amounts? That's crazy! I will personally be investigating ways to improve that for our games, if it's possible.)

    Now for the summarized details.

    Day 1:
    On launch day we had a two-fold problem: one issue specific to Australia and one that was worldwide. The first was a server/client bug where characters got stuck if they made it into the world and then logged out or crashed out, preventing them from logging back in. You would press Play and nothing would happen. This one took a while to figure out, and we finally had a code hotfix on Day 2. Thank you to the people who let us use your "stuck" characters to solve this one.

    The tricky part about this client/server bug was that the second problem, which only affected Australia, had the same symptom as the bug the hotfix addressed, so diagnosing it was even more difficult. Click Play and nothing happens, or click Play and it takes forever and then the lag is horrible or the game is unplayable. At first we thought it was part of the first problem, but then we realized we had a configuration issue on our side, so we adjusted that. Since the first problem was still occurring because the hotfix hadn't been deployed yet, it was difficult to tell whether our configuration change really fixed problem 2; it looked improved, but it was just too hard to be sure. So we waited for the hotfix and retested on Day 2.

    Day 2:
    After the code hotfix was deployed and our configuration change was in place, we saw excellent results while the population was low. That led us into a new set of problems once the population started to rise around your peak times. Our provider was sustaining heavy packet loss, and it impacted logins, character creation, and general gameplay. People were warping, lagging, and disconnecting en masse. Users were taking 30 seconds or more to get into the game, and in many cases it took 2-3 minutes. We got the provider involved, and after several hours they finally made some changes on their end and our problems went away (or so we thought). The game looked good, populations were high, and we thought we were in the clear. My team called it a night, and within a couple of hours something on the provider side changed again and we had a new set of problems. By that time populations had dropped because the Australian folks were going to sleep, so even though something had changed, we didn't see it until Day 3, when we woke up and reviewed all of the log data.

    Day 3:
    We noticed something had changed on the provider side a few hours after they fixed the packet loss problem the night before, so that's when we decided to stage diagnostic tests and bring all of the SOE people together to get this issue solved once and for all. We were confident the code hotfix from Day 2 was good, as no other region was experiencing the issues anymore, so we knew whatever had changed was most likely the cause, but we didn't know for sure what it was. When populations were low we couldn't reproduce the problem, so we had to wait until prime time, when populations were high enough for the issue to show up and for the data to match our logs.

    Sure enough, your prime time started (while the SOE staff was recovering from food overload on Thanksgiving), and populations were up. We gathered data and, working with a very active Aussie community on our forums, had all the evidence we needed to show that something on the provider side had changed. After we supplied the data and worked with the provider, they found the problem and rectified it. Once they made the change, all of our network issues went away, and the game immediately started responding the way we expected, just like every other region. Login times were 1-5 seconds, and within minutes populations had climbed to record levels on Briggs. The SOE staff also loaded in and jumped all around the world, and things were running beautifully. We witnessed fantastic battles and the same game experience players in other regions were getting.
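    (For anyone curious what that community evidence looked like in practice: below is a minimal sketch, not an SOE tool, of a connection probe that reports loss and latency toward a server endpoint. The host and port are placeholders, since real server addresses aren't published in this post.)

        import socket
        import statistics
        import time

        # Placeholders -- substitute whatever endpoint you actually want to test.
        HOST = "login.example.net"   # hypothetical address, not an official SOE endpoint
        PORT = 443                   # hypothetical port
        SAMPLES = 50                 # number of probes
        TIMEOUT = 2.0                # seconds before an attempt counts as lost

        latencies = []               # successful connect times, in milliseconds
        failures = 0

        for _ in range(SAMPLES):
            start = time.monotonic()
            try:
                # Time a full TCP handshake as a rough proxy for path quality.
                with socket.create_connection((HOST, PORT), timeout=TIMEOUT):
                    latencies.append((time.monotonic() - start) * 1000.0)
            except OSError:
                failures += 1
            time.sleep(0.5)          # pace the probes

        loss = 100.0 * failures / SAMPLES
        if latencies:
            print(f"loss {loss:.0f}%  median {statistics.median(latencies):.0f} ms  "
                  f"max {max(latencies):.0f} ms")
        else:
            print(f"loss {loss:.0f}% (no successful connects)")

    High, steady loss or wildly varying latency over a run like this points at the network path rather than the game itself, which is roughly the kind of signal that pointed at the provider in this case.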

    This was an exhausting three-day diagnosis, and it came down to a mix of a code bug, a misconfiguration, and two separate provider issues.

    I'm happy to report that, at this time, Briggs is officially performing at SOE standards, and that any Australia-related issues on that world are isolated individual cases. If you are still having problems on Briggs, please submit a ticket for assistance, and CS will work to help get you into the game. Assuming we don't have any ninja changes while we are all sleeping tonight, I'm looking forward to seeing what populations your prime time produces tomorrow.

    Thank you again to the community for all of the data you provided; we couldn't have done it without you. You all helped make the game better for everyone worldwide.

    Now get in there and have some epic battles!