Briggs after action report

Discussion in 'Official News and Announcements' started by Twist, May 2, 2014.

  1. Twist

    Hi folks, Rich "Twist" Lawrence here, CTO of SOE (and yes the same guy from Planetside 1).

    The normal managed message preamble here would be to say you folks "have been dealing with poor login times on Briggs", but let's face it, the truth is it's been pretty much crap for a while. This has improved considerably in the last few days, and I thought given the circumstances you deserve an explanation of what happened and why we believe that going forward it will not be, well, further crap.

    This was not an equipment problem, or simple disregard for Briggs. We use the same high end hardware worldwide - as a rule, we buy simply the best hardware that is available at the time we set a server up. And as for location, what was painful about this incident for us is that we went through specific effort to support Aussies from within Australia directly starting with Planetside, so it was sad to hear people thinking we didn't care. Understandable given the circumstances, though.

    The core issue here is that certain aspects of the Planetside server architecture are centralized, to provide for features like player statistics to compare world wide. Some of this data is associated with your characters, and some of it is associated with your account, some only comes into context for a particular world login, etc. All of the decisions on what to fetch, where from, and what to save are made at the point of login. A considerable sum of data is sent back and forth at that point, in several different directions across dozens or more transactions, and it's important for all of it to be verified before the server lets you into the world. Of course gameplay itself is always resolved locally to a particular server and sent directly to attached clients, but in much smaller chunks and with much less data in a transaction.
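    To put rough numbers on why latency hurts at login: if each transaction has to wait for the previous one to come back, the waiting time is simply the number of round trips multiplied by the round-trip time. The sketch below assumes strictly sequential transactions, and the counts and latencies are invented for illustration; they are not SOE's measurements.

```python
def login_wait_seconds(round_trips: int, rtt_ms: float) -> float:
    """Network wait time if every transaction blocks on the one before it.

    Assumes strictly sequential round trips and ignores server processing
    time and transfer time for the payload itself.
    """
    return round_trips * rtt_ms / 1000.0


# Hypothetical figures: "dozens" of transactions taken as 36,
# with an assumed 20 ms nearby link and an assumed 200 ms trans-Pacific link.
nearby_wait = login_wait_seconds(round_trips=36, rtt_ms=20)
trans_pacific_wait = login_wait_seconds(round_trips=36, rtt_ms=200)

print(f"nearby data center: {nearby_wait:.2f} s of pure waiting")
print(f"across the Pacific: {trans_pacific_wait:.2f} s of pure waiting")
```

    The same login sequence costs ten times as much waiting when the round trip is ten times longer, before any packet loss or retransmission is counted.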

    It turns out Australia is a bit down the road from the U.S., distance wise, and there's a fairly large puddle in between known as the Pacific. This makes getting a private link from here to there not feasible, so instead we buy bandwidth on either end from reputable providers, create a secure tunnel, and route along with a bunch of other traffic. Physics and the internet create a fair amount of latency across that connection. Nothing had changed about this configuration, but it started experiencing serious issues.

    The problem wasn't immediately evident and diagnosing it took a long time. We thought at various times we had pinned it down to a connection issue versus a configuration problem, but changes to either over the past few weeks obviously made no material difference. It was also inconsistent in results; sometimes it worked, others not so much. It appeared to be related to packet loss between servers, but all of the networking equipment involved was checked and re-checked many times and was reporting everything ok. Some minimal packet loss is always expected on long haul connections, and so we have a network layer that ensures that everything gets through by resending what is required if necessary. For the network engineers out there, this data is sent via UDP (with a reliable transport built in) instead of TCP, because the various TCP speed limits would hold us back from properly realizing the bandwidth potential of our pipe given the latency.
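    The post does not say which TCP limits are meant, but two well-known ones both get worse as round-trip time grows: a single connection cannot move more than one window of data per round trip, and under random loss its steady-state rate is roughly bounded by the approximation from Mathis et al. A minimal sketch, with every number below an assumption chosen for illustration:

```python
import math


def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound when at most one window can be in flight per round trip."""
    rtt_s = rtt_ms / 1000.0
    return window_bytes * 8 / rtt_s / 1e6


def loss_limited_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Mathis et al. approximation: rate ~ (MSS / RTT) * sqrt(3/2) / sqrt(p)."""
    rtt_s = rtt_ms / 1000.0
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate) / 1e6


def bandwidth_delay_product_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    return link_mbps * 1e6 / 8 * (rtt_ms / 1000.0)


# Assumed values, not taken from the post.
RTT_MS = 200           # round-trip time across the Pacific
WINDOW_BYTES = 65535   # classic TCP window without window scaling
MSS_BYTES = 1460       # typical Ethernet-sized segment
LOSS_RATE = 0.01       # 1% random packet loss
LINK_MBPS = 1000       # purchased bandwidth

print(f"window-limited: {window_limited_mbps(WINDOW_BYTES, RTT_MS):.2f} Mbit/s")
print(f"loss-limited:   {loss_limited_mbps(MSS_BYTES, RTT_MS, LOSS_RATE):.2f} Mbit/s")
print(f"needed in flight to fill the link: "
      f"{bandwidth_delay_product_bytes(LINK_MBPS, RTT_MS):,.0f} bytes")
```

    With numbers like these, a single TCP stream uses only a small slice of a fast long-haul link, which is the usual motivation for a custom reliable layer on top of UDP.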

    One of our server engineers wrote a new custom application that mimicked the server with fake data, so that we could test without being disruptive to the game itself and with different topologies. With a good test case to iterate on, we were able to pretty quickly afterwards come to the conclusion that the interaction of the reliable network layer, the tunnel, and the latent network pipe we had available was causing both packet loss and a severe loss of available bandwidth, the net effect of which was giving us a tiny fraction of real throughput compared to what we thought was available. The entire chain had worked before, and nothing was completely broken, but the changes over time in network load, packet loss, routing and latency over the pipe had set off a perfect storm of sorts that took out our ability to get high performance from the link.

    To fix this we changed the tunnel configuration, wrote a new caching layer to prevent some use of the pipe, and directly connected the two most important host machines involved via their own tunnel. We tested each aspect and then put the combination up when we already had a system wide maintenance for All Access. We actually aren't done yet, and will be doing some upgrades on network equipment centrally here in the States that seem suspect from the original configuration, despite the fact they report no unexpected errors.
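    The post gives no detail on how the caching layer works, so the following is only a generic sketch of the idea: keep recently fetched data on the local side and go across the long-haul link only on a miss or after an entry expires. The class name, the time-to-live policy, and the fake remote fetch are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LocalCache:
    """Hypothetical read-through cache in front of a slow remote fetch."""

    def __init__(self, fetch_remote: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_remote = fetch_remote
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                  # fresh hit: no long-haul round trip
        value = self._fetch_remote(key)      # miss or expired: pay the latency
        self._entries[key] = (now, value)
        return value


# Stand-in for a request to the central servers; counts how often it is used.
remote_calls = []

def fake_remote_fetch(key: str) -> str:
    remote_calls.append(key)
    return f"data-for-{key}"


cache = LocalCache(fake_remote_fetch, ttl_seconds=300)
first = cache.get("account:123")
second = cache.get("account:123")   # served locally, no second remote call
print(len(remote_calls))
```

    A real layer would also need a rule for data that changes centrally, which this sketch leaves out.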

    I apologize for the lengthy time it took to resolve this issue. I could list many reasons it took as long as it did, but I completely get and agree with the perspective that it just doesn't matter when you aren't getting the expected level of service. To address what we consider an unacceptable amount of trouble Briggs players went through, we have as mentioned elsewhere extended All Access status to those most affected.

    tl;dr - Internet slow, reset cable modem and lights all green now
    • Up x 58
  2. Gammit

    Full disclosure + honesty + message from higher-ups = trust and loyalty. Good job in this communication.
    • Up x 40
  3. Nregroepis

    Make sure you learn a bit more of that vocabulary! :D
  4. Statigus

    Well said! I can confirm the horrible Briggs in-game lag ended yesterday afternoon too. Was it related?
  5. Phant0mSyst3m


    My first post ever around here, but I had to reply to this one.

    As a crisis manager at the top network equipment manufacturer (you know which one), a former network consulting engineer, and with a history of dealing with complex situations at top ISPs, I know exactly the kind of complexity and planning involved in situations like these.

    I know we have all kinds of ages represented on this forum, but I really hope you receive some mature feedback.

    Hope you get this fully sorted. The tl;dr message really got me.

    Cordial Best.
    • Up x 13
  6. sean8102

    I'm in the US myself, so obviously I don't play on Briggs, but it's always nice and refreshing to see a company be blunt, honest, and up front. Well done SOE.

    Also the TL;DR is great!
    • Up x 9
  7. Willpower157

    My thoughts exactly.
    • Up x 3
  8. TurboSquishy

    I used to troubleshoot intercontinental comm lines for government networks and can feel your pain. Midnight phone calls to far off time zones in the quest for blinking green lights are always a blast.
    SOE's professionalism, dedication, and forthright ownership of issues continue to be the gold standard in online gaming. Thanks for all your hard work.

    Very Respectfully,
    TurboSquishy
    • Up x 1
  9. 10thRMDredd

    Login is almost instantaneous for me now.
    Outstanding work.
    I also am NOT crashing every 5-10 minutes at ALL anymore.
    Brilliant.
    The communication from Twist was genuine and intelligent, whilst being in touch with the common man.
    We...who are about to die...(on Auraxis)....SALUTE you.
    SOE is indeed the gold standard as TurboSquishy pointed out. Well put sir.

    Well, nothin' more to say SOE except,

    thank you again for THE best game in the world being good to go.
    It is a beautiful, beautiful thing you have made.
    :)
    • Up x 2
  10. Camycamera

    no need to apologise. this stuff happens. great job in getting it done.
  11. Levtech

    Yay for Briggs and SOE saying what's really going on. Even though I have never gone on Briggs, it must've been 10x more annoying than when I complain I have 1000 ping, so I'm glad it's fixed. Also, I like how this goes from latency for dummies to ISP stuff you cannot understand unless you work in that field.
    • Up x 1
  12. CartoonScience

    Thanks for this message folks, really good to hear the details.
    Just for this, when I log in, I'm going to buy something.
  13. VedaBug

    SOE, you don't need to apologise. After going through your support services a few times, I can easily say your team is one of the most community-focused at any large gaming company around. You would never get this level of support from EA (Burnout Paradise Ultimate Pack bundled items permanently locked, grumble grumble).

    PS - logged in instantly, thanks!
    • Up x 2
  14. PandaCammo

    Now here is my question to you. Is there a way to balance the server population so it's not 40% NC, 30-40% TR, and 20% VS? And yes, I would say the numbers listed are pretty accurate.
    • Up x 1
  15. Bindlestiff

    Great to hear the Briggs folk can get back to fighting the good fight :)
  16. Necromantia

    Wish you guys had this openness when it came to the 64 bit client fiasco; I probably would have still been playing. Good luck.
  17. RedShirttBeta

    Good job SOE. Sounds like a mofo of a problem to diagnose, let alone fix. Maybe you should CONCENTRATE YOUR EFFORTS on reducing the distance between the USA and AUS. Maybe a big cable and winch to drag us together. LOL. Us AUSSIES are used to crap internet and the Pacific. Hardly your fault. Thanks for the honest after action report. I too will purchase something in game 'cause I know you get nothing till we buy something.
  18. kungflu

    Honestly, after renting a lot of game servers, I've found Aussie servers have always been the worst. SOE can only take part of the blame, for wasting resources and renting a server at the worst location. No pun intended, it's just fact.
  19. Rebus

    This is great to read and the 3 month membership is received very graciously. However, not wanting to be a hypocrite, as I posted complaints about the situation: if this kind of communication had been ongoing and frequent throughout "Loginside", I probably would not have posted, and I imagine plenty of others would not have posted complaints either.
    • Up x 2
  20. XinniX

    [IMG]

    Now I'm not usually a huge meme poster, but I'll make an exception here. I enjoy this game; I've probably spent more on it than on any other. I had faith it would be solved at some point. Thank you, SOE, for the openness and for just generally being helpful.

    Thanks
    Nix