Massive tell, channel, and zoning lag on AB server

Discussion in 'Player Support' started by Adroxia, Feb 19, 2017.

  1. Siny Augur

    We have a very different raid week than you. We have been in farm mode for a very long time now and the raid week accordingly reduced to 2 days a week (Sundays/ Mondays). Unlike you, Pikeys never raid Fridays/Saturdays even when we are doing prog.
  2. Asmadeus Journeyman

    Yeah, but last Friday, although AD was the only guild raiding that evening, performance was as bad as ever, and the troubles happen even when the whole raid is just sitting there afk waiting for buffs to cast themselves, so you can't really say we're pressuring the servers. 50 idling folks more or less really shouldn't change anything about the server's load, unlike when folks are actually fighting/spamming abilities (and Bertox knows how bad EQ is at handling 25-30 DPS using AEs on as few as 10-15 mobs...)

    On the other hand I couldn't bring myself to log in on our off nights, but I'm told by folks who did that things apparently were running smoothly.. so is AB really in such bad shape that 40 raiders logging in after work put the server on its virtual knees?
    I don't get it, but I don't like it. All raids since last patch have been terribad so far.
  3. Vaako_SK Lorekeeper

    Raiding or not, zoning takes forever/ random crashes/ "unknown error trying to join the server" / chat lag in every kind of channel/ group invites lags/ accept rezzes lags/ if you click a door, gate / campfire to another zone you can actually cast invis before it zones because of lag.. I would think the amount of money we throw at you would be enough to hire a programmer who has some actual skill???
  4. Zhaunil_AB Augur

    /uptime is not actually true?
    So you're claiming they fake that?

    Think about this:
    The cluster that is AB runs the zones/instances.
    Every time you zone or chat another server (let's call it login-and-chat-server for lack of actual knowledge of their architecture) gets sent a request from AB.
    login-and-chat-server is in the US, AB in the Netherlands.
    Due to whatever they did at the end of last year (and which was made even worse last patch), calls to login-and-chat-server and/or back to AB are lost, and as a result you are disconnected.
    Result:
    AB is still fine, but you get disconnected as if a zone had crashed, from an end-user point of view.
    If you are not alone, then in the extreme the whole server population is disconnected.
    While the zone, run by AB, is idling now...
    And because people are far too simple-minded they shout "servercrash", simply because that's how it appears to them. Even though it isn't.
    Cannot be true?
    Really?

    Right.
    AD wasn't "spamming the server".
    And yes, 50 idle players shouldn't (and don't) change anything (noticeable) about the server's load.

    As anyone, i can only go by observations and what little they tell us.
    "Location matters" they said.
    The more i look at it, the more convinced i am that the problems we are experiencing (massive chat lag, slow invites/zoning requests and disconnects - all happening in bursts) aren't *ON* AB itself, but on some off-server system they have "integrated" over time, and that they overdid it.

    Granted, this is a wild guess:
    but i think that due to the pick-zones and such, they moved "load-balancing and instance-handling" off server onto some central system, freeing up resources for the additional zones.
    Perhaps even the determining of /say range and such, or the server-side filtering of channels.
    (would explain why people CAN be joined to chat-channels of other servers, for example - if they were all handled on AB itself, i would not be able to be moved from raidchannel to some weird bertox channel).
    And if so, then THAT system isn't just affected by you 40 raiders at the time, but by ALL chat that is going on.
    It would explain so much of what we're seeing...


    "running smoothly" is so relative...
    People get used to "a little" chat-lag really fast and consider it "smoothly" over time, when comparing it to much worse situations.
    There might have been no or very few disconnects nor any of the other more "serious" symptoms at the time.
    But no, AB hasn't run "smoothly" ever since the patch.
    You could notice, more than ever, how chat comes through in bursts.
    Chat was "ok" (delay noticeable but bearable) before noon, and got worse as time went by - as the US players woke up too...
    This led me to believe that there is some issue with some queuing system they use that's the "real source" of the problem.
    And that they overdid it with the centralizing (the design they spoke about that makes it so that now "location matters" in our digital high-speed age).
  5. tanith Augur

    /snip

    No idea. Am not a techy. I know I had 3 toons in in different zones. After the 'crash' or whatever it was I had none.
    Everyone else in guild (assorted zones), GGH, Lobby and bazaar was also booted.

    but server was probably still up (?) as vendors had stuff that had been sold to them, not just their own wares.

    So don't know what the correct term is for the mass disconnect, but whatever way you slice and dice it, it means we haven't been able to play

    Instead of 3 raid nights our guild managed about a quarter of one this week. It's not looking any more positive for Sunday :(
  6. Zhaunil_AB Augur

    You won't hear any disagreement from me on "we haven't been able to play".
    In fact i have been thrown off server for the third time today, once costing me a hunter-update...
    (server's been up since TUE)

    But that wasn't what you said in the post i quoted above, so i felt obliged to clarify or at least give a possible explanation to the "can be".
    The situation is bad enough as it is.
    So let's not exaggerate it - there is no exaggeration needed, nor any "conspiracy theories" to the end of "they lie to us".
    It's by far more helpful to "the cause" i think if we stick as close to the "truth" as we possibly can.
  7. Asmadeus Journeyman

    I honestly don't buy the "location" theory. Yes, there's communication with other servers, and yes, there is some problem with the behind-the-scenes connections, but no matter how bad the backend connection we're talking about is, it doesn't make sense for everything to come out in bursts the way it does.
    And when things are "smooth", yes, you can notice that chat channel messages do go through the US and take half a second maybe, but really it's not that bad for a game like EQ - I could live with that kind of lag forever without any need to change anything.

    The way that *everything* is held off in bursts like that indicates that there is only one thread handling everything going to the US, and that it sends a request and waits for the reply before handling the next request.
    For all I know, it could be that a login request takes 5-10 seconds for some stupid reason (login is always tricky and often needs multiple round-trips, hence the convenient idea that location matters), so that every time someone logs in the whole server freezes for 5-10s, and at peak hour when folks log in in bursts everyone has to wait for that. I'd bet there is only one slow request that brings everything else down, and they'd just need to find which.
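
    To make the "one blocking thread" guess concrete, here is a toy simulation in Python. It is only a sketch of the general queuing effect, not anything known about the real backend: the request names, the arrival times and the 5-second login are made-up numbers, and times are in milliseconds.

```python
def serial_completion_times(requests):
    """Simulate ONE worker that handles requests strictly in order.

    requests: list of (name, arrival_ms, service_ms), sorted by arrival.
    Returns a list of (name, finish_ms).
    """
    clock = 0
    finished = []
    for name, arrival, service in requests:
        # The worker cannot start before the request arrives,
        # nor before it has finished the previous request.
        start = max(clock, arrival)
        clock = start + service
        finished.append((name, clock))
    return finished


# Hypothetical traffic: one slow login followed by three quick tells.
requests = [
    ("login", 0, 5000),
    ("tell-1", 500, 100),
    ("tell-2", 1000, 100),
    ("tell-3", 2000, 100),
]
result = serial_completion_times(requests)
for name, finish in result:
    print(f"{name} done at {finish} ms")
```

    The three tells were sent 0.5 to 1.5 seconds apart, but they all come out within 300 ms of each other right after the slow login finishes, which is the "everything arrives in bursts" pattern.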

    There are half a dozen ways to fix this kind of problems. Most aren't easy, but it's not something that'd take six months either.
    For example, figuring out which request is slow isn't easy if the code has no logging/instrumentation, but at least confirming that's the problem takes about 5 minutes without any knowledge of the code (there's something awesome called strace!), and once it's confirmed, adding logs can take anywhere from a few minutes to a few hours, then kick everyone out one more time to restart it (we won't notice the difference), and once you have logs it also shouldn't take more than a few hours to figure out what's so slow. Once that's figured out, either fix the reason it's slow or just hand off that request to another server so the "lag" will be isolated to a single function we can probably live with being slow.
    Another way to fix that without spending time debugging would be to just have more threads handle requests, so that as long as there is one that isn't stuck the server would not appear to be stuck. You don't need to change the protocol to match replies with requests if you have a different connection per such thread. That's more work, but it's still easy - let's say a couple of days of work including testing.
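
    A toy Python simulation of the "more threads" idea, again only a sketch under made-up assumptions (hypothetical request names, times in milliseconds, each request handed to whichever worker frees up first):

```python
import heapq


def pooled_completion_times(requests, workers):
    """Simulate `workers` independent workers sharing one request queue.

    requests: list of (name, arrival_ms, service_ms), sorted by arrival.
    Returns a list of (name, finish_ms) in arrival order.
    """
    # Min-heap of the times at which each worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    finished = []
    for name, arrival, service in requests:
        earliest_free = heapq.heappop(free_at)
        done = max(earliest_free, arrival) + service
        heapq.heappush(free_at, done)
        finished.append((name, done))
    return finished


# Hypothetical traffic: one slow login followed by three quick tells.
requests = [
    ("login", 0, 5000),
    ("tell-1", 500, 100),
    ("tell-2", 1000, 100),
    ("tell-3", 2000, 100),
]
one_worker = pooled_completion_times(requests, workers=1)
two_workers = pooled_completion_times(requests, workers=2)
print("1 worker: ", one_worker)
print("2 workers:", two_workers)
```

    With a single worker every tell waits behind the login; with just two, the tells finish 100 ms after they were sent while the login is still stuck on the other worker.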
    Or they could be more ambitious and rework their main loop to be asynchronous, so it could send requests then deal with the next requests and deal with the reply whenever it comes. Now we're starting to talk about more work, as you need to be able to match which replies came from which request, and if it's not currently possible it's going to be painful because the server needs to be able to talk with all EQ servers and possibly other games (although I haven't the faintest why such a central piece of architecture needs to know about raid tool being locked/unlocked)
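
    A minimal Python asyncio sketch of that asynchronous variant, where each request gets an id so that a reply can be matched to it whenever it arrives. Everything here is hypothetical (the `PendingReplies` class, the fake remote, the delays); it only illustrates the matching technique, not how the real service works.

```python
import asyncio
import itertools


class PendingReplies:
    """Tracks in-flight requests so replies can be matched by id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._waiting = {}

    def register(self):
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._waiting[request_id] = future
        return request_id, future

    def resolve(self, request_id, payload):
        # Wake up whoever is waiting for this particular request.
        self._waiting.pop(request_id).set_result(payload)


async def fake_remote(pending, request_id, name, delay_s):
    # Stand-in for the remote server: answers after delay_s seconds.
    await asyncio.sleep(delay_s)
    pending.resolve(request_id, f"reply to {name}")


async def send(pending, name, delay_s, completed):
    request_id, future = pending.register()
    task = asyncio.create_task(fake_remote(pending, request_id, name, delay_s))
    reply = await future  # other requests keep flowing while we wait
    await task
    completed.append(name)
    return reply


async def main():
    pending = PendingReplies()
    completed = []
    replies = await asyncio.gather(
        send(pending, "login", 0.05, completed),  # slow request, sent first
        send(pending, "tell", 0.01, completed),   # fast request, sent second
    )
    return replies, completed


replies, completed = asyncio.run(main())
print(replies)
print(completed)
```

    The tell completes before the login even though it was sent after it, and each caller still gets its own reply because of the id lookup.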

    Given what we've been told, I would be inclined to assume they took the most difficult approach of the ones I listed, so they can't do anything to AB on its own. So basically, now that they have added the first step, we can only wait until next patch for another incremental update. But.. meh, we don't need the best; this used to work, so a quick fix is all we ask for. They can always take their time doing the deeper, more proper fix once the fire has died down.
    heruthemonk and Yirrara like this.
  8. Gaol Journeyman

    I'm baffled that the de facto position of Daybreak is that it's okay to allow their servers to run in this appalling way and to still think it's okay to expect everyone to pay for it. The server does not work and they've admitted they need to fix it.

    Where else would you be expected to continue to pay for a good or service which the provider admits is broken or does not work as advertised?

    It's astonishing to me that this position has been allowed to prevail for so long. Perhaps the half hearted offer of a one-way server move to vox was simply to get them out of any potential legal issues they'd have if challenged.
    Fortunbas and heruthemonk like this.
  9. Vaako_SK Lorekeeper

    Bored again morrok?

    People report what they see, that's all we can do really, because no one here, tech savvy or not, knows for sure what's going on, and neither does dbg it seems..

    All we can do is report what we experience that worsens our gameplay. The dissecting will be for dbg to do, or whoever they choose to do it. In the meantime let's make sure they know we are not happy!
    Imrahil and tanith like this.
  10. Thancra Loladin

    The server doesn't restart, all the zones are still up, it's just that all the characters get disconnected. The server never really "crashed".
    So we don't even get new nameds for our misery for example...
  11. zhay Journeyman

    so if all players get disconnected or you can't log on to play - server is down - it might have all the bright green lights flashing - but it doesn't work

    anyone who has the slightest interaction with users and customers computerwise will know that - you can say as many times as you like that your server is up and running - but from the perspective of those who use it or need it or pay for a service it is down

    zhay
    tanith likes this.
  12. Zhaunil_AB Augur

    I am quite sure they know quite well what's going on.
    Just not exactly how to fix it within their budget or the design they've committed to.

    And this is exactly why it's helpful for us to EXACTLY tell/describe what we're seeing instead of using ambiguous terms or generalizations.
    Even for the "tech unsavy" a server-crash is (or should be) something different than "disconnected".
    If we report a server crash, they (might) check server status, see it up and dismiss the post.
    If we instead report what we are seeing - disconnects - then they (might) look further than server status.

    Sure.
    Sad that we can't simply leave guild here, isn't it?
    I think they know quite well.
    22 pages and counting in this thread alone, plus about 4 other ones on the topic, speak quite loudly.
  13. Vaako_SK Lorekeeper

    ahahaha see what you did there!! makes me all warm and fuzzy to see you care :)
  14. Jumbur Improved Familiar

    Actually, I do think they know how to fix it too, They probably have a detailed roadmap on how to apply the fix, and have already taken the first step.

    The problem is that the "first step" made it much worse, this might not be by accident, but rather a result of their roadmap.

    My guess is that the fix involves updates to a lot of different key components, that the first fix updated one of them, and that the new component does not work well with the old components that are planned to be updated in future patches. Once the rest of the old components (or some of them at least) have been updated, we will see a significant improvement, I hope.

    I also hope the next hot-fix will come soon, and that it will show some real improvement, and that we won't have a server this unplayable until the very final component has been updated.

    I am not worried about DBG not being able to fix AB for good. I am worried about the community being driven away before they are done with the fix.
    Fortunbas likes this.
  15. Zhaunil_AB Augur

    This.
    And that's why, if it's taking them "too long" to fix AB, a shadow-copy of AB on some US environment should at least be considered by them as a plan B.
    "Too long" in this context means to me "shortly before the next expansion's release".
    Because if we will be facing the same issues then that we see now, i do not see any of us progressing;
    at least not as we normally would.
    Fortunbas likes this.
  16. Kyzvs Augur

    Not had much time to play last fortnight or so - new collie puppy taking all my time :D

    However, the last three play sessions I've had in the past few days have been terrible - 30mins+ to get 3 toons logged in and into a LDoN. Just had all disco on zoning in, only to find on relog that one is no longer in the adventure, all pets from the group have vanished - this is unplayable, so I'm going to pour a glass of red, sit in the evening sunshine and count the pennies I'm not giving to DBG.

    Never mind the actual problem, this is woeful communication guys - textbook example of how to not handle issues.

    Will happily resub once AB has been proven to be stable and playable!
  17. Asmadeus Journeyman

    Some here are still trying to progress... And tbh since last patch this isn't even a problem of being able to progress or not, things have been taken a step further - just as we thought that it couldn't get any worse.

    If things had stayed as they were before last patch I'd agree with your time estimate, next expansion is a good deadline - folks would have likely not bought it with that lag but some would have and most would have kept their sub this long.
    With things as they are now, I'm honestly not sure I want to keep struggling for even one more week. I could live with "horrible lag" a couple of times a week; I can't live with "horribleR lag and not being able to move a raid in less than 30 minutes" every two days (we raid every two days, and I don't log in anymore outside of that).
  18. eliandra Elder

    - 30 sec to 2 mins to receive a tell (not just during raids, all the time)
    - guild hall/house deleted... 1 week to recover, 1 day to vanish again
    - fellowship disbanded (and when I create one again, 6 h later it's empty again)
    - 5 mins to zone when using the banner during a raid
    - when using an instant item to teleport, I have time to go smoke and come back to see my character still zoning... if lucky I can smoke another and see my character arrive... if not I need to relog
    - server crashes/instability/difficulty logging in every day now

    so my question is: any lawyer here to begin a real action? We pay for a service, and the only answers we get are like: we will fix it sometime between now and 2099... or you can move to vox for free (why only vox?) with near 0 players on the same timeline as AB.
  19. Asmadeus Journeyman

    Fun fact, I can log in a toon on another server and have that toon talk in channels/send tells while we're stuck (there are whole minutes of stuckednes all the time so it's easy to reproduce)

    I think I'll just lead raids from the other server, at least I can talk while everyone else is stuck unable to communicate...
  20. eliandra Elder

    I forgot: progression time bugged (step not validated) due to lag or another bug on this server