Addressing Latency

Discussion in 'News and Announcements' started by Accendo, Apr 20, 2022.

  1. Accendo Guest

    For those of you upset and frustrated by lag, we understand and hear you. It frustrates us, too. This post is meant to provide some insight into what causes these issues, how we got to where we are today, and what is being done to improve performance now and in the future.

    Game performance is something we work on continuously and is very challenging to solve completely. EQ has a lot of complex systems, and at least 25 years of code. More than 50 different programmers have come and gone over the years. One of the many challenges of working on EQ is understanding what someone was trying to do when they wrote the code we now are modifying, and doing so correctly without breaking existing things. It is common that we do break things (despite thorough testing), and we fix them as soon as we can.

    Unfortunately, in the past, EQ hasn't always reliably measured what impact changes to the game have had on performance over time. As more and more things were added to the game, performance declined. The current state is that we need to fix numerous issues, each of which can be quite difficult to fix (a minor improvement may take one programmer a week to complete) and may yield only a small improvement that is imperceptible to players.

    Over the past few years, we have made a lot of changes to alleviate performance issues, but it is still a significant problem, and is well known and cared about by the developers. However, it is understandable that these changes feel as though nothing has been done. It is as frustrating for you as it is for us to have poor server performance. We want nothing more than to allow more players to be on at the same time, for zones to run smoothly in raids, and for playing the game to be a good experience. This year, the top issue for the engineering team to tackle is raid zone performance. It is important to us to improve things, and we know very well the impact it has on your enjoyment of and ability to play the game.

    Some of the hits to performance in EQ in the past were made to help with other problems, like lack of memory. Now that EQ is 64-bit, this is no longer an issue. It is possible these memory-saving changes can be reversed in favor of game performance. However, just as the original work was an enormous task to accomplish and polish so that the game worked, reversing it would be a significant effort that, again, produces no visible change for players (which is sometimes the whole point). For these reasons, a lot of the time it looks as though nothing is being done. For every live patch there are numerous internal patch notes not visible to players, often a much longer list than the live patch notes.

    There has been some speculation on the forums that EQ's performance issues are caused by poor quality hardware. In the past few years, EQ has had a complete turnover in the hardware running our world and zone servers. In the next month, we are planning to add additional hosts and memory to run zones. These kinds of changes are made all the time and are mostly invisible to players, but do help improve the game and how it performs. Running good hardware is something EQ invests highly in and improves continuously in small increments. There are teams who assist EQ behind the scenes (and Daybreak as a whole) with this kind of work. A lot of people contribute positively to the uptime, performance, and success of this game.

    There are two major types of lag in EQ: world/server lag (e.g. Antonius Bayle or Zek) and zone lag (raid performance issues). Symptoms of world lag are things like chat lag, slow zone times, and poor overall performance. World lag also affects how quickly zones respond to things that rely on the world: if world lag is bad, zones will also begin to fail. Symptoms of zone lag include delays on casting spells and on the movement of NPCs (like rubberbanding). Zone lag is the main issue regarding raid zone performance, but it is partially impacted by the world's performance as well.

    So what causes world and zone lag? One of the issues that affects both the zone and world is a large quantity of packets. The zone and world will only process packets for a maximum time of 500ms before moving on. This causes a cascade of issues where players see there was no response to their action, and so they do it again, or they spam it. The world/zone gets more and more requests and can't handle them. A good example of this is when PoP releases on a TLP server. At the launch in Plane of Knowledge, each request to enter and leave the zone is fairly taxing and it takes the zone an astounding amount of time to catch up. It is barely able to transfer players to the next zone due to the number of requests it is getting. This bottleneck affects other zone functions, like the library door not responding, and players are unable to do almost anything. If the world gets too many requests, it will fail to update important things. If the world does not touch base with a running zone at least every 5 minutes, that zone is considered disconnected. When this happens, we see issues like zones going down and players getting disconnected. Those players then log in again, which is taxing on the world, causing more zones to go down and aggravating the issue to the point where the world itself may go down. This occurs on new TLP servers if there is no limit on the players who are allowed to log in. It can also cause issues on live servers when many players log in at the same time after a patch.
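    The 500ms cap described above can be pictured with a minimal sketch. This is not EQ's actual code: the names `process_packets`, `handle`, and `clock` are made up for illustration, and the only detail taken from the post is the 500ms budget. The loop handles packets until the queue is empty or the budget runs out, and anything left over waits for the next pass, which is how a flood of repeated requests turns into a growing backlog.

```python
import time
from collections import deque

PACKET_BUDGET_SECONDS = 0.5  # the 500ms cap mentioned in the post


def process_packets(queue, handle, budget=PACKET_BUDGET_SECONDS, clock=time.monotonic):
    """Handle queued packets until the queue is empty or the budget is spent.

    Returns the number of packets handled. Unhandled packets stay in the
    queue for the next pass (hypothetical behaviour, for illustration only).
    """
    deadline = clock() + budget
    handled = 0
    while queue and clock() < deadline:
        handle(queue.popleft())
        handled += 1
    return handled
```

    If players resend an action because they saw no response, the queue grows faster than each budgeted pass can drain it.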

    World lag is also impacted by the number of players online and the number of zones running. The more players online and the more zones, the worse performance gets. This becomes worse as the game advances through expansions. Early TLP servers can support more players than servers with all expansions released. As more systems are active, all that data is being handled, and it takes processing time. One of the major culprits in poor world performance is real estate: each time your character zones, all the items you own in any real estate need to be reloaded, sent to the world, and sent to you on the destination zone. The lookup of this data has a minimum response time of 60ms. When a lot of players are on a server, that time adds up.
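    As a rough illustration of how a 60ms minimum adds up, the arithmetic can be sketched as below. The sketch assumes lookups are answered strictly one after another, which the post does not state, and the count of 100 zonings is a made-up number. Only the 60ms figure comes from the post.

```python
LOOKUP_MS = 60  # minimum response time quoted in the post


def serialized_lookup_ms(zonings, lookup_ms=LOOKUP_MS):
    """Total time if lookups are answered strictly one after another (assumption)."""
    return zonings * lookup_ms


def max_lookups_per_second(lookup_ms=LOOKUP_MS):
    """Upper bound on serialized lookups that fit into one second."""
    return 1000 // lookup_ms


# 100 players zoning at about the same time (hypothetical number)
total_ms = serialized_lookup_ms(100)   # 6000 ms, i.e. 6 seconds
ceiling = max_lookups_per_second()     # 16 lookups per second
```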

    Issues unique to zone lag are primarily caused by spells, combat, and the movement of NPCs. The large quantity of spells that go off in the modern raiding game causes the zone to hit the limit of 500ms of processing time repeatedly. This is when you start to see performance issues. If the zone doesn't finish a zone loop fast enough, we don't process NPC movement in time, and you see things like rubberbanding and event mechanics failing to occur in a reasonable time (such as the auras in Oubliette of Light and Aten Ha Ra). When you think about the number of pets (specifically swarm pets) that can be created in a raid, think also about the amount of time the zone needs to spend processing their movement. When the zone loop takes so long to occur, NPCs can barely move (even if you exclude pets), because the zone is busy processing hundreds of spells in each loop. When you think about the numerous SPAs and the complexity of the things they can do in EQ, that is part of why spells are so taxing, and also why making improvements to how the spell system performs is so difficult.

    This issue is compounded by melee procs, spells that land on many targets (Splash/Squall), spells that trigger other spells, sympathetic procs, and Twincast. Along with spells, melee has an impact: think about the number of swings you make compared to the early game. As more and more abilities were added to the game that allowed players to get additional hits, each one added another round of hits that the zone needs to process. Melee combat code is not as complex as spell code, but it is not great. Think about each hit checking for hit/miss, riposte, parry, block, dodge, and strikethrough, and calculating the impact of heroic stats on each one. One of the especially negative impacts on melee performance has been the SPA that returns HP on swings. Each hit is now going through spell code as well as melee code. Compared to spells, an enormous number of melee hits occur in the same timespan. By itself, no spell or ability seems outlandishly bad for performance. But put them all together in a raid with 54 players, and the zone is brought to its knees.

    One question players have about raid zone performance is why it varies in how bad it is. Each time you request an instance, it can be started on any one of a set of numerous zone hosts. These are very powerful machines that have load balancing to try to prevent too many raid instances and high load zones from running on the same host. Despite this, if one host is running several raid zones that are very taxing on the CPU, it will negatively affect other zones running on the same host. If you have a raid instance that performed strangely well, the zone host it was running on was not running as many other raid instances (if any). This is part of why raid lag is much worse on Sundays, as that is the most popular raid time in the game, and gives you the highest probability of having your instance run on a taxed zone host. Changes being made in this month’s update will also help zone performance by fixing a bug that allowed instances to stay active longer than they should have. This caused zone performance to degrade the longer a server was up.
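    To make the host placement idea concrete, here is a minimal sketch of a "least loaded host" rule. The post does not say how the real load balancing works, so the rule itself, the function name `pick_host`, and the load numbers are all assumptions. It only illustrates why a new instance's performance depends on what else is already running on its host.

```python
def pick_host(host_loads, instance_cost):
    """Place a new instance on the currently least-loaded host.

    host_loads maps a host name to an abstract load number. Both the
    mapping and the cost units are hypothetical, not EQ's real metrics.
    """
    name = min(host_loads, key=host_loads.get)
    host_loads[name] += instance_cost
    return name


loads = {"host-a": 3, "host-b": 1}
first = pick_host(loads, 5)   # a heavy raid instance lands on host-b
second = pick_host(loads, 1)  # the next instance now lands on host-a
```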

    What is being done about game performance? Last year, we completed a project to implement a telemetry system for accurately measuring performance. It allows us to see detailed information about what and where in EQ’s code things are running poorly. We periodically capture performance metrics from worlds, zones (especially raid zones under a lot of strain), and the game client. This year we are focusing on improvements specific to raid zone performance, as that is the most significant issue in the modern game. These performance captures allow us to identify performance bottlenecks, so we can focus our efforts on things that will make the most positive impact. Aside from that, one of the most common types of performance issue is something that “blocks” a process from doing anything else.

    For example, when you log into a world, your character may not be cached and the world will do what is called a “blocking load” on the character database. While this load is being processed, the world is not able to do anything else. These types of blocking load issues are fixed (in a rough sense) by making them asynchronous. We create a request list for data from another process and allow things to continue being processed until the data we requested arrives. When it arrives, the process continues to where it needs to go as the original code did previously. These types of changes are sometimes more difficult than they sound, and are often risky depending on the different ways that data can be requested and handled after the fact. Some examples of changes like this that we’ve fixed are fellowship experience sharing, loading guilds when entering the game from character select, real estate definitions (aside from items), and applying account-wide lockouts. EQ has a surprising amount of complexity when it comes to how different game mechanics perform.
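    The difference between a blocking load and an asynchronous one can be sketched in a few lines. This is only an illustration using Python's `asyncio`, not how EQ's servers are written. `load_character` is a stand-in for a slow database read, and the delays are made up. The request is queued, the loop keeps doing other work, and the result is picked up once it arrives.

```python
import asyncio


async def load_character(name, delay=0.05):
    """Stand-in for a slow character database read (hypothetical)."""
    await asyncio.sleep(delay)
    return {"name": name}


async def world_loop():
    # Queue the request instead of waiting on it directly.
    pending = asyncio.create_task(load_character("Example"))
    ticks = 0
    # The world keeps processing other work while the data is in flight.
    while not pending.done():
        ticks += 1
        await asyncio.sleep(0.01)
    # The data has arrived, so continue where the original code would have.
    return ticks, pending.result()


def run():
    return asyncio.run(world_loop())
```

    In a blocking version, the loop would sit idle for the full duration of the load and `ticks` would stay at zero.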

    We hope these details have provided some insight into the intricacy of EQ's performance. Know that the development team does care and wants to help improve how the game runs. We make improvements as we are able, and will continue to do so in the future. It is very challenging, but also a very important investment into the health and future of the game.

    —EverQuest Team
  2. Nudia Augur

  3. MasterMagnus The Oracle of AllHigh

    The communication is appreciated. Much of this was already known, and the pain will continue.

    I have maybe 2 coppers left in my coin purse.

    1. Extreme volume of packets/requests: Remove Multibinds. Seriously, I know people will hate this idea, but seriously. Possibly also throttle spamming.

    2. Swarm pet bandwidth consumed: Remove the 3D models for swarm pets. Instead, have the spells cast a graphic effect like a nimbus over the mob, which can be turned off in spell effects to help performance even more.
  4. Paladin Augur

    I think it should also be said that not all such issues are DP's fault, either.

    E.g. your ISP / ISP speed; how many are using your Wi-Fi; the quality of your computer (memory, OS, graphics card, number of programs you have running...); DNS attacks on Internet hubs / sites; and so forth.

    What might also help players is to discuss things that can help them reduce lag on their own.

    Example, /log off... Alt J and turn off your Journal... in-game graphics settings....

    TBH, I don't have too much issue with lag... no more than the very brief nano-second lag spikes.
  5. MyShadower All-natural Intelligence

    You hear that hoarders? WebMD - When to Help a Hoarder

    Take only what you need to survive!
  6. Jumbur Improved Familiar

    I thought the whole point of Dragon's Hoard was that you could store character data separately that didn't need to follow the character (because it wasn't meant to be "cached"). Why not put the whole bank content and real estate into a Dragon's Hoard-like storage?
    I would gladly accept limitations to the "find" function and load times at the bank, if it meant the game would run more reliably.

    While pathing for pets is important, the same can't be said for familiars. Let the graphic representation of familiars be handled client-side. I don't care if familiar's positioning is inconsistent across different clients...
  7. Zanarnar Augur

    hopefully, with the memory issue sorted out, they can fix some things that are calculated far too frequently.

    For example (at least last time I asked someone at DPG), the offensive and defensive values are recalculated every combat round, rather than being recalculated when a trigger happens and stored in memory, simply because there wasn't memory to spare.
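    The idea of recalculating only when a trigger happens can be sketched as a cached value with invalidation. This is a guess at the general technique, not EQ's code: the class `CombatStats`, its fields, and the offense formula are all made up for illustration.

```python
class CombatStats:
    """Caches a derived offense value and recomputes it only after a change."""

    def __init__(self, strength, weapon_skill):
        self._strength = strength
        self._weapon_skill = weapon_skill
        self._offense = None       # None means "needs recalculating"
        self.recalculations = 0    # counter kept only to show the saving

    def set_strength(self, value):
        # A trigger (buff, gear swap, ...) invalidates the cached value.
        self._strength = value
        self._offense = None

    @property
    def offense(self):
        if self._offense is None:
            self.recalculations += 1
            # Made-up formula, for illustration only.
            self._offense = self._strength * 2 + self._weapon_skill
        return self._offense
```

    With this shape, a thousand combat rounds between two buff changes cost one calculation instead of a thousand, at the price of keeping the cached number in memory.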

    I'm guessing optimizing the combat loop (if they haven't already) would be a huge improvement as so many things are attacking so fast in modern eq (so many swarm pets).

    Glad to hear it's a priority, and good luck tracking the major offenders down.
  8. Rogean Lorekeeper

    I'm surprised the Worldserver still has blocking loads. I spent several months back in 2014 removing all blocking loads from the P99 code; any and every database call is done in its own thread, asynchronous to the application thread. It was the single greatest performance increase we've ever done. You guys will see similar results, I'm sure.
  9. Zamazx New Member

    What if you replaced swarm pets with a second pet that does the same damage, to lower the number of objects to draw, or even just a new pet buff that makes your pet hit harder for equal DPS?

    I would even consider slowing combat down to reduce latency, as long as the fight time stays the same. For example, say a weapon has a ratio of 10: it is 200 damage over 20 delay. If you retooled combat to make it 400 damage over 40 delay (some other adjustments might need to be in place), the same DPS would be done in 50% fewer attacks.
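    The arithmetic in that example can be checked with a small sketch. It ignores everything else that affects real combat (haste, extra attacks, procs) and simply counts swings in a fixed window. The function name and the window length of 400 delay units are made up for illustration.

```python
def swings_and_damage(damage, delay, window):
    """Count swings in a time window and the total damage they deal.

    damage and delay are the weapon's listed values. window uses the same
    units as delay. Everything else about combat is ignored.
    """
    swings = window // delay
    return swings, swings * damage


fast = swings_and_damage(200, 20, 400)  # (20, 4000)
slow = swings_and_damage(400, 40, 400)  # (10, 4000)
```

    Both weapons deal the same total damage in the window, but the slower one needs half as many swings for the zone to process.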

    Right now if I have my combat on, it scrolls so fast I can barely see what is going on. I would almost prefer less combat spam and easier-to-see combat data. It would make the game more enjoyable, plus who doesn't like seeing big numbers in our damage lol.
  10. Cideral Lorekeeper

    what about the speculation that FV and Beta run together?
  11. Jhenna_BB Proudly Prestigious Pointed Purveyor of Pincusions

    This is actually pretty easy to fix. Now that system upgrades are in place, and the number of AAs can increase (if not now, in the future), it's better for the game if tribute buffs just go to the AA window. Those are a big reason why real estate would need to zone in the first place. Do away with that search the game needs to run and there's no real reason to have it load each time you zone. If people get their tribute buffs on all the time with no tribute, so be it, really, if the game runs better.
  12. Rylak Elder

    First off, the community as a whole really respects this and these types of posts. Well done.

    We know you all care about this topic and are working hard to address it. I think the thing that frustrates us as players is that what is happening in game seems at times to be in direct contradiction to known and communicated things that will fix the problem. Take this section from Accendo:

    We all know there are just too many calculations going on per second for raid zones not to lag. Some of the things that contribute to some of the issues are detailed well in the quotes. But in response to this, what do we get for modern/current Anniversary rewards? - More proc augs....

    Please - Hard Stop. There are easy ways to recuperate the minor DPS loss or HP gain these "gimmick" effects generate. Base DMG increase into aug/weapon/spell. Boosted heal values so heal frequency rates don't change with the loss of the HP return effects. Further consolidation of AAs that fire multiple procs and/or elimination of said AAs, with the benefit instead rooted back into the base spell calculation and/or action. We as players like all these neat little "toys" and customizations, but I guarantee you we will be happy to give them up for improved performance, and more than likely we would see further improvements in DPS even if nothing is compensated, just from the fact that base attacks, abilities, pets, etc. all operate as intended.

    Anyways, keep up the great work and efforts and thanks again for the improved communication streams.
  13. Yinla Ye Ol' Dragon

    Wouldn't it help to remove the trash from the zone and add the luck augs to the chest or give us mini events to kill? Or even add the luck augs to the vendor.
  14. yepmetoo Abazzagorath

    Jokes about hamsters on wheels don't mean people think it's your hardware. Why would it be your hardware? It's the service you pay for to connect those servers to us.

    Zone times are irrelevant to the problem. The problem is unplayable lag after you are already in the zone.
  15. ChiiChii Augur

    The current framework of the game doesn't have a replacement for multibinds, nor could it support such an overhaul. It would only bandaid the known problem. Each action taken produces text. Focus effects on armor pieces are probably the most wasteful spam text, completely unneeded. The game was written almost as if it were a giant flat file, parsing data on each action. Multiply that and it explains everything Accendo mentioned.

    The best way forward is fix what CAN be fixed and compromise. Focus all new development on UI re-design, graphics engine, remaking all content into the new platform, removing what are considered 'nerfs' due to the high I/O load because of procs. For in-game, boom. Solid.
    As for network, not my wheelhouse.
  16. Sancus Augur

    I appreciate this post.

    That said, I have to echo Rylak's sentiments. This is not just an engineering problem, and it's frustrating to see low hanging fruit on the design side (like reducing the number of procs/swarm pets) go largely ignored (outside of one minor change) for years. Having systems designers spend time on Overseer or classic achievements while the core systems they're also responsible for (AAs/spells/items) languish and contribute to lag is really disheartening from a player perspective.
  17. Leex Pewpewer

    I think the transparency is great, it's nice to know what's going on and the fact that "something" is being done to potentially assist with the issues that have plagued the game for years.

    However, that is the situation we're in. The team has had YEARS to address these issues and have not. Prioritizing other items over the playability of the game. Nerfing abilities, items, etc.

    I just want to go over a few factual items that I think need to be addressed when we talk about operations. Per EG7, Everquest has made $1B in lifetime bookings. Last year Everquest (not Daybreak as a whole) made $11M+ in 2021 YTD.

    Yet, your game cannot function up to expectations? We the players are not asking you to create HUGE expansions, with new game-defining abilities/items (new epics would be cool). We get 5-7 zones and a few raids a year, with abilities that have been in the game since probably before a lot of the new programmers were even a part of the team.

    Your raids should not have the level of latency that they do. You went into detail as to why, and that you're adding new hardware, which may potentially resolve some of the issues we're seeing. Why was this not implemented sooner (years ago)? This has been an issue that's been ongoing for years.

    Look, I get it. It's easier to work on easier ticket items and show that you've done xyz this month, to show to leadership that you're doing something, to justify your position. It's a PITA to work on items that are hard, time consuming and don't make it seem like you're doing a lot for the company.

    I don't think I even need to say this, but based off of our history, I will.

    Everquest is ever evolving, new features will be added, which will put more strain on the hardware to keep up with each action. Investing in hardware that is superior now, will mean less upkeep later down the road. That means, better TLP launches, more room for the devs/programmers to potentially add cool new features, and less raid latency.

    Get to a point where you as a team are not working from behind, and are able to be proactive.

    I'm not expecting any responses, this is just a post venting.
  18. Benito EQ player since 2001.

    This is good information.

    It sounds like a new major tool to gather empirical data on performance hits.
  19. Jumbur Improved Familiar

    We don't need 5+ different effects to go off, each time we cast a spell. I would prefer if a spell-cast only resulted in a single line of damage-text in my combat-spam.
    I miss the old days where I could see how much damage a spell-cast did, without starting a thirdparty text-parser...:(
    And it seems like the server would benefit from some simplifying too. Change all procs from items into the "spell-dmg" mod2 stat instead, and add an AA for each class that gives a modifier to that effect, to adjust for the change.

    Or consolidate all the worn procs into a single big proc.

    It would mean a mob can't resist some of the spell-procs and take damage from some of the other spell-procs, instead, it would be all or nothing. But it would probably reduce packet-spam on raids a lot.
  20. Szilent Augur

    for how many players, in how many zones, across how many servers did your solution end up with no problems? if the answer isn't many simultaneous thousands of users, in hundreds of active zones (is Live EQ past 1k now?), with dozens of servers, then… the development environment of Live EQ programmers may have more complexity than yours along with greater consequences for snafus since their clientele are paying customers.