Scheduling Server Downtime - why not middle of night?

Discussion in 'The Veterans' Lounge' started by Deux, Apr 4, 2023.

  1. Metanis Bad Company

    IT work should be scheduled to have the least impact on customers and staff. However, for a "smaller" gaming outfit like EG7/DBG I'm sure the staff's concerns will outweigh the customers' by 10,000 to 1.

    If I were leading the show, I'd make sure that "routine" work like doing backups, server swaps, hardware upgrades, and disk maintenance were all performed in the early morning hours while utilization was at a minimum.

    Then, loading updates and new patches should be scheduled to be completed just before primary staff members arrive at their desks. If there are no problems with the updates then everyone just works like normal, but if there are problems with the updates/patches then you have a full complement of staff available to respond and resolve.

    In other words, they should be starting these things around 2am Pacific instead of 6am.
    Claan and Annastasya like this.
  2. Nennius Curmudgeon

    For the record, we have had patches start in the middle of the night before. Speaking from a PDT perspective, of course. I suspect we will again sometime.
  3. Svann2 The Magnificent


    I find it odd that people complain that 6am is prime time. Sure, in some parts of the world, but not for the majority by any means. And midnight would still be some players' prime time.
    Nennius, Rijacki and CatsPaws like this.
  4. Angahran Augur

    Whose night? You do realise people play this game all around the world.
    Rijacki likes this.
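To put numbers on the time zone point, here is a minimal sketch that converts a proposed Pacific start time into local times elsewhere. The city list and the fixed UTC offsets are assumptions chosen for illustration (they match the offsets in effect on the thread date, Apr 4, 2023), and `local_start_times` is a hypothetical helper, not anything the studio uses.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets in effect on Apr 4, 2023 (assumed for illustration;
# a real tool would use a time zone database to handle DST changes).
PDT = timezone(timedelta(hours=-7), "PDT")
ZONES = {
    "New York (EDT)": timezone(timedelta(hours=-4)),
    "London (BST)": timezone(timedelta(hours=1)),
    "Berlin (CEST)": timezone(timedelta(hours=2)),
    "Tokyo (JST)": timezone(timedelta(hours=9)),
    "Sydney (AEST)": timezone(timedelta(hours=10)),
}


def local_start_times(hour_pdt):
    """Return the local clock time in each zone for a start at hour_pdt PDT."""
    start = datetime(2023, 4, 4, hour_pdt, 0, tzinfo=PDT)
    return {city: start.astimezone(tz).strftime("%H:%M") for city, tz in ZONES.items()}


for hour in (2, 6):
    print(f"{hour:02d}:00 PDT ->", local_start_times(hour))
```

Under these assumptions, a 6am PDT start lands late in the evening in Tokyo and Sydney, while a 2am PDT start lands in the early evening there, so either choice falls inside somebody's play time.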
  5. Tarvas Redwall of Coirnav, now Drinal

    "Who" is doing the "routine work" at 2 AM PDT? Does DPG even own the assets this "routine work" has to happen on or is it subcontracted? There are too many variables we do not know to make an accurate assessment of what they "should" be doing.
    Rijacki likes this.
  6. Cloud the Third Augur

    Don't forget that a dev has to be awake to do the work too. More than once I have had to publish stuff at around 2-4 am my time, and I made more mistakes at those hours than I do when we publish while I am more awake. So yes, the devs or whoever is doing the publish can be asked to get online at any time. But just because the downtime falls outside your play time doesn't mean it costs you less: if they make mistakes from being tired, the resulting bugs can cause you more downtime in the long run.

    I can think of one publish where this happened: we screwed up the server and took our app offline for over 24 hours. I think that was the last time we published in the really early morning hours our time, because the app should have been down for only 6 hours.
    Rijacki likes this.
  7. Rijacki Just a rare RPer on FV and Oakwynd

    I've been working in development or system/network admin since the late 80s (currently, I do software testing on industrial routers). Even if everything goes 100% perfectly, with no human flub-ups in the maintenance or publish of a significant change, and several contingencies were planned for potential problems, -and- the upgrade or change was thoroughly tested on a portion of the whole network to "certify" it would work... everything can still come to a crashing halt and fail because of something no one could anticipate. The more complex and expanded the system is, the more likely that will happen. This is one reason why it takes corporations or other large organisations weeks, months, or even years to upgrade systems, even if they have the budget for the equipment/software. Prior to working for my current employer, I worked on the network/software side of electronic payment systems (and was working there crossing Y2K). In both jobs, I have seen a lot of failures even when everything was done absolutely correctly. And then development (which includes testers) scrambles like mad. This is also why it is very important that a rollback method is designed into an upgrade, unless it's absolutely impossible to have one.
    Corwyhn Lionheart likes this.
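The rollback point above can be sketched in a few lines. This is only an illustration under stated assumptions: `apply_with_rollback` and the three callables passed to it are hypothetical names, and a real rollback for a game server would be far more involved than flipping a version number.

```python
def apply_with_rollback(apply_change, health_check, rollback):
    """Apply a change, verify it, and undo it if the change or the check fails."""
    try:
        apply_change()
        if health_check():
            return "applied"
    except Exception as exc:
        # Something nobody anticipated: report it and fall through to rollback.
        print(f"upgrade failed: {exc!r}")
    rollback()
    return "rolled back"


# Toy stand-ins for a real upgrade (all hypothetical).
state = {"version": 1}


def upgrade():
    state["version"] = 2


def downgrade():
    state["version"] = 1


def unhealthy():
    return False  # pretend the post-upgrade check found a problem


result = apply_with_rollback(upgrade, unhealthy, downgrade)
print(result, state)
```

The design choice worth noting is that the rollback path is written and exercised before the upgrade runs, not improvised once the upgrade has already gone wrong.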
  8. Cloud the Third Augur

    I get annoyed when people want to follow whatever the current standard is but have no clue why the standard is to code a certain way. I have seen some of our people code something like 5 layers of services, making it a nightmare to figure out what is really going on, and half the layers did nothing at all but call the next layer. The controller calls a service that calls the next service that calls the next service that calls the database layer that calls the database. Zero lines of code other than calling the next layer, except in the database layer. The best part of all of this was the unit testing for this API call: it mocked the database, meaning the unit test really did nothing, but they could check it off as having covered all those layers, which didn't even validate input.
    Rijacki likes this.
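A minimal sketch of the pattern described above, with made-up class names (`Controller`, `OuterService`, `InnerService`, `DataLayer` are all hypothetical, not the poster's actual code). Every layer only forwards the call, and the "unit test" mocks the database, so it passes no matter what input is given.

```python
from unittest.mock import Mock


class DataLayer:
    def __init__(self, db):
        self.db = db

    def get_user(self, user_id):
        return self.db.fetch_user(user_id)  # the only layer that touches the database


class InnerService:
    def __init__(self, data_layer):
        self.data_layer = data_layer

    def get_user(self, user_id):
        return self.data_layer.get_user(user_id)  # pure pass-through


class OuterService:
    def __init__(self, inner):
        self.inner = inner

    def get_user(self, user_id):
        return self.inner.get_user(user_id)  # pure pass-through


class Controller:
    def __init__(self, service):
        self.service = service

    def get_user(self, user_id):
        return self.service.get_user(user_id)  # no input validation anywhere


# The "unit test": the database is mocked, so only the forwarding is exercised.
mock_db = Mock()
mock_db.fetch_user.return_value = {"id": 1}
controller = Controller(OuterService(InnerService(DataLayer(mock_db))))

assert controller.get_user(1) == {"id": 1}     # passes, counts as "coverage"
assert controller.get_user(None) == {"id": 1}  # invalid input passes too
```

The second assertion is the tell: the test reports every layer as covered, yet it cannot fail on bad input because nothing in the chain checks it and the mock answers the same way regardless.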
  9. Rijacki Just a rare RPer on FV and Oakwynd

    Every so often we will file a bug, and the explanation from the developer on what they did to break it... err... what their design choices were makes me roll my eyes hard and then go test based on their explanation, so I can point out what else broke because of it. I am very good at finding the ways things break :) (usually in interactions with other features the developer didn't take into account for whatever reason)
    Corwyhn Lionheart likes this.
  10. Cloud the Third Augur

    My fav one was when some other devs were working on this micro UI site, and all of our client website portals (we have multiple) were supposed to pass parameters in to this JavaScript. That also means the parameters are unencrypted data sitting on the page, where anyone can view the source and see them. Now this may not seem like a big deal until you realize I work for a background screening company and we deal with a lot of PII that people aren't supposed to be able to see. I pointed it out to them during one of our meetings, and it was a fun discussion about how people who don't have permission to see someone's SSN or DOB can look at the code behind the website and see it. /sigh
    Rijacki likes this.
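One common way to avoid the problem described above is to filter the record on the server, by the viewer's permissions, before anything is written into the page. The sketch below assumes a simple field-to-permission mapping; `SENSITIVE_FIELDS`, the permission names, and `page_parameters` are hypothetical and not taken from the poster's system.

```python
import json

# Hypothetical mapping: sensitive field -> permission required to see it.
SENSITIVE_FIELDS = {"ssn": "view_ssn", "dob": "view_dob"}


def page_parameters(record, permissions):
    """Return the JSON to embed in the page, minus fields the viewer may not see."""
    visible = {
        key: value
        for key, value in record.items()
        if key not in SENSITIVE_FIELDS or SENSITIVE_FIELDS[key] in permissions
    }
    return json.dumps(visible, sort_keys=True)


record = {"name": "Jane Doe", "ssn": "000-00-0000", "dob": "1970-01-01"}

print(page_parameters(record, set()))         # no SSN or DOB reaches the page
print(page_parameters(record, {"view_dob"}))  # DOB only
```

Because the filtering happens before serialization, a viewer without the permission has nothing to find in the page source, whatever the client-side script does with the parameters.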
  11. Rijacki Just a rare RPer on FV and Oakwynd

    This is a mistake made on a lot of sites, and it leads to data breaches that developers or testers can see coming from a mile away but that management thinks are no big deal until the breach occurs. Only in the last few years has there been a more pronounced move across the industry to "securing" stuff through refactoring, even though those in the trenches had long been begging for it. Sadly, in many cases, no refactoring or proper securing is allowed before a breach happens or a security audit points it out, because the cost of doing it 'right' is a lot more than doing it fast. I can vague-book all day about a lot of the stuff I've seen over the last couple of decades and more. Security through obscurity... :p