Server health service

by **MagicManICT** » Wed Mar 06, 2019 5:40 pm

dullah wrote:There could be daily or weekly maintenance; a lot of online games do that.
During that maintenance, restart server, backup and do all the tidy stuff needed.

Haven isn't anywhere near the size of those games. They are very complex on the back end compared to Haven. What takes WoW hours can be done for Haven in the background. A lot of games have gotten away from this daily/weekly scheduled downtime.

Loftar already has a system in place for all of this. You can't run a server without it... at least not if you want stability (>cough< world 5).

by **Nolokor** » Wed Mar 06, 2019 8:02 pm

Some nice points are being made in this topic, however.
If the server crashed because of a rare bug (like the previous crash with craft-all on firebrands), then auto-restart will reduce downtime.
If the server crashed because of deliberate actions to crash it, then auto-restart will not help, because people will still be able to crash the server intentionally.
And while those 2 cases are trivial, all the other valid cases in this thread require knowledge of how the server code is written in the case of HnH. So I assumed the best-case scenario, where restarting the server cannot lead to data corruption by design. If this is not the case, then yes, I retract my suggestion, as auto-restart could then lead to more cons than pros.

But if no transaction can lead to a corrupted world state, then such a service would be better for both developers and players. Players would have less server downtime, and developers would have more time to fix a server crash if it's a 'rare bug' situation, as it reduces the urgency of such crashes.
On production servers, such crashes usually send an e-mail with an attached minidump, stack trace and log, and in the case of a persistent world like HnH, the auto-restart could stop after several unsuccessful attempts in rapid succession.
It really doesn't require a lot of development time, as this is a common situation for many services (not only online games) and it has common solutions.
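To illustrate the "stop after several unsuccessful attempts in rapid succession" part, here is a minimal sketch in Python. It says nothing about how the HnH server is actually supervised; the class name, the limit of 3 restarts and the 60-second window are all made up for the example.

```python
from collections import deque


class RestartLimiter:
    """Hypothetical helper: allow restarts until too many crashes pile up in a short window."""

    def __init__(self, max_restarts=3, window_seconds=60.0):
        # Both defaults are illustrative assumptions, not values from any real server.
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._crash_times = deque()

    def should_restart(self, now):
        """Record a crash at time `now` (in seconds) and say whether to restart."""
        # Forget crashes that have fallen out of the sliding window.
        while self._crash_times and now - self._crash_times[0] > self.window_seconds:
            self._crash_times.popleft()
        self._crash_times.append(now)
        # Restart only while the number of recent crashes stays within the limit.
        return len(self._crash_times) <= self.max_restarts
```

A supervisor loop would call should_restart(time.monotonic()) each time the server process exits abnormally, and once it returns False it would leave the server down and alert a human instead of restarting again.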
To get a coredump of a running process you just use 'gcore <pid>'.
Linux can be configured to create full coredumps when an executable crashes.
Then you run gdb with -c <core_file> and --eval-command="thread apply all bt" to get a stack trace (for the email alert), and you can debug this coredump later.
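For what it's worth, the gdb step can be scripted. The sketch below only builds the argument list; the function name is hypothetical, and I added gdb's --batch option so that gdb exits after printing the backtrace instead of dropping into an interactive prompt.

```python
def gdb_backtrace_command(executable, core_file):
    """Build the gdb argument list that prints a backtrace of every thread in a core file."""
    return [
        "gdb",
        "--batch",  # exit once the commands have run, no interactive prompt
        "-c", core_file,  # the coredump to inspect
        "--eval-command=thread apply all bt",  # backtrace for all threads
        executable,  # the binary that produced the core, needed for symbols
    ]
```

Passing the list to subprocess.run(cmd, capture_output=True, text=True) would give you the stack trace as text, ready to be attached to the alert e-mail.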

by **Granger** » Wed Mar 06, 2019 8:41 pm

As far as I have gathered over the years:

The data structure that is the H&H world is partly in RAM and partly on disk.
You can't simply discard the part in RAM without the structure on-disk being corrupted.
