by Nolokor » Wed Mar 06, 2019 8:02 pm
Some nice points are being made in this topic. However:
If the server crashed because of a rare bug (like the previous crash with craft-all on firebrands), then auto-restart will reduce downtime.
If the server crashed because someone deliberately crashed it, then auto-restart will not help, because people will still be able to crash the server intentionally.
And while those two cases are trivial, all the other valid cases in this thread require knowledge of how the server code is written in the case of HnH. So I assumed the best-case scenario, in which restarting the server cannot lead to data corruption by design. If this is not the case, then yes, I retract my suggestion, as auto-restart could then bring more cons than pros.
But if no transaction can lead to a corrupted world state, then such a service would be better for both developers and players. Players would have less server downtime, and developers would have more time to fix a crash in the 'rare bug' situation, since auto-restart reduces the urgency of such crashes.
On production servers, such crashes usually trigger an e-mail with an attached minidump, stack trace and log, and in the case of a persistent world like HnH, auto-restart could stop after several unsuccessful attempts in rapid succession.
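As a rough illustration of the "stop after several unsuccessful attempts in rapid succession" idea, here is a minimal Python sketch. It is not how the HnH server is actually run; the names supervise and should_give_up, the limit of 3 crashes and the 60 second window are all made up for the example, and the on_crash hook is just a placeholder for wherever the e-mail alert would go.

```python
import subprocess
import time
from collections import deque


def should_give_up(crash_times, now, max_crashes=3, window_seconds=60.0):
    """Return True if at least max_crashes crashes happened within the window."""
    recent = [t for t in crash_times if now - t <= window_seconds]
    return len(recent) >= max_crashes


def supervise(command, max_crashes=3, window_seconds=60.0,
              run=subprocess.call, clock=time.monotonic, on_crash=None):
    """Restart command after each crash until it crashes too often in a row.

    run and clock are parameters only so the loop can be exercised without
    starting a real process; on_crash is a hypothetical alerting hook.
    """
    # Only the most recent max_crashes timestamps matter for the decision.
    crash_times = deque(maxlen=max_crashes)
    while True:
        exit_code = run(command)  # blocks until the server process exits
        if exit_code == 0:
            return "clean-exit"   # normal shutdown: do not restart
        now = clock()
        crash_times.append(now)
        if on_crash is not None:
            on_crash(exit_code)   # e.g. send the alert here
        if should_give_up(crash_times, now, max_crashes, window_seconds):
            return "gave-up"      # crash loop: leave it down for a human
```

Something like supervise(["./server"]) would then keep a (hypothetical) ./server binary running through isolated crashes, but stay down once it crashes three times within a minute.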
It really doesn't require a lot of development time, as this is a common situation for many services (not only online games) and it has common solutions.
To get a core dump of a running process, you just use 'gcore <pid>'.
Linux can be configured to write full core dumps when an executable crashes (the core size limit, 'ulimit -c', and the kernel.core_pattern setting control this).
Then you run gdb with -c <core_file> and --eval-command="thread apply all bt" to get a stack trace (for the e-mail alert), and you can debug the core dump later.
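For what it's worth, the gcore and gdb steps above could be wrapped roughly like this in Python. This is only a sketch: the helper names and the paths in the usage note are made up, and --batch is my own addition (it makes gdb exit after running the given commands instead of waiting at a prompt). The executable is passed as well so gdb can resolve symbols.

```python
import subprocess


def gcore_command(pid):
    """Command line that dumps a core of a running process via gcore."""
    return ["gcore", str(pid)]


def gdb_backtrace_command(executable, core_file):
    """Command line that prints a backtrace of every thread in a core file."""
    return [
        "gdb",
        "--batch",                                # assumption: non-interactive run
        "-c", core_file,                          # core file to examine
        "--eval-command=thread apply all bt",     # backtrace for all threads
        executable,                               # binary that produced the core
    ]


def capture_backtrace(executable, core_file, run=subprocess.run):
    """Run gdb and return its output as text, e.g. for the alert e-mail body.

    run is a parameter only so this can be tried without gdb installed.
    """
    result = run(
        gdb_backtrace_command(executable, core_file),
        capture_output=True,
        text=True,
        check=False,  # still return whatever gdb printed if it exits non-zero
    )
    return result.stdout
```

Calling capture_backtrace("/path/to/server", "/path/to/core") (both paths are placeholders) would return the text that goes into the alert, and the core file itself stays around for debugging later.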