Lagspike saga

by **loftar** » Sat Mar 02, 2019 6:16 pm

Granger wrote:Well, as long as the HDDs don't report similar...

No, they're quite normal, at around 35 °C.

Granger wrote:Did I read it right that the issue has moved from the one NVMe (where you noticed it) to the other and is still there at the moment?

Indeed.

by **Granger** » Sat Mar 02, 2019 6:19 pm

Have you looked at them with the nvme tool available from the nvme-cli package? Could you post the output for both?
Interesting stuff could be in 'Thermal Throttle Status'.

Also, according to https://www.percona.com/blog/2017/02/09 ... sh-health/

Warning Temperature Time/Critical Temperature Time. The time in minutes a device operated above a warning or critical temperature. It should be zeroes.
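
The check described above could be scripted roughly as follows. This is only a sketch: it assumes the human-readable output of `nvme smart-log` (or `smartctl -A`) is a list of `key : value` lines, and it flags any counter whose name contains "Temperature Time" and whose value is not zero. The sample text is hypothetical and is not output from the drives discussed in this thread; the function names are made up for illustration.

```python
import re


def parse_smart_log(text):
    """Parse 'key : value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def nonzero_temperature_times(fields):
    """Return the '... Temperature Time' counters that are not zero.

    Values that cannot be parsed as a number are flagged as well,
    so that nothing suspicious is silently skipped.
    """
    flagged = {}
    for key, value in fields.items():
        if "temperature time" in key.lower():
            match = re.match(r"\d+", value.replace(",", ""))
            minutes = int(match.group()) if match else None
            if minutes != 0:
                flagged[key] = value
    return flagged


# Hypothetical sample, NOT output from the drives in this thread.
SAMPLE = """\
Smart Log for NVME device:nvme0 namespace-id:ffffffff
temperature                         : 57 C
Warning Temperature Time            : 12
Critical Composite Temperature Time : 0
"""

if __name__ == "__main__":
    print(nonzero_temperature_times(parse_smart_log(SAMPLE)))
```

In practice the text would come from running the tool against each drive; an empty result means both counters are zero, as they should be.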

by **loftar** » Sat Mar 02, 2019 6:22 pm

Granger wrote:Have you looked at them with the nvme tool available from the nvme-cli package? Could you post the output for both?

I have, but I haven't found anything that obviously stands out. Is there any particular subcommand you have in mind? The error log is empty, if that's what you have in mind.

by **Granger** » Sat Mar 02, 2019 6:23 pm

Basically NVMe SMART attributes (smartctl -A, for smartmontools>=6.5).

by **loftar** » Sat Mar 02, 2019 6:26 pm

Granger wrote:Basically NVMe SMART attributes (smartctl -A, for smartmontools>=6.5).

I don't see any real difference at all between smartctl -a and nvme smart-log for these drives; they contain the exact same values. The drives don't seem to support nvme smart-log-add.

by **shubla** » Sat Mar 02, 2019 8:03 pm

loftar wrote:

Granger wrote:I would also check if the temperature reading in the smart output is correct, 57°C seems a bit high to me.

I thought so too, and asked Hetzner about it. They said it's normal. >.>

Should've put your servers in Finland, it's a lot cooler up here!

by **Granger** » Mon Mar 04, 2019 12:43 am

As a stab in the dark: do you have a powersave CPU governor running, or any tool that adjusts CPU clock speed?
If so, it could be worth trying to disable these and manually set the CPU to its maximum clock.
The background is that I once encountered a system that stalled IOs whenever the clock speed changed.

Also, if you have mounted the filesystems with TRIM/DISCARD support, you could try turning that off (and replacing it with a daily/weekly fstrim run from cron, in case the online trim causes the stalls).

In the end it would be beneficial to locate the source of the IO floods that, if I understood you correctly, started to happen with world 11.

by **Grog** » Mon Mar 04, 2019 1:30 am

Cave decay was implemented in w11, right?

by **loftar** » Mon Mar 04, 2019 1:42 am

Granger wrote:As a stab in the dark: do you have a powersave CPU governor running, or any tool that adjusts CPU clock speed?
If so, it could be worth trying to disable these and manually set the CPU to its maximum clock.
The background is that I once encountered a system that stalled IOs whenever the clock speed changed.

It used the powersave governor by default, but I've already set that to performance, since apparently on Skylake the powersave governor caused the CPU to turbo much less. (On the previous server, with a Haswell Xeon, it turboed almost constantly regardless of cpufreq governor.) That being said, even the performance governor doesn't keep the CPU at full frequency at all times. If frequency changes were stalling I/O as you describe, though, I'd expect to get no I/O done ever, because the frequency changes literally constantly: I can check the CPU frequency as frequently as I want to, and it's still basically changing on every check. So I find that explanation unlikely.
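
For anyone wanting to repeat that kind of check, here is a minimal sketch that reads the active governor and the current frequency per CPU from the standard Linux cpufreq sysfs files (`scaling_governor` and `scaling_cur_freq`, the latter in kHz). The function name and the `base` parameter are made up for illustration; not every system exposes these files, in which case the CPU is simply skipped.

```python
from pathlib import Path


def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_khz)} for CPUs exposing cpufreq."""
    result = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/.
    for cpu_dir in sorted(Path(base).glob("cpu[0-9]*")):
        gov_file = cpu_dir / "cpufreq" / "scaling_governor"
        freq_file = cpu_dir / "cpufreq" / "scaling_cur_freq"
        if not (gov_file.is_file() and freq_file.is_file()):
            continue  # no cpufreq support exposed for this CPU
        governor = gov_file.read_text().strip()
        current_khz = int(freq_file.read_text().strip())
        result[cpu_dir.name] = (governor, current_khz)
    return result


if __name__ == "__main__":
    for name, (governor, khz) in read_cpufreq().items():
        print(f"{name}: governor={governor} freq={khz / 1000:.0f} MHz")
```

Running it repeatedly in a loop would show how often the reported frequency changes between samples.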

Granger wrote:Also, if you have mounted the filesystems with TRIM/DISCARD support, you could try turning that off (and replacing it with a daily/weekly fstrim run from cron, in case the online trim causes the stalls).

I have in fact mounted them without TRIM/DISCARD, because to my understanding, using TRIM doesn't make a lot of sense on bcache (as the cache device is constantly kept full anyway).

EDIT: If anything, that makes me start to wonder if that could actually be the problem, since the FTL might have to spend time garbage collecting. Perhaps it would have worked better if I had reserved 100 GB or so of permanently unused space on the drives. Just testing whether that's the case would take a fair bit of downtime in order to recreate the RAID device and all, though. Hmm. Does anyone have enough experience with NVMe drives to say how likely that is to be the case (and, if so, how much space should be reserved)? My own knowledge of them is mostly theoretical.

by **Granger** » Mon Mar 04, 2019 7:46 am

flash_vol_create in /sys/fs/bcache/<cset-uuid> could, according to bcache documentation, be a way to emulate overprovision ex post facto.

Thinking a bit further about online TRIM on the backed filesystem: it could also be an answer, as it would tell bcache which blocks got freed. Without it, bcache might see the cache running full and purge multiple erase blocks all over the cache, and the latencies of a bunch of small discards in parallel might add up to the stalls you see... It depends on the discard setting in /sys/block/<cdev>/bcache, though.
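
A quick way to look at that setting, as a sketch only: the bcache documentation describes a boolean `discard` file under the cache device's `bcache` directory in sysfs. The code below assumes the file reads as `0` or `1`, and the function name and the `sys_block` parameter are made up for illustration. If the cache device is a partition, its `bcache` directory may sit one level deeper than the path used here.

```python
from pathlib import Path


def bcache_discard_enabled(cdev, sys_block="/sys/block"):
    """Return True/False for the cache device's discard flag, or None if absent.

    Assumes the sysfs file contains "1" when discard is on and "0" when off.
    """
    flag = Path(sys_block) / cdev / "bcache" / "discard"
    if not flag.is_file():
        return None  # not a bcache cache device, or a different sysfs layout
    return flag.read_text().strip() == "1"


if __name__ == "__main__":
    # "nvme0n1" is a placeholder device name, not taken from this thread.
    print(bcache_discard_enabled("nvme0n1"))
```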
