Granger wrote:As a stab in the dark: do you have a powersave CPU governor running, or any tool that adjusts CPU clock speed?
If so, it might be worth trying to disable these and manually setting the CPU to its maximum clock.
The background is that I once encountered a system that stalled I/O whenever the clock speed changed.
It used the powersave governor by default, but I've already switched it to performance, since on Skylake the powersave governor apparently caused the CPU to turbo much less. (On the previous server, with a Haswell Xeon, it turboed almost constantly regardless of the cpufreq governor.) That said, even the performance governor doesn't keep the CPU at full frequency at all times. Still, I find it unlikely that frequency changes are stalling I/O as you describe, because the frequency changes literally constantly: however often I check it, it has basically changed on every check. If every change stalled I/O, it seems I would never get any I/O done.
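For anyone who wants to repeat that check, here is a minimal Python sketch of how the governor and the current frequency could be read from the Linux cpufreq sysfs files (`scaling_governor` and `scaling_cur_freq`, the latter in kHz). The helper names `read_cpufreq`, `count_changes` and `main` are made up for illustration, and the sample count and interval are arbitrary assumptions, not what was actually used on this server.

```python
import time
from pathlib import Path


def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_khz)} for CPUs exposing cpufreq."""
    result = {}
    for cpufreq in sorted(Path(base).glob("cpu[0-9]*/cpufreq")):
        try:
            governor = (cpufreq / "scaling_governor").read_text().strip()
            khz = int((cpufreq / "scaling_cur_freq").read_text().strip())
        except (OSError, ValueError):
            # Skip CPUs whose files are missing or unreadable.
            continue
        result[cpufreq.parent.name] = (governor, khz)
    return result


def count_changes(samples):
    """Count how many consecutive samples differ from each other."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def main(cpu="cpu0", n=20, interval=0.05):
    """Print each CPU's governor, then sample one CPU's frequency n times."""
    for name, (governor, khz) in read_cpufreq().items():
        print(f"{name}: {governor}, {khz / 1000:.0f} MHz")
    samples = []
    for _ in range(n):
        info = read_cpufreq()
        if cpu in info:
            samples.append(info[cpu][1])
        time.sleep(interval)
    print(f"{cpu}: {count_changes(samples)} changes across {len(samples)} samples")
```

Calling `main()` on the affected machine would show whether the frequency really differs between nearly every pair of samples, which is the observation the argument above rests on.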
Granger wrote:Also, in case you have mounted the filesystems with TRIM/DISCARD support you could try to turn that off (and replace it with a daily/weekly fstrim by cron, in case the online trim causes the stalls).
I have in fact mounted them without TRIM/DISCARD, because to my understanding, using TRIM doesn't make a lot of sense on bcache (as the cache device is constantly kept full anyway).
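To double-check that no filesystem is mounted with online discard, something like the following Python sketch could be used. It parses text in the `/proc/mounts` format and reports entries whose options include `discard` (or a `discard=...` variant, as some filesystems use). The function name `mounts_with_discard` is hypothetical and chosen only for this example.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mountpoint) pairs whose mount options enable discard."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts line
        device, mountpoint, _fstype, options = fields[:4]
        opts = options.split(",")
        # Match "discard" or "discard=<mode>", but not "nodiscard".
        if any(o == "discard" or o.startswith("discard=") for o in opts):
            found.append((device, mountpoint))
    return found


def check_system(path="/proc/mounts"):
    """Read the live mount table and return the discard-enabled mounts."""
    with open(path) as f:
        return mounts_with_discard(f.read())
```

An empty list from `check_system()` would confirm that online TRIM is not in play for any mounted filesystem.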
EDIT: If anything, that makes me start to wonder if that could actually be the problem, since the FTL might have to spend time garbage collecting. Perhaps it would have worked better if I had reserved 100 GB or so of permanently unused space on the drives. Just testing whether that's the case would take a fair bit of downtime in order to recreate the RAID device and all, though. Hmm. Does anyone have enough experience with NVMe drives to say how likely that is to be the case (and, if so, how much space should be reserved)? My own knowledge of them is mostly theoretical.