jorb wrote:Stop shitposting.
Vassteel wrote:Not an IT guy, but I've heard turning it off and turning it back on works.
loftar wrote:Rebooting in two minutes.
Granger wrote:I have some ideas. To avoid too many red herrings, I kindly ask loftar to post the exact kernel version, which schedulers are active for the drives, and the output of smartctl -a for the drives.
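For anyone following along, the first two items in that request can be collected roughly like this. This is only a sketch: the helper name active_scheduler is made up for illustration, and it assumes the usual sysfs layout in which each device's queue/scheduler file lists the available schedulers with the active one in square brackets.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) entry from a
# scheduler line such as "noop deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Exact kernel version.
uname -r

# Active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

The smartctl -a output has to be collected separately per drive, as in the listings that follow.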
$ sudo smartctl -a /dev/nvme0
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.19.0-0.bpo.2-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: THNSN5512GPU7 TOSHIBA
Serial Number: 273S10WSTUHV
Firmware Version: 57GA4103
PCI Vendor/Subsystem ID: 0x1179
IEEE OUI Identifier: 0x00080d
Controller ID: 0
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Sat Mar 2 17:06:13 2019 CET
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x000e): Wr_Unc DS_Mngmt Wr_Zero
Warning Comp. Temp. Threshold: 78 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    6.00W        -        -    0  0  0  0        0       0
 1 +    2.40W        -        -    1  1  1  1        0       0
 2 +    1.90W        -        -    2  2  2  2        0       0
 3 -  0.1600W        -        -    3  3  3  3     1000    1000
 4 -  0.0120W        -        -    4  4  4  4     5000   35000
 5 -  0.0060W        -        -    5  5  5  5   100000  110000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 57 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 13%
Data Units Read: 130,882,588 [67.0 TB]
Data Units Written: 71,648,646 [36.6 TB]
Host Read Commands: 5,958,767,568
Host Write Commands: 687,281,425
Controller Busy Time: 20,611
Power Cycles: 23
Power On Hours: 13,592
Unsafe Shutdowns: 6
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 11
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 57 Celsius
Error Information (NVMe Log 0x01, max 128 entries)
No Errors Logged
$ sudo smartctl -a /dev/nvme1
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.19.0-0.bpo.2-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: THNSN5512GPU7 TOSHIBA
Serial Number: 273S10WDTUHV
Firmware Version: 57GA4103
PCI Vendor/Subsystem ID: 0x1179
IEEE OUI Identifier: 0x00080d
Controller ID: 0
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Sat Mar 2 17:06:15 2019 CET
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x000e): Wr_Unc DS_Mngmt Wr_Zero
Warning Comp. Temp. Threshold: 78 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    6.00W        -        -    0  0  0  0        0       0
 1 +    2.40W        -        -    1  1  1  1        0       0
 2 +    1.90W        -        -    2  2  2  2        0       0
 3 -  0.1600W        -        -    3  3  3  3     1000    1000
 4 -  0.0120W        -        -    4  4  4  4     5000   35000
 5 -  0.0060W        -        -    5  5  5  5   100000  110000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 51 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 17%
Data Units Read: 120,388,671 [61.6 TB]
Data Units Written: 72,558,074 [37.1 TB]
Host Read Commands: 5,928,728,026
Host Write Commands: 662,031,292
Controller Busy Time: 20,684
Power Cycles: 22
Power On Hours: 13,581
Unsafe Shutdowns: 9
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 37
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 51 Celsius
Error Information (NVMe Log 0x01, max 128 entries)
No Errors Logged
Granger wrote:I would start by setting the noop scheduler for the HDD as a quick, on-the-fly test, just to rule out that the problem sits at that level of the I/O stack and that the stall you see is caused by bcache waiting for an HDD.
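A rough sketch of what such an on-the-fly switch could look like. The function names passthrough_scheduler and set_passthrough are invented for this example, and the device name in the usage comment is a placeholder, since the thread does not say which device the HDD is. On the legacy single-queue block layer the pass-through scheduler is called noop; on blk-mq devices the equivalent is none, so the sketch takes whichever one the device offers.

```shell
#!/bin/sh
# Hypothetical helper: given the contents of a queue/scheduler file,
# print "noop" or "none" if the device offers one of them.
passthrough_scheduler() {
    for s in $(printf '%s\n' "$1" | sed 's/[][]//g'); do
        case "$s" in
            noop|none) printf '%s\n' "$s"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: switch one device (e.g. "sda") to its
# pass-through scheduler and show the resulting scheduler line.
set_passthrough() {
    path="/sys/block/$1/queue/scheduler"
    choice=$(passthrough_scheduler "$(cat "$path")") || return 1
    echo "$choice" | sudo tee "$path" >/dev/null
    cat "$path"
}

# Example (placeholder device name, not taken from the thread):
#   set_passthrough sda
```

A change made through sysfs this way does not survive a reboot, which is what makes it suitable as a quick test.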
Granger wrote:I would also check if the temperature reading in the smart output is correct, 57°C seems a bit high to me.
loftar wrote:I thought so too, and asked Hetzner about it. They said it's normal. >.>
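One way to put the reading in context is to compare it with the warning threshold the drive itself reports. The sketch below is an illustration only: temp_margin is a made-up name, and it assumes the field labels shown in the smartctl output above ("Temperature:" and "Warning Comp. Temp. Threshold:").

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -a` output on stdin and print the
# current temperature, the warning threshold, and the margin between them.
temp_margin() {
    awk -F: '
        /^Temperature:/                      { split($2, a, " "); temp = a[1] }
        /^Warning +Comp\. Temp\. Threshold:/ { split($2, b, " "); warn = b[1] }
        END {
            # Fail if either field was missing from the input.
            if (temp == "" || warn == "") exit 1
            printf "temperature=%d warning=%d margin=%d\n", temp, warn, warn - temp
        }'
}

# Example:
#   sudo smartctl -a /dev/nvme0 | temp_margin
```

With the values posted for /dev/nvme0 (57 Celsius against a 78 Celsius warning threshold) the margin comes out at 21 degrees. That says nothing about whether the sensor is accurate, only how far the reported value is from the drive's own limit.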