nvme/050: add support for NVMe multipath devices #214
yizhanglinux wants to merge 2 commits into linux-blktests:master
Conversation
There are enterprise PCI disks with multipath. I think it would be good to test these devices as well. It seems you are testing with such a device; maybe we could get this test also working with these types? I think it's possible to make
The test case nvme/032 sets the value "pci" to the global variable nvme_trtype to ensure that the test case runs only when TEST_DEV is an NVMe device using PCI transport. However, this approach was not working as intended since the global variable is not referred to. The test case was run for NVMe devices using non-PCI transport, and reported false-positive failures.

Commit c634b8a ("nvme/032: skip on non-PCI devices") introduced the helper function _require_test_dev_is_nvme_pci(). This function ensures that the test case nvme/032 is skipped when TEST_DEV is not an NVMe device with PCI transport. Despite this improvement, the unused global variable nvme_trtype continued to be set. Remove the unnecessary substitution code.

In the same manner, the test case nvme/050 is expected to be run only when TEST_DEV is an NVMe device with PCI transport. It also sets the global variable nvme_trtype, but this caused an unexpected failure as reported in the Link. Modify the test case to use _require_test_dev_is_nvme_pci() to ensure the requirement.

Fixes: c634b8a ("nvme/032: skip on non-PCI devices")
Link: linux-blktests#214
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
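As a side note, a gating helper of this kind boils down to a transport-string comparison. A minimal sketch, not the actual _require_test_dev_is_nvme_pci() implementation (the sysfs attribute location is an assumption):

```shell
#!/bin/bash
# Minimal sketch, NOT the real blktests helper: NVMe controllers expose
# their transport as a string (e.g. via /sys/class/nvme/<ctrl>/transport),
# and PCI controllers report "pcie". A requirement helper only needs to
# compare that string.
is_pci_transport() {
	[[ "$1" == pcie ]]
}

is_pci_transport "pcie" && echo "run test"   # PCI device: run
is_pci_transport "tcp" || echo "skip test"   # fabrics device: skip
```

In the real helper the string would be read from the test device's sysfs tree, and a SKIP reason would be set instead of echoing.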
IIUC what @igaw suggests,
Assuming this guess is correct, I created a patch. It just checks that TEST_DEV has PCI transport. @yizhanglinux, does this patch avoid the failure you face?
@kawasaki This can skip the test when the disk is an enterprise PCI disk with multipath. Maybe something like the below to return the PCI device when the disk supports multipath:
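The snippet itself is not preserved above; as an illustration of the idea, once the device's sysfs path is canonicalized, the PCI address can be pulled out with a regex (the function name and example path below are hypothetical):

```shell
#!/bin/bash
# Sketch: extract a PCI address (domain:bus:device.function) from an
# already-canonicalized sysfs path. For a multipath head device, the
# path to canonicalize would be one of its per-controller links instead.
pci_from_path() {
	grep -Eo '[0-9a-f]{4,5}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' <<< "$1" | tail -1
}

pci_from_path "/sys/devices/pci0000:00/0000:00:1b.0/nvme/nvme0/nvme0n1"
# prints 0000:00:1b.0
```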
@yizhanglinux Thanks for the clarification. Now I have a better understanding. @igaw Please comment if the suggested change suits your comment. As to the change by @yizhanglinux, I have a few comments:
OK, I'll replace all the "-f" with "--canonicalize" for all files in one patch.
How about the below change:
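The proposed change itself is not preserved above; as a quick aside, the two readlink spellings are equivalent, which a throwaway demo can confirm (everything here is illustrative):

```shell
#!/bin/bash
# readlink's -f and --canonicalize are two spellings of one operation;
# the long form is merely more readable in scripts. Demonstrated on a
# temporary symlink.
canon_demo() {
	local tmp a b
	tmp="$(mktemp -d)"
	: > "$tmp/target"
	ln -s target "$tmp/link"
	a="$(readlink -f "$tmp/link")"
	b="$(readlink --canonicalize "$tmp/link")"
	rm -r "$tmp"
	[[ "$a" == "$b" ]] && echo "identical: ${a##*/}"
}
canon_demo
# prints "identical: target"
```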
Yes, this looks like what I had in mind. Thanks for taking care!
@yizhanglinux Thanks. Overall, the suggested change looks good. Posting it as a proper patch or PR would be appreciated. Again, please use "--canonicalize" instead of "-f". Also, I think it is good to have another patch to replace "-f" with "--canonicalize" in the other places, which can be done in the same PR/series, or later.
I just found the case still failed: there is no "Input/output error" output in dmesg, and also no error log output [2].
[1]
[2]
First, is the line below correct now?

echo 1 > "/sys/bus/pci/devices/${pdev}/remove"

If so, then there is another problem. But from the kernel logs, it looks like the remove works.
Actually, I wonder why the fio job is supposed to succeed at all. The device is removed at the PCI level, so the block device should also be gone. It's not a reset, where the block layer would not see any device remove/add operation.
Ah wait, the test is expecting that fio fails, but it doesn't. And the test doesn't fail due to the nature of the multipath device. In this configuration the head nvme device might not be removed, and the block layer buffers the IOs until the driver is ready again. Though I am not totally sure how nvme-pci works here; need to check the source. Could you check the output of fio? If it doesn't fail, then it is very likely all in-flight IOs are buffered at the block layer. If so, then we have to figure out what we want to test here.
Yes, the disk was removed and initialized again after pci rescan. |
fio passed with no errors, per the full log.
Ah, I remember what's happening with the PCI device removal and the handling of the nvme head device. When we have a multipath device, the lifetime of the nvme head device is coupled to the PCI subsystem hotplug behavior. That is

As shown, fio will not fail for such configurations, as the block layer will requeue the failing IOs instead of reporting an error to the upper layers. So this test as written is for single-path devices. This means we could disable it for multipath nvme devices, or better, extend it and expect no IO errors when it is a multipath device. I'd prefer the second approach. WDYT?

EDIT: note, the behavior could be different on different architectures. Though I think it would be good to collect this information and document it. We could make the test consider the architecture if necessary. Or recent kernels have addressed this problem after all :)
@igaw Thanks for the clarification.
FWIW, an nvme-pci multipath device behaves similarly to a fabrics device. The paths can go away, and as long as the ctrl loss timeout doesn't expire (or, in this case, the PCI device isn't removed) the block device will be around. I had something like this in mind:

diff --git a/tests/nvme/050 b/tests/nvme/050
index 91f356422f63..4320c00d0a81 100755
--- a/tests/nvme/050
+++ b/tests/nvme/050
@@ -19,10 +19,22 @@ requires() {
 	_have_kernel_options FAIL_IO_TIMEOUT FAULT_INJECTION_DEBUG_FS
 }
 
+is_multipath_device() {
+	local nvme_ns cmic
+
+	nvme_ns="$1"
+
+	cmic="$(nvme id-ctrl "$nvme_ns" --output-format=json | jq -r '.cmic')"
+
+	if (( cmic & 0x1 )); then
+		return 0
+	fi
+
+	return 1
+}
+
 test_device() {
-	local nvme_ns
-	local pdev
-	local i
+	local nvme_ns pdev io_error i
 
 	echo "Running ${TEST_NAME}"
 
@@ -40,10 +52,13 @@ test_device() {
 		--name=reads --direct=1 --filename="${TEST_DEV}" --group_reporting \
 		--time_based --runtime=1m >& "$FULL"
 
-	if grep -q "Input/output error" "$FULL"; then
-		echo "Test complete"
+	io_error=false
+	grep -q "Input/output error" "$FULL" && io_error=true
+
+	if is_multipath_device "$nvme_ns"; then
+		$io_error && echo "Test failed" || echo "Test complete"
 	else
-		echo "Test failed"
+		$io_error && echo "Test complete" || echo "Test failed"
 	fi
 
 	# Remove and rescan the NVME device to ensure that it has come back

Or, if we don't want to trust that a set cmic bit means a multipath device, then we should check what the sysfs is telling us.
Here are the $TEST_DEV_SYSFS values for the plain nvme disk and the nvme multipath device. The nvme2n1 TEST_DEV_SYSFS cannot be used to get the PCI device.
Checking cmic is maybe not enough; we also need to check nvme_core.multipath=Y. Maybe it's better to just check the sysfs:
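The concrete sysfs check posted here is not preserved; the idea can be sketched like this (the path layout, with multipath head nodes resolving under the virtual nvme-subsystem tree, is my assumption, and the sample paths are illustrative):

```shell
#!/bin/bash
# Sketch: decide multipath vs. single path from the canonicalized sysfs
# path of the block device. A native multipath head node lives under the
# virtual nvme-subsystem tree rather than under a PCI device path.
dev_is_native_multipath() {
	[[ "$1" == */nvme-subsystem/* ]]
}

dev_is_native_multipath "/sys/devices/virtual/nvme-subsystem/nvme-subsys2/nvme2n1" \
	&& echo "multipath head"
dev_is_native_multipath "/sys/devices/pci0000:00/0000:00:1b.0/nvme/nvme1/nvme1n1" \
	|| echo "single path"
```

Unlike the cmic check, this reflects the running configuration: with nvme_core.multipath=N no head node appears under nvme-subsystem, regardless of what the controller advertises.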
Looks reasonable to me. Though a comment above
e62ef70 to 4da07df
@kawasaki Updated the patch based on the discussion and also changed the title.
Add two helper functions to tests/nvme/rc:

- _nvme_dev_support_native_multipath(): Check if the test device is an NVMe native multipath device by examining the sysfs device path.
- _nvme_get_pci_from_dev_sysfs(): Get the PCI address for an NVMe device, handling multipath devices by reading from the multipath subdirectory.

Update nvme/050 to handle multipath devices correctly. When testing I/O timeout on a multipath device, fio will not encounter I/O errors because the multipath layer provides failover to alternate paths. Adjust the test pass/fail logic accordingly:

- For multipath devices: pass if no I/O error (expected behavior)
- For non-multipath devices: pass if I/O error occurs (original behavior)

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Replace the short option -f with the long option --canonicalize for better readability and consistency.

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
c885a88 to 836b33c
Yes, the failure injection happens at this level; that means you have to set it for all nvme0c*n1 devices. Another thing you need to do then is set the default timeout for the I/Os: the default value is 30s, so the block layer will hold all I/Os in the queue and they won't be reported to the upper layers before the timeout expires. Something like this:
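The concrete commands are not preserved above; a sketch of the loop (the knob names io-timeout-fail and io_timeout are assumptions taken from this discussion, not verified kernel paths, and a throwaway directory stands in for sysfs so the loop is runnable anywhere):

```shell
#!/bin/bash
# Sketch: arm timeout fault injection on every controller-side path
# nvme0c*n1 and shorten the default 30s I/O timeout. Knob names are
# assumptions from the discussion; a temp directory stands in for /sys.
arm_paths() {
	local root="$1" path
	for path in "$root"/nvme0c*n1; do
		echo 1 > "$path/io-timeout-fail"   # per-path fault injection knob
	done
	echo 5 > "$root/io_timeout"            # stand-in for nvme_core io_timeout
}

root="$(mktemp -d)"
mkdir -p "$root/nvme0c0n1" "$root/nvme0c1n1"
arm_paths "$root"
cat "$root/nvme0c0n1/io-timeout-fail" "$root/io_timeout"   # prints 1 then 5
rm -rf "$root"
```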
I tried a change like [1], but the running fio never finished, there are continuous error outputs in dmesg [2], and from [3] it seems fio is in
[2]
[3]
Thanks for testing. Either my mental model of how this is supposed to work is broken, or you just found a bug. Anyway, I am still catching up with all the email/tasks after my vacation, so it takes a bit longer.
$ ./check nvme/050
nvme/050 => nvme1n1 (test nvme-pci timeout with fio jobs) [failed]
runtime 94.236s ... 62.734s
--- tests/nvme/050.out 2025-11-17 00:23:56.086469327 -0500
+++ /root/blktests/results/nvme1n1/nvme/050.out.bad 2025-11-19 03:17:45.389644408 -0500
@@ -1,2 +1,3 @@
Running nvme/050
-Test complete
+Test failed
+tests/nvme/050: line 50: /sys/bus/pci/devices//remove: Permission denied
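The double slash in "/sys/bus/pci/devices//remove" means ${pdev} expanded to an empty string. A small guard (a sketch, not the test's actual code) makes that failure mode explicit instead of tripping over a nonexistent sysfs file:

```shell
#!/bin/bash
# Sketch: refuse to touch sysfs when the PCI address lookup came back
# empty, instead of writing to "/sys/bus/pci/devices//remove".
check_pdev() {
	local pdev="$1"
	if [[ -z "$pdev" ]]; then
		echo "could not determine PCI device; skipping"
		return 1
	fi
	echo "/sys/bus/pci/devices/${pdev}/remove"
}

check_pdev ""              # prints the skip message
check_pdev "0000:00:1b.0"  # prints the full remove path
```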