nvme/pci: Log PCI_STATUS when the controller dies

When debugging nvme controller crashes, it's nice to know whether the controller died cleanly (so that the failure is reflected only in CSTS), died and put an error in PCI_STATUS, or died so badly that it stopped responding to PCI configuration space reads.

I've seen a failure that gives 0xffff in PCI_STATUS on a Samsung "SM951 NVMe SAMSUNG 256GB" with firmware "BXW75D0Q".

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>

Fixed up white space and hunk reject.

Signed-off-by: Jens Axboe <axboe@fb.com>

d2a61918 · Andy Lutomirski · Jens Axboe · bcc7f5b4

Showing 19 additions and 3 deletions

drivers/nvme/host/pci.c +19 -3

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1282,6 +1282,24 @@ static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 	return true;
 }
 
+static void nvme_warn_reset(struct nvme_dev *dev, u32 csts)
+{
+	/* Read a config register to help see what died. */
+	u16 pci_status;
+	int result;
+
+	result = pci_read_config_word(to_pci_dev(dev->dev), PCI_STATUS,
+				      &pci_status);
+	if (result == PCIBIOS_SUCCESSFUL)
+		dev_warn(dev->dev,
+			 "controller is down; will reset: CSTS=0x%x, PCI_STATUS=0x%hx\n",
+			 csts, pci_status);
+	else
+		dev_warn(dev->dev,
+			 "controller is down; will reset: CSTS=0x%x, PCI_STATUS read failed (%d)\n",
+			 csts, result);
+}
+
 static void nvme_watchdog_timer(unsigned long data)
 {
 	struct nvme_dev *dev = (struct nvme_dev *)data;
@@ -1290,9 +1308,7 @@ static void nvme_watchdog_timer(unsigned long data)
 	/* Skip controllers under certain specific conditions. */
 	if (nvme_should_reset(dev, csts)) {
 		if (!nvme_reset(dev))
-			dev_warn(dev->dev,
-				"Failed status: 0x%x, reset controller.\n",
-				csts);
+			nvme_warn_reset(dev, csts);
 		return;
 	}