PCI/AER: Block runtime suspend when handling errors
authorStanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Mon, 12 Feb 2024 12:01:35 +0000 (13:01 +0100)
committerBjorn Helgaas <bhelgaas@google.com>
Thu, 7 Mar 2024 23:58:07 +0000 (17:58 -0600)
commit002bf2fbc00e5c4b95fb167287e2ae7d1973281e
tree6431fa830ceac26fcdf8091264ebf9de5c1086a2
parent96ed79791b1b213c892301595459e0ea404540b3
PCI/AER: Block runtime suspend when handling errors

PM runtime can be done simultaneously with AER error handling.  Avoid that
by using pm_runtime_get_sync() before and pm_runtime_put() after reset in
pcie_do_recovery() for all recovering devices.

pm_runtime_get_sync() will increase dev->power.usage_count counter to
prevent any possible future request to runtime suspend a device.  It will
also resume a device, if it was previously in D3hot state.

I tested with igc device by doing simultaneous aer_inject and rpm
suspend/resume via /sys/bus/pci/devices/PCI_ID/power/control and can
reproduce:

  igc 0000:02:00.0: not ready 65535ms after bus reset; giving up
  pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25)
  pcieport 0000:00:1c.2: AER: subordinate device reset failed
  pcieport 0000:00:1c.2: AER: device recovery failed
  igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible

The problem disappears when this patch is applied.

Link: https://lore.kernel.org/r/20240212120135.146068-1-stanislaw.gruszka@linux.intel.com
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: <stable@vger.kernel.org>
drivers/pci/pcie/err.c