accel/habanalabs: reset device if scrubbing failed
author Oded Gabbay <ogabbay@kernel.org>
Mon, 12 Jun 2023 11:24:05 +0000 (14:24 +0300)
committer Oded Gabbay <ogabbay@kernel.org>
Mon, 9 Oct 2023 09:37:19 +0000 (12:37 +0300)
If scrubbing memory after the user has released the device fails, the
device is in a bad state and should be reset.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
drivers/accel/habanalabs/common/device.c

index 5e61761b8c11f8f20237be75f2654f54e497c2c3..d7d9198b210322b47dd0ba5d63cd6489cebf9ed8 100644
@@ -454,8 +454,10 @@ static void hpriv_release(struct kref *ref)
                /* Scrubbing is handled within hl_device_reset(), so here need to do it directly */
                int rc = hdev->asic_funcs->scrub_device_mem(hdev);
 
-               if (rc)
+               if (rc) {
                        dev_err(hdev->dev, "failed to scrub memory from hpriv release (%d)\n", rc);
+                       hl_device_reset(hdev, HL_DRV_RESET_HARD);
+               }
        }
 
        /* Now we can mark the compute_ctx as not active. Even if a reset is running in a different
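
The control flow this hunk introduces (scrub on release, and request a hard reset if the scrub fails) can be sketched outside the kernel roughly as follows. This is a user-space illustration only, not the driver code: `struct mock_device`, `mock_scrub_device_mem()`, `mock_device_reset_hard()` and `mock_release()` are hypothetical stand-ins for `struct hl_device`, `asic_funcs->scrub_device_mem()`, `hl_device_reset(hdev, HL_DRV_RESET_HARD)` and `hpriv_release()`. The condition that guards the scrub in the real function is not visible in the hunk and is not modelled here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's device structure. */
struct mock_device {
	int scrub_result; /* value the mock scrub returns; 0 means success */
	int hard_resets;  /* number of hard resets requested so far */
};

/* Mock of the ASIC-specific scrub callback: returns a preset status. */
static int mock_scrub_device_mem(struct mock_device *dev)
{
	return dev->scrub_result;
}

/* Mock of a hard reset request: only counts how often it was called. */
static void mock_device_reset_hard(struct mock_device *dev)
{
	dev->hard_resets++;
}

/*
 * Mirrors the patched branch: a non-zero scrub status is logged and
 * then escalated to a hard reset instead of being logged only.
 * Returning rc is a convenience for this sketch; the real release
 * callback returns void.
 */
static int mock_release(struct mock_device *dev)
{
	int rc;

	assert(dev != NULL);

	rc = mock_scrub_device_mem(dev);
	if (rc) {
		fprintf(stderr, "failed to scrub memory from release (%d)\n", rc);
		mock_device_reset_hard(dev);
	}

	return rc;
}
```

With `scrub_result` set to 0, `mock_release()` leaves `hard_resets` untouched. With any non-zero value it logs the error and increments `hard_resets` once, which matches the intent stated in the commit message.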