accel/habanalabs: add log when eq event is not received
author Farah Kassabri <fkassabri@habana.ai>
Sun, 29 Oct 2023 14:16:16 +0000 (16:16 +0200)
committer Oded Gabbay <ogabbay@kernel.org>
Tue, 19 Dec 2023 09:09:42 +0000 (11:09 +0200)
Add an error log when no EQ event is received from FW,
to cover the scenario where FW is stuck for some reason.
In such a case the driver receives neither the EQ error interrupt
nor the EQ heartbeat event, and just initiates a reset without
any indication in dmesg of the reason.

Signed-off-by: Farah Kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
drivers/accel/habanalabs/common/device.c

index 9711e8fc979d9ade05bdf96bcba300c4ec27432c..d95a981b2906007e772895db19e7aa5fcac666f0 100644
@@ -1049,10 +1049,12 @@ static void hl_device_eq_heartbeat(struct hl_device *hdev)
        if (!prop->cpucp_info.eq_health_check_supported)
                return;
 
-       if (hdev->eq_heartbeat_received)
+       if (hdev->eq_heartbeat_received) {
                hdev->eq_heartbeat_received = false;
-       else
+       } else {
+               dev_err(hdev->dev, "EQ heartbeat event was not received!\n");
                hl_device_cond_reset(hdev, HL_DRV_RESET_HARD, event_mask);
+       }
 }
 
 static void hl_device_heartbeat(struct work_struct *work)
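
The flag-and-check pattern this hunk touches can be sketched in userspace as below. This is a minimal illustration only, not the driver's code: `struct fake_device`, `fake_eq_heartbeat_event()` and `fake_eq_heartbeat_check()` are hypothetical names, `fprintf(stderr, ...)` stands in for `dev_err()`, and a counter stands in for the call to `hl_device_cond_reset()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the real struct hl_device. */
struct fake_device {
	bool eq_health_check_supported; /* FW advertises EQ heartbeat support */
	bool eq_heartbeat_received;     /* set on event arrival, cleared by the check */
	int reset_count;                /* counts simulated hard resets */
};

/* Simulates the arrival of an EQ heartbeat event from FW. */
static void fake_eq_heartbeat_event(struct fake_device *dev)
{
	assert(dev != NULL);
	dev->eq_heartbeat_received = true;
}

/*
 * Periodic check. Returns true if a (simulated) reset was triggered
 * because no heartbeat arrived since the previous check.
 */
static bool fake_eq_heartbeat_check(struct fake_device *dev)
{
	assert(dev != NULL);

	if (!dev->eq_health_check_supported)
		return false;

	if (dev->eq_heartbeat_received) {
		/* Heartbeat seen: re-arm the flag for the next period. */
		dev->eq_heartbeat_received = false;
		return false;
	}

	/* Log the reason before resetting, as the patch does with dev_err(). */
	fprintf(stderr, "EQ heartbeat event was not received!\n");
	dev->reset_count++;
	return true;
}
```

In this sketch, a check that follows an event clears the flag and returns false. A second check with no event in between logs the message and counts one reset. Without the log line, the reset would happen with no visible reason, which is the gap the commit message describes.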