habanalabs: add an option to control watchdog timeout via debugfs
author    Tomer Tayar <ttayar@habana.ai>
Fri, 30 Sep 2022 13:37:41 +0000 (16:37 +0300)
committer Oded Gabbay <ogabbay@kernel.org>
Wed, 23 Nov 2022 14:13:43 +0000 (16:13 +0200)
Add an option to control the timeout value of the driver's watchdog for
the reset process. The timeout is the amount of time the user has to
close the process after receiving a device reset notification from the
driver.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
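As a rough illustration of how the new knob might be driven from user space, the sketch below reads and writes the debugfs file as plain decimal text, which is how a `debugfs_create_u32()` attribute is exposed. The helper names and the `hl0` device index are hypothetical; the sketch assumes debugfs is mounted at `/sys/kernel/debug` and that the caller has root privileges.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write a timeout (in seconds) to the debugfs node,
 * e.g. /sys/kernel/debug/habanalabs/hl0/device_release_watchdog_timeout.
 * Returns 0 on success or a negative errno value on failure. */
static int set_release_watchdog_timeout(const char *path, unsigned int sec)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -errno;
	/* The u32 attribute accepts a decimal number; a trailing newline is fine. */
	if (fprintf(f, "%u\n", sec) < 0) {
		fclose(f);
		return -EIO;
	}
	if (fclose(f) != 0)
		return -errno;
	return 0;
}

/* Hypothetical helper: read the current timeout (in seconds) back. */
static int get_release_watchdog_timeout(const char *path, unsigned int *sec)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -errno;
	ok = (fscanf(f, "%u", sec) == 1);
	fclose(f);
	return ok ? 0 : -EINVAL;
}
```

From a root shell the same thing can be done with, for example, `echo 60 > /sys/kernel/debug/habanalabs/hl0/device_release_watchdog_timeout`. Opening the file in `"w"` mode truncates it, so each write replaces the previous value rather than appending to it.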
Documentation/ABI/testing/debugfs-driver-habanalabs
drivers/misc/habanalabs/common/debugfs.c

index c915bf17b2932632aa5be1b8fd475a6efb03230d..85f6d04f528b6fcbf57326117477e5dab9e686e1 100644
@@ -91,6 +91,13 @@ Description:    Enables the root user to set the device to specific state.
                 Valid values are "disable", "enable", "suspend", "resume".
                 User can read this property to see the valid values
 
+What:           /sys/kernel/debug/habanalabs/hl<n>/device_release_watchdog_timeout
+Date:           Oct 2022
+KernelVersion:  6.2
+Contact:        ttayar@habana.ai
+Description:    The watchdog timeout value in seconds for a device release upon
+                certain error cases, after which the device is reset.
+
 What:           /sys/kernel/debug/habanalabs/hl<n>/dma_size
 Date:           Apr 2021
 KernelVersion:  5.13
index 48d3ec8b5c8257bbb8ddd7085c5354f4a5e65cd0..945c0e6758caa5247bd74f60d864c4288b636e64 100644
@@ -1769,6 +1769,11 @@ void hl_debugfs_add_device(struct hl_device *hdev)
                                dev_entry,
                                &hl_timeout_locked_fops);
 
+       debugfs_create_u32("device_release_watchdog_timeout",
+                               0644,
+                               dev_entry->root,
+                               &hdev->device_release_watchdog_timeout_sec);
+
        for (i = 0, entry = dev_entry->entry_arr ; i < count ; i++, entry++) {
                debugfs_create_file(hl_debugfs_list[i].name,
                                        0444,