drm/amd/amdkfd: Fix kernel panic when reset failed and is triggered again
author shaoyunl <shaoyun.liu@amd.com>
Sun, 14 Nov 2021 17:38:18 +0000 (12:38 -0500)
committer Alex Deucher <alexander.deucher@amd.com>
Mon, 22 Nov 2021 19:45:02 +0000 (14:45 -0500)
In an SR-IOV configuration, a reset may fail to bring the ASIC back to normal after
stop_cpsch has already been called. start_cpsch is then never called, since there is no
resume in this case. When a reset is triggered again, the driver should avoid performing
the uninitialization a second time.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 42b2cc99943454a5ab27d4d7273946e32d279e4a..62fe28244a8059a540981e74f44f02a93ec4e9b6 100644
@@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
        bool hanging;
 
        dqm_lock(dqm);
+       if (!dqm->sched_running) {
+               dqm_unlock(dqm);
+               return 0;
+       }
+
        if (!dqm->is_hws_hang)
                unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
        hanging = dqm->is_hws_hang || dqm->is_resetting;
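
The guard added in this hunk makes the stop path safe to call twice. The following is a minimal userspace sketch of that pattern, not the driver code: `struct toy_dqm`, `toy_start`, `toy_stop`, `toy_reset_twice` and `teardown_count` are hypothetical names, locking is omitted, and the sketch assumes that the stop path clears `sched_running` once teardown completes (the excerpt above does not show that part of stop_cpsch).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device_queue_manager; only the
 * state needed to illustrate the guard is modelled. */
struct toy_dqm {
	bool sched_running;  /* set on start, cleared on stop (assumed) */
	int teardown_count;  /* how many times teardown really ran */
};

static int toy_start(struct toy_dqm *dqm)
{
	dqm->sched_running = true;
	return 0;
}

static int toy_stop(struct toy_dqm *dqm)
{
	/* Same shape as the added check: if the scheduler is not
	 * running, there is nothing left to tear down. */
	if (!dqm->sched_running)
		return 0;

	dqm->teardown_count++;       /* stands in for the real uninitialization */
	dqm->sched_running = false;
	return 0;
}

/* Models a failed reset (stop without a matching start) followed by
 * a second reset attempt. Returns the number of real teardowns. */
static int toy_reset_twice(void)
{
	struct toy_dqm dqm = { .sched_running = false, .teardown_count = 0 };

	toy_start(&dqm);
	assert(toy_stop(&dqm) == 0);  /* first reset: teardown runs */
	assert(toy_stop(&dqm) == 0);  /* second reset: early return */
	return dqm.teardown_count;
}
```

In this sketch the second `toy_stop` call returns before touching any state, which is the behaviour the commit message asks for when a reset is triggered again without an intervening start.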