drm/amdgpu: fix a GPU hang issue when remove device
authorDennis Li <Dennis.Li@amd.com>
Wed, 30 Dec 2020 11:45:15 +0000 (19:45 +0800)
committerAlex Deucher <alexander.deucher@amd.com>
Tue, 5 Jan 2021 16:31:55 +0000 (11:31 -0500)
When GFXOFF is enabled and GPU is idle, driver will fail to access some
registers. Therefore change to disable power gating before all access
registers with MMIO.

Dmesg log is as following:
amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1cb7d73f7317bd26d0d7d1e4d552333f5e9169fa..b69c34074d8d7a02f9fb23ee7fef394561f37978 100644 (file)
@@ -2548,11 +2548,11 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
        if (adev->gmc.xgmi.num_physical_nodes > 1)
                amdgpu_xgmi_remove_device(adev);
 
-       amdgpu_amdkfd_device_fini(adev);
-
        amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
        amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
 
+       amdgpu_amdkfd_device_fini(adev);
+
        /* need to disable SMC first */
        for (i = 0; i < adev->num_ip_blocks; i++) {
                if (!adev->ip_blocks[i].status.hw)