drm/i915: Fix gt reset when GuC submission is disabled
author Nirmoy Das <nirmoy.das@intel.com>
Mon, 22 Apr 2024 20:19:51 +0000 (22:19 +0200)
committer Andi Shyti <andi.shyti@linux.intel.com>
Wed, 24 Apr 2024 16:48:32 +0000 (18:48 +0200)
commit 4d3421e04c5dc38baf15224c051256204f223c15
tree 55e9cca99f1d7a86e9dba479d47f74059761bf9b
parent 31c3c53ee3a3e39aac690dffab75765d25e318dd

Currently intel_gt_reset() kills the GuC and then resets the requested
engines. This is problematic because there is a dedicated CSB FIFO
which only the GuC can access. If that FIFO fills up, the hardware
blocks on the next context switch until there is space, which means
the system is effectively hung. If an engine is reset whilst actively
executing a context, a CSB entry is sent to say that the context
has gone idle. Thus, if a reset happens on a very busy system,
killing the GuC before resetting the engines can lead to a deadlock:
the CSB FIFO fills up and nothing is left to drain it.

To address this issue, the GuC should be killed only after resetting
the requested engines and before calling intel_gt_init_hw().
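
The ordering argument above can be illustrated with a toy model. This is
a minimal sketch and not i915 code: struct toy_gt, toy_reset_engines(),
toy_gt_reset() and the FIFO depth of 4 are all hypothetical names and
values chosen for illustration. The model assumes only what the text
states: resetting a busy engine posts one CSB entry, only a live GuC
drains the FIFO, and a full FIFO means the hardware blocks.

```c
#include <stdbool.h>

/* Hypothetical FIFO depth; the real hardware value is not given here. */
#define TOY_CSB_FIFO_DEPTH 4

/* Toy stand-in for the GT state relevant to the ordering problem. */
struct toy_gt {
	int csb_used;    /* entries currently queued in the CSB FIFO */
	bool guc_alive;  /* only a live GuC can drain the FIFO */
};

/*
 * Reset nr_busy engines that are each actively executing a context.
 * Each such reset posts one CSB entry ("context has gone idle").
 * Returns false if the FIFO is full and nothing can drain it, which
 * models the hardware blocking on the next context switch.
 */
static bool toy_reset_engines(struct toy_gt *gt, int nr_busy)
{
	for (int i = 0; i < nr_busy; i++) {
		if (gt->guc_alive)
			gt->csb_used = 0;	/* live GuC consumes pending entries */
		if (gt->csb_used == TOY_CSB_FIFO_DEPTH)
			return false;		/* FIFO full, no consumer: hang */
		gt->csb_used++;
	}
	return true;
}

/*
 * Model a GT reset with 'pending' CSB entries already queued.
 * kill_guc_first selects the old ordering (GuC killed before the
 * engine resets); otherwise the GuC is killed only afterwards,
 * before the (not modelled) hardware re-initialisation.
 */
static bool toy_gt_reset(int pending, int nr_busy, bool kill_guc_first)
{
	struct toy_gt gt = { .csb_used = pending, .guc_alive = true };
	bool ok;

	if (kill_guc_first)
		gt.guc_alive = false;

	ok = toy_reset_engines(&gt, nr_busy);

	gt.guc_alive = false;	/* GuC is dead either way before re-init */
	return ok;
}
```

In this model, toy_gt_reset(3, 2, true) returns false: the first engine
reset fills the FIFO and the second finds it full with no consumer. With
the same load, toy_gt_reset(3, 2, false) returns true because the GuC is
still alive to drain entries while the engines are being reset.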

v2: Improve commit message (John)

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240422201951.633-2-nirmoy.das@intel.com
drivers/gpu/drm/i915/gt/intel_reset.c