drm/i915/guc: Don't hog IRQs when destroying contexts
author John Harrison <John.C.Harrison@Intel.com>
Tue, 14 Dec 2021 17:04:57 +0000 (09:04 -0800)
committer John Harrison <John.C.Harrison@Intel.com>
Thu, 16 Dec 2021 03:10:46 +0000 (19:10 -0800)
commit 2406846ec497af081d7e7a7da0e9938b8136fe16
tree 3af20688d4186ef9c2d52fa2919833b4cd11727b
parent 7aa6d5fe6cdb4347c427caaba38f11cc88a8ed4d

While attempting to debug a CT deadlock issue in various CI failures
(most easily reproduced with gem_ctx_create/basic-files), I was seeing
CPU deadlock errors being reported. This was because the context
destroy loop was blocking waiting on H2G space from inside an IRQ
spinlock. There was no deadlock as such, it's just that the H2G queue
was full of context destroy commands and GuC was taking a long time to
process them. However, the kernel was seeing the large amount of time
spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
then happen (heartbeat failures, CT deadlock errors, outstanding H2G
WARNs, etc.).

Re-working the loop to only acquire the spinlock around the list
management (which is all it is meant to protect) rather than the
entire destroy operation seems to fix all the above issues.
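
The reworked loop shape can be illustrated with a minimal userspace
sketch. This is not the driver code: a pthread mutex stands in for the
IRQ-disabling spinlock, and every name here (struct ctx, struct
destroy_state, pop_destroyed, slow_destroy, deregister_destroyed) is
hypothetical. It only shows the lock being held for the list removal
and released before the slow per-context step.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a context queued for destruction. */
struct ctx {
    struct ctx *next;
    int id;
};

/* Hypothetical stand-in for the submission state. */
struct destroy_state {
    pthread_mutex_t lock;   /* stands in for the IRQ spinlock */
    struct ctx *head;       /* list of destroyed contexts */
    int destroyed;          /* number of contexts processed */
};

/* Take the lock only long enough to unlink one entry from the list. */
static struct ctx *pop_destroyed(struct destroy_state *s)
{
    struct ctx *c;

    pthread_mutex_lock(&s->lock);
    c = s->head;
    if (c) {
        s->head = c->next;
        c->next = NULL;
    }
    pthread_mutex_unlock(&s->lock);

    return c;
}

/*
 * Stand-in for the slow step (in the driver, this is where the code
 * could block waiting for H2G space). The trylock check asserts that
 * the list lock is not held while this runs.
 */
static void slow_destroy(struct destroy_state *s, struct ctx *c)
{
    int rc = pthread_mutex_trylock(&s->lock);

    (void)c;
    assert(rc == 0);
    pthread_mutex_unlock(&s->lock);
    s->destroyed++;
}

/* Drain the list, never holding the lock across the slow step. */
static void deregister_destroyed(struct destroy_state *s)
{
    struct ctx *c;

    while ((c = pop_destroyed(s)) != NULL)
        slow_destroy(s, c);
}
```

Because the lock is dropped on every iteration, other users of the list
can make progress between removals, and the time spent with the lock
held no longer depends on how quickly the queue drains.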

v2:
 (John Harrison)
  - Fix typo in comment message

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-5-matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c