drm/msm: Hangcheck progress detection
authorRob Clark <robdclark@chromium.org>
Mon, 14 Nov 2022 19:30:41 +0000 (11:30 -0800)
committerRob Clark <robdclark@chromium.org>
Thu, 17 Nov 2022 18:39:12 +0000 (10:39 -0800)
commitd73b1d02de0858b96f743e1e8b767fb092ae4c1b
treea5fedb0e54c0f5e721fa11619796c818a9963e01
parentcade05b2a88558847984287dd389fae0c7de31d6
drm/msm: Hangcheck progress detection

If the hangcheck timer expires, check if the fw's position in the
cmdstream has advanced (changed) since last timer expiration, and
allow it up to three additional "extensions" to it's alotted time.
The intention is to continue to catch "shader stuck in a loop" type
hangs quickly, but allow more time for things that are actually
making forward progress.

Because we need to sample the CP state twice to detect if there has
not been progress, this also cuts the the timer's duration in half.

v2: Fix typo (REG_A6XX_CP_CSQ_IB2_STAT), add comment
v3: Only halve hangcheck timer duration for generations which
    support progress detection (hdanton); removed unused a5xx
    progress (without knowing how to adjust for data buffered
    in ROQ it is too likely to report a false negative)
v4: Comment updates to better describe the total hangcheck
    duration when progress detection is applied

Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Tested-by: Chia-I Wu <olvaffe@gmail.com> # dEQP-GLES2.functional.flush_finish.wait
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/511584/
Link: https://lore.kernel.org/r/20221114193049.1533391-3-robdclark@gmail.com
drivers/gpu/drm/msm/adreno/a6xx_gpu.c
drivers/gpu/drm/msm/msm_drv.c
drivers/gpu/drm/msm/msm_drv.h
drivers/gpu/drm/msm/msm_gpu.c
drivers/gpu/drm/msm/msm_gpu.h
drivers/gpu/drm/msm/msm_ringbuffer.h