From: Boris Brezillon Date: Wed, 30 Jun 2021 06:27:36 +0000 (+0200) Subject: drm/sched: Document what the timedout_job method should do X-Git-Url: http://git.maquefel.me/?a=commitdiff_plain;h=1fad1b7ed1ebfcfb5a1d0d21b0c47f7af5f49a6c;p=linux.git drm/sched: Document what the timedout_job method should do The documentation is a bit vague and doesn't really describe what the ->timedout_job() is expected to do. Let's add a few more details. v5: * New patch Suggested-by: Daniel Vetter Signed-off-by: Boris Brezillon Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20210630062751.2832545-2-boris.brezillon@collabora.com --- diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index d18af49fd0090..aa90ed1f1b2be 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -239,6 +239,20 @@ struct drm_sched_backend_ops { * @timedout_job: Called when a job has taken too long to execute, * to trigger GPU recovery. * + * This method is called in a workqueue context. + * + * Drivers typically issue a reset to recover from GPU hangs, and this + * procedure usually follows the following workflow: + * + * 1. Stop the scheduler using drm_sched_stop(). This will park the + * scheduler thread and cancel the timeout work, guaranteeing that + * nothing is queued while we reset the hardware queue + * 2. Try to gracefully stop non-faulty jobs (optional) + * 3. Issue a GPU reset (driver-specific) + * 4. Re-submit jobs using drm_sched_resubmit_jobs() + * 5. Restart the scheduler using drm_sched_start(). At that point, new + * jobs can be queued, and the scheduler thread is unblocked + * * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal, * and the underlying driver has started or completed recovery. *