From: Ben Skeggs <bskeggs@redhat.com>
Date: Wed, 1 Jun 2022 10:47:33 +0000 (+1000)
Subject: drm/nouveau/fifo: kill channel on a selection of PBDMA errors
X-Git-Url: http://git.maquefel.me/?a=commitdiff_plain;h=520db0405e9daed6b96b69149673491d80849fe7;p=linux.git

Many of these errors could be handled in a way that lets the channel
continue. However, any of them is a strong sign that something has
gone badly wrong, so the safest option is to disable the channel.

This is a bit of a hack. At some point we will want to handle these
errors individually and dump the relevant debug info for each.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
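
For readers following the new condition in the hunk below, here is a
minimal standalone sketch of the check, not driver code. The mask value
0xc67fe000 comes from the patch; the helper names pbdma_should_kill()
and pbdma_mask_bit_count() are hypothetical, and the patch does not say
what each individual bit means, so the sketch does not name them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask taken verbatim from the patch. It covers bits 13-22, 25, 26,
 * 30 and 31 of the PBDMA interrupt status; their meanings are not
 * stated in the patch and are not assumed here. */
#define PBDMA_KILL_MASK 0xc67fe000u

/* Hypothetical helper mirroring the added condition: the channel is
 * killed only if a masked status bit is set AND a channel was found. */
static bool pbdma_should_kill(uint32_t stat, bool have_chan)
{
	return (stat & PBDMA_KILL_MASK) != 0 && have_chan;
}

/* Hypothetical helper: count how many status bits the mask selects. */
static int pbdma_mask_bit_count(void)
{
	int bit, count = 0;

	for (bit = 0; bit < 32; bit++) {
		if (PBDMA_KILL_MASK & (UINT32_C(1) << bit))
			count++;
	}
	return count;
}
```

A status with only bit 12 or bit 23 set falls outside the mask, so under
this sketch the error would still be logged but the channel left alone.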

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
index 4c3338c4d47a8..ff28b5a4c36f9 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
@@ -30,7 +30,6 @@
 #include "gf100.h"
 #include "changf100.h"
 
-#include <core/client.h>
 #include <core/gpuobj.h>
 #include <subdev/bar.h>
 #include <subdev/fault.h>
@@ -138,8 +137,9 @@ gf100_runq_intr(struct nvkm_runq *runq, struct nvkm_runl *null)
 		nvkm_error(subdev, "PBDMA%d: %08x [%s] ch %d [%010llx %s] "
 				   "subc %d mthd %04x data %08x\n",
 			   runq->id, show, msg, chid, chan ? chan->inst->addr : 0,
-			   chan ? chan->object.client->name : "unknown",
-			   subc, mthd, data);
+			   chan ? chan->name : "unknown", subc, mthd, data);
+		if ((stat & 0xc67fe000) && chan)
+			nvkm_chan_error(chan, true);
 		nvkm_chan_put(&chan, flags);
 	}