drm/xe/selftests: restart GT after xe_bo_restore_kernel()
author Matthew Auld <matthew.auld@intel.com>
Thu, 13 Jul 2023 09:13:33 +0000 (10:13 +0100)
committer Rodrigo Vivi <rodrigo.vivi@intel.com>
Thu, 21 Dec 2023 16:37:37 +0000 (11:37 -0500)
The test seems to be failing badly after calling xe_bo_restore_kernel().
Taking a snapshot of the CTB and copying back a potentially old version
seems risky, depending on what might have been in flight. It also seems
that snapshotting the ADS object and copying it back results in serious
breakage. Normally when calling xe_bo_restore_kernel() we always fully
restart the GT, which re-initializes such things. We could potentially
skip saving and restoring such objects in xe_bo_evict_all(), however it
seems quite fragile not to also restart the GT. Try to do that here by
triggering a GT reset.
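
The ordering described above can be sketched in user space as follows. This
is only an illustrative model, not driver code: struct mock_gt, its fields
and every mock_* helper are hypothetical stand-ins, and the real
xe_gt_sanitize(), xe_bo_restore_kernel(), xe_gt_reset_async() and
flush_work() do far more than flip flags. The point is just the sequence:
stop trusting firmware-owned state, restore the kernel BOs, then queue a
reset and wait for it before checking the restore result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GT state relevant to this sketch. */
struct mock_gt {
	bool fw_state_valid;	/* assumed: firmware-owned objects (CTB, ADS) are trusted */
	bool reset_pending;	/* a reset has been queued but has not run yet */
	int resets_done;	/* number of completed resets */
};

/* Models the sanitize step: stop trusting firmware-owned state. */
static void mock_gt_sanitize(struct mock_gt *gt)
{
	gt->fw_state_valid = false;
}

/* Models the restore step; this mock always succeeds. */
static int mock_bo_restore_kernel(struct mock_gt *gt)
{
	(void)gt;
	return 0;
}

/* Models queuing an asynchronous reset: nothing runs yet. */
static void mock_gt_reset_async(struct mock_gt *gt)
{
	gt->reset_pending = true;
}

/* Models flushing the reset worker: run the queued reset, if any. */
static void mock_flush_reset_work(struct mock_gt *gt)
{
	if (!gt->reset_pending)
		return;

	gt->reset_pending = false;
	gt->fw_state_valid = true;	/* the restart re-initializes such state */
	gt->resets_done++;
}

/* Same ordering as the test change: sanitize, restore, reset, flush. */
static int restore_and_restart(struct mock_gt *gt)
{
	int err;

	assert(gt != NULL);

	mock_gt_sanitize(gt);
	err = mock_bo_restore_kernel(gt);
	mock_gt_reset_async(gt);
	mock_flush_reset_work(gt);

	/* Only now does the caller look at the restore result. */
	return err;
}
```

In this model the reset is queued and flushed regardless of the restore
result, mirroring how the test only inspects err after flush_work() returns.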

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drivers/gpu/drm/xe/tests/xe_bo.c

index 16e92400e51041172df22901130ed9ba5a862846..5d60dc6bfe7110ccfc4172335eaa12df495b385c 100644
@@ -218,7 +218,21 @@ static int evict_test_run_gt(struct xe_device *xe, struct xe_gt *gt, struct kuni
                        goto cleanup_all;
                }
 
+               xe_gt_sanitize(gt);
                err = xe_bo_restore_kernel(xe);
+               /*
+                * Snapshotting the CTB and copying back a potentially old
+                * version seems risky, depending on what might have been
+                * inflight. Also it seems snapshotting the ADS object and
+                * copying back results in serious breakage. Normally when
+                * calling xe_bo_restore_kernel() we always fully restart the
+                * GT, which re-initializes such things.  We could potentially
+                * skip saving and restoring such objects in xe_bo_evict_all()
+                * however seems quite fragile not to also restart the GT. Try
+                * to do that here by triggering a GT reset.
+                */
+               xe_gt_reset_async(gt);
+               flush_work(&gt->reset.worker);
                if (err) {
                        KUNIT_FAIL(test, "restore kernel err=%pe\n",
                                   ERR_PTR(err));