accel/habanalabs/gaudi2: get the correct QM CQ info upon an error
author Tomer Tayar <ttayar@habana.ai>
Mon, 6 Nov 2023 16:41:35 +0000 (18:41 +0200)
committer Oded Gabbay <ogabbay@kernel.org>
Tue, 19 Dec 2023 09:09:43 +0000 (11:09 +0200)
commit ae303d885d4a0fcea65330de9327d28edfebd206
tree 564cfc67a3e9b256e2dcd7c405b39d15212b617c
parent 4b0b1fbc7757169b6d304545a321c7a88f13f8f0

Upon a QM error, the address/size from both the CQ and the ARC_CQ are
printed, although the instruction that led to the error was received
from only one of them.

Moreover, in case of a QM undefined opcode, only one of these
address/size sets is captured, chosen by the value of ARC_CQ_PTR.
However, ARC_CQ_PTR can be non-zero even while the CQ is the one
currently in use, if the CQ and ARC_CQ are used alternately.

Assuming a stop-on-error configuration, use the CP_STS.CUR_CQ field to
determine which CQ is relevant to the QM error.
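
As a rough illustration of the selection logic described above (not the
driver code itself), the sketch below picks the address/size pair based
on a CUR_CQ bit in a CP_STS value. The struct names, the function name,
the bit position of CUR_CQ, and the convention that a set bit means
ARC_CQ are all assumptions made for this example; the real mask and
register offsets are defined in gaudi2_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define EX_CP_STS_CUR_CQ_SHIFT	6u
#define EX_CP_STS_CUR_CQ_MASK	(1u << EX_CP_STS_CUR_CQ_SHIFT)

/* Hypothetical snapshot of the QM registers read after the error. */
struct ex_qm_regs {
	uint32_t cp_sts;
	uint64_t cq_ptr;
	uint32_t cq_tsize;
	uint64_t arc_cq_ptr;
	uint32_t arc_cq_tsize;
};

/* Address/size of the CQ that was in use when the QM stopped. */
struct ex_cq_info {
	bool is_arc_cq;
	uint64_t addr;
	uint32_t size;
};

/*
 * Select the CQ by CP_STS.CUR_CQ instead of testing ARC_CQ_PTR for
 * non-zero, since ARC_CQ_PTR may hold a stale non-zero value when the
 * CQ and ARC_CQ are used alternately.
 * Assumption: a set CUR_CQ bit means the ARC_CQ is the current one.
 */
static struct ex_cq_info ex_get_cur_cq_info(const struct ex_qm_regs *regs)
{
	struct ex_cq_info info;

	info.is_arc_cq = (regs->cp_sts & EX_CP_STS_CUR_CQ_MASK) != 0;

	if (info.is_arc_cq) {
		info.addr = regs->arc_cq_ptr;
		info.size = regs->arc_cq_tsize;
	} else {
		info.addr = regs->cq_ptr;
		info.size = regs->cq_tsize;
	}

	return info;
}
```

With a stop-on-error configuration the QM halts at the faulting
instruction, so a CUR_CQ value read afterwards should still describe the
queue that supplied it; without that configuration this selection could
be stale.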

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
drivers/accel/habanalabs/gaudi2/gaudi2.c
drivers/accel/habanalabs/include/gaudi2/asic_reg/gaudi2_regs.h