scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
authorJames Smart <jsmart2021@gmail.com>
Wed, 20 Oct 2021 21:14:16 +0000 (14:14 -0700)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 25 Nov 2021 08:48:29 +0000 (09:48 +0100)
commitbf76f56a7fc751e6265c1b6ef85d138291f775ac
tree976dc2fe3ff9bd6d4d3fc3e4874d3dfa57cb64f5
parent28de48a7cea495ab48082d9ff4ef63f7cb4e563a
scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss

[ Upstream commit af984c87293b19dccbd0b16afc57c5c9a4a279c7 ]

A link bounce to a slow fabric may observe FDISC response delays lasting
longer than devloss tmo.  Current logic decrements the final fabric node
kref during a devloss tmo event.  This results in a NULL ptr dereference
crash if the FDISC completes for that fabric node after devloss tmo.

Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when
devloss tmo triggers and we've noticed that fabric node recovery has
already started or finished in between the time lpfc_dev_loss_tmo_callbk
queues lpfc_dev_loss_tmo_handler.  If fabric node recovery succeeds, then
the driver reverses the devloss tmo marked kref put with a kref get.  If
fabric node recovery fails, then the final kref put relies on the ELS
timing out or the REG_LOGIN cmpl routine.

Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/scsi/lpfc/lpfc_crtn.h
drivers/scsi/lpfc/lpfc_disc.h
drivers/scsi/lpfc/lpfc_els.c
drivers/scsi/lpfc/lpfc_hbadisc.c
drivers/scsi/lpfc/lpfc_init.c
drivers/scsi/lpfc/lpfc_scsi.c