[ Upstream commit
9078e843efec530f279a155f262793c58b0746bd ]
Currently, recovery is done without considering whether the device is
still in probe flow.
This may lead to recovery before device have finished probed
successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
using functionality that is loaded only by mlx5_init_one(), and there
is no point in running recovery without mlx5_init_one() finished
successfully.
Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.
Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
priv = container_of(health, struct mlx5_priv, health);
dev = container_of(priv, struct mlx5_core_dev, priv);
+ mutex_lock(&dev->intf_state_mutex);
+ if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) {
+ mlx5_core_err(dev, "health works are not permitted at this stage\n");
+ return;
+ }
+ mutex_unlock(&dev->intf_state_mutex);
enter_error_state(dev, false);
if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) {
if (mlx5_health_try_recover(dev))