[ Upstream commit 
f08d8c1bb97c48f24a82afaa2fd8c140f8d3da8b ]
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.
In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
  all the way until it flushes the gc work, and returns without freeing
  the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
  work and frees the context's device resources.
Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization.  This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
        unsigned long flags;
 
        spin_lock_irqsave(&tls_device_lock, flags);
+       if (unlikely(!refcount_dec_and_test(&ctx->refcount)))
+               goto unlock;
+
        list_move_tail(&ctx->list, &tls_device_gc_list);
 
        /* schedule_work inside the spinlock
         * to make sure tls_device_down waits for that work.
         */
        schedule_work(&tls_device_gc_work);
-
+unlock:
        spin_unlock_irqrestore(&tls_device_lock, flags);
 }
 
                clean_acked_data_disable(inet_csk(sk));
        }
 
-       if (refcount_dec_and_test(&tls_ctx->refcount))
-               tls_device_queue_ctx_destruction(tls_ctx);
+       tls_device_queue_ctx_destruction(tls_ctx);
 }
 EXPORT_SYMBOL_GPL(tls_device_sk_destruct);