Brendan Cunningham [Tue, 22 Aug 2023 14:07:53 +0000 (10:07 -0400)]
RDMA/hfi1: Move user SDMA system memory pinning code to its own file
Move user SDMA system memory page-pinning code from user_sdma.c to
pin_system.c. Put declarations for non-static functions in pinning.h.
System memory pinning is necessary for processing user SDMA requests but
actual steps are invisible to user SDMA request-processing code. Moving
system memory pinning code for user SDMA to its own file makes this
distinction apparent.
These changes have no effect on userspace.
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/169271327311.1855761.4736714053318724062.stgit@awfm-02.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Jinjie Ruan [Tue, 22 Aug 2023 03:35:38 +0000 (11:35 +0800)]
RDMA/hfi1: Use list_for_each_entry() helper
Convert list_for_each() to list_for_each_entry() so that the pos list_head
pointer and list_entry() call are no longer needed, which can
reduce a few lines of code. No functional changed.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230822033539.3692453-1-ruanjinjie@huawei.com
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Rohit Chavan [Tue, 22 Aug 2023 12:04:51 +0000 (17:34 +0530)]
RDMA/mlx5: Fix trailing */ formatting in block comment
Resolved a formatting issue where the trailing */ in a block comment
was placed on a same line instead of separate line.
Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230822120451.8215-1-roheetchavan@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Rohit Chavan [Tue, 22 Aug 2023 09:13:04 +0000 (14:43 +0530)]
RDMA/rxe: Fix redundant break statement in switch-case.
Removed unreachable break statement after return.
Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230822091304.7312-1-roheetchavan@gmail.com
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Yonatan Nachum [Tue, 22 Aug 2023 08:27:25 +0000 (08:27 +0000)]
RDMA/efa: Fix wrong resources deallocation order
When trying to destroy QP or CQ, we first decrease the refcount and
potentially free memory regions allocated for the object and then
request the device to destroy the object. If the device fails, the
object isn't fully destroyed so the user/IB core can try to destroy the
object again which will lead to underflow when trying to decrease an
already zeroed refcount.
Deallocate resources in reverse order of allocating them to safely free
them.
Fixes: ff6629f88c52 ("RDMA/efa: Do not delay freeing of DMA pages")
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Link: https://lore.kernel.org/r/20230822082725.31719-1-ynachum@amazon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Guoqing Jiang [Mon, 21 Aug 2023 13:32:55 +0000 (21:32 +0800)]
RDMA/siw: Call llist_reverse_order in siw_run_sq
We can call the function to get fifo list.
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230821133255.31111-4-guoqing.jiang@linux.dev
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Guoqing Jiang [Mon, 21 Aug 2023 13:32:54 +0000 (21:32 +0800)]
RDMA/siw: Correct wrong debug message
We need to print num_sle first then pbl->max_buf per the condition.
Also replace mem->pbl with pbl while at it.
Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230821133255.31111-3-guoqing.jiang@linux.dev
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Guoqing Jiang [Mon, 21 Aug 2023 13:32:53 +0000 (21:32 +0800)]
RDMA/siw: Balance the reference of cep->kref in the error path
The siw_connect can go to err in below after cep is allocated successfully:
1. If siw_cm_alloc_work returns failure. In this case socket is not
assoicated with cep so siw_cep_put can't be called by siw_socket_disassoc.
We need to call siw_cep_put twice since cep->kref is increased once after
it was initialized.
2. If siw_cm_queue_work can't find a work, which means siw_cep_get is not
called in siw_cm_queue_work, so cep->kref is increased twice by siw_cep_get
and when associate socket with cep after it was initialized. So we need to
call siw_cep_put three times (one in siw_socket_disassoc).
3. siw_send_mpareqrep returns error, this scenario is similar as 2.
So we need to remove one siw_cep_put in the error path.
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230821133255.31111-2-guoqing.jiang@linux.dev
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Leon Romanovsky [Mon, 21 Aug 2023 07:57:14 +0000 (10:57 +0300)]
Revert "IB/isert: Fix incorrect release of isert connection"
Commit:
699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is
causing problems on OPA when DEVICE_REMOVAL is happening.
------------[ cut here ]------------
WARNING: CPU: 52 PID:
2117247 at drivers/infiniband/core/cq.c:359
ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc
scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs
rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod
opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm
ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core
x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt
ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma
intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter
acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul
crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci
ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse
CPU: 52 PID:
2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1
Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS
SE5C610.86B.01.01.0014.
121820151719 12/18/2015
RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83
c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f
RSP: 0018:
ffffc10bea13fc80 EFLAGS:
00010206
RAX:
000000000000010c RBX:
ffff9bf5c7e66c00 RCX:
000000008020001d
RDX:
000000008020001e RSI:
fffff175221f9900 RDI:
ffff9bf5c7e67640
RBP:
ffff9bf5c7e67600 R08:
ffff9bf5c7e64400 R09:
000000008020001d
R10:
0000000040000000 R11:
0000000000000000 R12:
ffff9bee4b1e8a18
R13:
dead000000000122 R14:
dead000000000100 R15:
ffff9bee4b1e8a38
FS:
00007ff1e6d38740(0000) GS:
ffff9bfd9fb00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00005652044ecc68 CR3:
0000000889b5c005 CR4:
00000000001706e0
Call Trace:
<TASK>
? __warn+0x80/0x130
? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
? report_bug+0x195/0x1a0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
disable_device+0x9d/0x160 [ib_core]
__ib_unregister_device+0x42/0xb0 [ib_core]
ib_unregister_device+0x22/0x30 [ib_core]
rvt_unregister_device+0x20/0x90 [rdmavt]
hfi1_unregister_ib_device+0x16/0xf0 [hfi1]
remove_one+0x55/0x1a0 [hfi1]
pci_device_remove+0x36/0xa0
device_release_driver_internal+0x193/0x200
driver_detach+0x44/0x90
bus_remove_driver+0x69/0xf0
pci_unregister_driver+0x2a/0xb0
hfi1_mod_cleanup+0xc/0x3c [hfi1]
__do_sys_delete_module.constprop.0+0x17a/0x2f0
? exit_to_user_mode_prepare+0xc4/0xd0
? syscall_trace_enter.constprop.0+0x126/0x1a0
do_syscall_64+0x5c/0x90
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x69/0x90
? syscall_exit_work+0x103/0x130
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x69/0x90
? exc_page_fault+0x65/0x150
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7ff1e643f5ab
Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3
66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
RSP: 002b:
00007ffec9103cc8 EFLAGS:
00000206 ORIG_RAX:
00000000000000b0
RAX:
ffffffffffffffda RBX:
00005615267fdc50 RCX:
00007ff1e643f5ab
RDX:
0000000000000000 RSI:
0000000000000800 RDI:
00005615267fdcb8
RBP:
00005615267fdc50 R08:
0000000000000000 R09:
0000000000000000
R10:
00007ff1e659eac0 R11:
0000000000000206 R12:
00005615267fdcb8
R13:
0000000000000000 R14:
00005615267fdcb8 R15:
00007ffec9105ff8
</TASK>
---[ end trace
0000000000000000 ]---
And...
restrack: ------------[ cut here ]------------
infiniband hfi1_0: BUG: RESTRACK detected leak of resources
restrack: Kernel PD object allocated by ib_isert is not freed
restrack: Kernel CQ object allocated by ib_core is not freed
restrack: Kernel QP object allocated by rdma_cm is not freed
restrack: ------------[ cut here ]------------
Fixes: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection")
Reported-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Closes: https://lore.kernel.org/all/921cd1d9-2879-f455-1f50-0053fe6a6655@cornelisnetworks.com
Link: https://lore.kernel.org/r/a27982d3235005c58f6d321f3fad5eb6e1beaf9e.1692604607.git.leonro@nvidia.com
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Leon Romanovsky [Sun, 20 Aug 2023 13:53:15 +0000 (16:53 +0300)]
RDMA/bnxt_re: Fix kernel doc errors
Fix set of the following errors due to use of wrong kernel doc format
to describe function parameters:
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:68: warning: Function parameter or member 'rcfw'
not described in '__wait_for_resp'
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308180600.oOnkIAQV-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308180401.iaj2ktTc-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308180214.Lt9NAhbM-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308180055.6zM4AK6V-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308172136.ipx1wvs6-lkp@intel.com/
Link: https://lore.kernel.org/r/4b22c385f1b68590ace8f82f2985d14b20999432.1692539554.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Christopher Bednarz [Fri, 18 Aug 2023 14:48:38 +0000 (09:48 -0500)]
RDMA/irdma: Prevent zero-length STAG registration
Currently irdma allows zero-length STAGs to be programmed in HW during
the kernel mode fast register flow. Zero-length MR or STAG registration
disable HW memory length checks.
Improve gaps in bounds checking in irdma by preventing zero-length STAG or
MR registrations except if the IB_PD_UNSAFE_GLOBAL_RKEY is set.
This addresses the disclosure CVE-2023-25775.
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Christopher Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230818144838.1758-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Cheng Xu [Thu, 17 Aug 2023 10:21:51 +0000 (18:21 +0800)]
RDMA/erdma: Implement hierarchical MTT
Hierarchical MTT allows large MR registration without the need of
continuous physical address. This commit adds the support of hierarchical
MTT support for erdma.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Cheng Xu [Thu, 17 Aug 2023 10:21:50 +0000 (18:21 +0800)]
RDMA/erdma: Refactor the storage structure of MTT entries
Currently our MTT only support inline mtt entries (0 level MTT) and
indirect MTT entries (1 level mtt), which will limit the maximum length
of MRs. In order to implement a multi-level MTT, we refactor the
structure of MTT first.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Cheng Xu [Thu, 17 Aug 2023 10:21:49 +0000 (18:21 +0800)]
RDMA/erdma: Renaming variable names and field names of struct erdma_mem
Currently, variable names and field names of struct erdma_mem contain
'mtt', which is not accurate. Renaming them with 'xxx_mem' or 'mem'.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chengchang Tang [Wed, 16 Aug 2023 09:18:11 +0000 (17:18 +0800)]
RDMA/hns: Support hns HW stats
Support query hns HW stats for rdma-tool to help debugging.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230816091812.2899366-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chengchang Tang [Wed, 16 Aug 2023 09:18:10 +0000 (17:18 +0800)]
RDMA/hns: Dump whole QP/CQ/MR resource in raw
Currently, some fields in the QP/CQ/MR resource can be dumped by
rdma-tool, but this information is not enough. It is very
inconvenient to continue to expand on the current field, and it
will also introduce some trouble to parse these raw data.
This patch dump whole resource in raw to avoid the above problems.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230816091812.2899366-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Leon Romanovsky [Thu, 17 Aug 2023 08:13:53 +0000 (11:13 +0300)]
RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp()
Fix the following warning reported by kbuild:
drivers/infiniband/hw/irdma/verbs.c:584: warning: Function parameter
or member 'udata' not described in 'irdma_setup_umode_qp'
Fixes: 3a8498720450 ("RDMA/irdma: Allow accurate reporting on QP max send/recv WR")
Link: https://lore.kernel.org/r/2c9bcd2b773c400a1699bd7973e22bfba1e4b379.1692260011.git.leonro@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308171620.m4MNACWz-lkp@intel.com/
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Gustavo A. R. Silva [Tue, 15 Aug 2023 20:39:53 +0000 (14:39 -0600)]
RDMA/mlx4: Copy union directly
Copy union directly instead of using memcpy().
Note that in this case, a direct assignment is more readable and
consistent with the subsequent assignments.
This addresses the following -Wstringop-overflow warning seen in s390
with defconfig:
drivers/infiniband/hw/mlx4/main.c:296:33: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
296 | memcpy(&port_gid_table->gids[free].gid,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
297 | &attr->gid, sizeof(attr->gid));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This helps with the ongoing efforts to globally enable
-Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/308
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZNvimeRAPkJ24zRG@work
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Shiraz Saleem [Wed, 16 Aug 2023 00:12:09 +0000 (19:12 -0500)]
RDMA/irdma: Drop unused kernel push code
The driver has code blocks for kernel push WQEs but does not
map the doorbell page rendering this mode non functional [1]
Remove code associated with this feature from the kernel fast
path as there is currently no plan of record to support this.
This also address a sparse issue reported by lkp.
drivers/infiniband/hw/irdma/uk.c:285:24: sparse: sparse: incorrect type in assignment (different base types) @@ expected bool [usertype] push_wqe:1 @@ got restricted __le32 [usertype] *push_db @@
drivers/infiniband/hw/irdma/uk.c:285:24: sparse: expected bool [usertype] push_wqe:1
drivers/infiniband/hw/irdma/uk.c:285:24: sparse: got restricted __le32 [usertype] *push_db
drivers/infiniband/hw/irdma/uk.c:386:24: sparse: sparse: incorrect type in assignment (different base types) @@ expected bool [usertype] push_wqe:1 @@ got restricted __le32 [usertype] *push_db @@
[1] https://lore.kernel.org/linux-rdma/
20230815051809.GB22185@unreal/T/#t
Fixes: 272bba19d631 ("RDMA: Remove unnecessary ternary operators")
Fixes: 551c46edc769 ("RDMA/irdma: Add user/kernel shared libraries")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308110251.BV6BcwUR-lkp@intel.com/
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230816001209.1721-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Saravanan Vajravel [Mon, 31 Jul 2023 08:01:13 +0000 (01:01 -0700)]
RDMA/bnxt_re: Add support for dmabuf pinned memory regions
Support the new verb which indicates dmabuf support. bnxt doesn't support
ODP. So use the pinned version of the dmabuf APIs to enable bnxt_re
devices to work as dmabuf importer.
Link: https://lore.kernel.org/r/1690790473-25850-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Selvin Xavier [Mon, 14 Aug 2023 17:00:19 +0000 (10:00 -0700)]
RDMA/bnxt_re: Protect the PD table bitmap
Syncrhonization is required to avoid simultaneous allocation
of the PD. Add a new mutex lock to handle allocation from
the PD table.
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1692032419-21680-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kashyap Desai [Mon, 14 Aug 2023 17:00:18 +0000 (10:00 -0700)]
RDMA/bnxt_re: Initialize mutex dbq_lock
Fix the missing dbq_lock mutex initialization
Fixes: 2ad4e6303a6d ("RDMA/bnxt_re: Implement doorbell pacing algorithm")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1692032419-21680-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kalesh AP [Wed, 2 Aug 2023 09:00:23 +0000 (02:00 -0700)]
IB/core: Add more speed parsing in ib_get_width_and_speed()
When the Ethernet driver does not provide the number of lanes
in the __ethtool_get_link_ksettings() response, the function
ib_get_width_and_speed() does not take consideration of 50G,
100G and 200G speeds while calculating the IB width and speed.
Update the width and speed for the above netdev speeds.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1690966823-8159-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Yue Haibing [Wed, 9 Aug 2023 14:27:18 +0000 (22:27 +0800)]
RDMA Remove unused function declarations
Commit
c2261dd76b54 ("RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev")
declared but never implemented ib_device_netdev(), remove it.
Commit
922a8e9fb2e0 ("RDMA: iWARP Connection Manager.") declared but never implemented
iw_cm_unbind_qp() and iw_cm_get_qp().
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230809142718.42316-1-yuehaibing@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Guoqing Jiang [Mon, 31 Jul 2023 09:21:06 +0000 (17:21 +0800)]
RDMA/cxgb4: Set sq_sig_type correctly
Replace '0' with IB_SIGNAL_REQ_WR given the sq_sig_type is either
IB_SIGNAL_ALL_WR or IB_SIGNAL_REQ_WR per the below.
enum ib_sig_type {
IB_SIGNAL_ALL_WR,
IB_SIGNAL_REQ_WR
};
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230731092106.10396-1-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Yue Haibing [Fri, 4 Aug 2023 13:04:18 +0000 (21:04 +0800)]
RDMA/hns: Remove unused declaration hns_roce_modify_srq()
Commit
c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
declared but never implemented this.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230804130418.41728-1-yuehaibing@huawei.com
Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Ivan Orlov [Fri, 4 Aug 2023 15:05:28 +0000 (17:05 +0200)]
RDMA: Make all 'class' structures const
Now that the driver core allows for struct class to be in read-only
memory, making all 'class' structures to be declared at build time
placing them into read-only memory, instead of having to be dynamically
allocated at load time.
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
Cc: Jack Wang <jinpu.wang@ionos.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Ivan Orlov <ivan.orlov0322@gmail.com>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/2023080427-commuting-crewless-cbee@gregkh
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Ruan Jinjie [Fri, 4 Aug 2023 08:21:01 +0000 (16:21 +0800)]
RDMA: Remove unnecessary NULL values
The NULL initialization of the pointers assigned by kzalloc() first is
not necessary, because if the kzalloc() failed, the pointers will be
assigned NULL, otherwise it works as usual. so remove it.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230804082102.3361961-1-ruanjinjie@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Xiang Yang [Fri, 4 Aug 2023 02:25:25 +0000 (10:25 +0800)]
IB/uverbs: Fix an potential error pointer dereference
smatch reports the warning below:
drivers/infiniband/core/uverbs_std_types_counters.c:110
ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ() error: 'uattr'
dereferencing possible ERR_PTR()
The return value of uattr maybe ERR_PTR(-ENOENT), fix this by checking
the value of uattr before using it.
Fixes: ebb6796bd397 ("IB/uverbs: Add read counters support")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Link: https://lore.kernel.org/r/20230804022525.1916766-1-xiangyang3@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chengchang Tang [Fri, 4 Aug 2023 01:27:11 +0000 (09:27 +0800)]
RDMA/hns: Fix CQ and QP cache affinity
Currently, the affinity between QP cache and CQ cache is not
considered when assigning QPN, it will affect the message rate of HW.
Allocate QPN from QP cache with better CQ affinity to get better
performance.
Fixes: 71586dd20010 ("RDMA/hns: Create QP with selected QPN for bank load balance")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Junxian Huang [Fri, 4 Aug 2023 01:27:10 +0000 (09:27 +0800)]
RDMA/hns: Fix inaccurate error label name in init instance
This patch fixes inaccurate error label name in init instance.
Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Junxian Huang [Fri, 4 Aug 2023 01:27:09 +0000 (09:27 +0800)]
RDMA/hns: Fix incorrect post-send with direct wqe of wr-list
Currently, direct wqe is not supported for wr-list. RoCE driver excludes
direct wqe for wr-list by judging whether the number of wr is 1.
For a wr-list where the second wr is a length-error atomic wr, the
post-send driver handles the first wr and adds 1 to the wr number counter
firstly. While handling the second wr, the driver finds out a length error
and terminates the wr handle process, remaining the counter at 1. This
causes the driver mistakenly judges there is only 1 wr and thus enters
the direct wqe process, carrying the current length-error atomic wqe.
This patch fixes the error by adding a judgement whether the current wr
is a bad wr. If so, use the normal doorbell process but not direct wqe
despite the wr number is 1.
Fixes: 01584a5edcc4 ("RDMA/hns: Add support of direct wqe")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chengchang Tang [Fri, 4 Aug 2023 01:27:08 +0000 (09:27 +0800)]
RDMA/hns: Fix port active speed
HW supports a variety of different speed, but the current speed
is fixed.
The real speed should be querried from ethernet.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kalesh AP [Thu, 3 Aug 2023 08:45:26 +0000 (01:45 -0700)]
RDMA/bnxt_re: Remove unnecessary variable initializations
Remove unnecessary variable initializations.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kalesh AP [Thu, 3 Aug 2023 08:45:25 +0000 (01:45 -0700)]
RDMA/bnxt_re: Avoid unnecessary memset
Avoid memset by initializing the variables during
declaration itself.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kalesh AP [Thu, 3 Aug 2023 08:45:24 +0000 (01:45 -0700)]
RDMA/bnxt_re: Cleanup bnxt_re_process_raw_qp_pkt_rx() function
- Remove unnecessary memset by initializing the variables
during declaration itself.
- Arranged variable declarartion in RCT order.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Selvin Xavier [Thu, 3 Aug 2023 08:45:23 +0000 (01:45 -0700)]
RDMA/bnxt_re: Fix the sideband buffer size handling for FW commands
bnxt_qplib_rcfw_alloc_sbuf allocates 24 bytes and it is better to fit
on stack variables. This way we can avoid unwanted kmalloc call.
Call dma_alloc_coherent directly instead of wrapper
bnxt_qplib_rcfw_alloc_sbuf.
Also, FW expects the side buffer needs to be aligned to
BNXT_QPLIB_CMDQE_UNITS(16B). So align the size to have the
extra padding bytes.
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kalesh AP [Thu, 3 Aug 2023 08:45:22 +0000 (01:45 -0700)]
RDMA/bnxt_re: Remove a redundant flag
After the cited commit, BNXT_RE_FLAG_GOT_MSIX is redundant.
Remove it.
Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kalesh AP [Thu, 3 Aug 2023 08:45:21 +0000 (01:45 -0700)]
RDMA/bnxt_re: Fix max_qp count for virtual functions
Driver has not accounted QP1 for virtual functions
when fetching device attributes and hence max_qp
count is one less than active_qp count. Fixed driver
so that it counts QP1 for virtual functions as well
while fetching device attributes
Fixes: ccd9d0d3dffc ("RDMA/bnxt_re: Enable RoCE on virtual functions")
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Gustavo A. R. Silva [Wed, 2 Aug 2023 14:46:26 +0000 (08:46 -0600)]
RDMA/irdma: Replace one-element array with flexible-array member
One-element and zero-length arrays are deprecated. So, replace
one-element array in struct irdma_qvlist_info with flexible-array
member.
A patch for this was sent a while ago[1]. However, it seems that, at
the time, the changes were partially folded[2][3], and the actual
flexible-array transformation was omitted. This patch fixes that.
The only binary difference seen before/after changes is shown below:
| drivers/infiniband/hw/irdma/hw.o
| @@ -868,7 +868,7 @@
| drivers/infiniband/hw/irdma/hw.c:484 (discriminator 2)
| size += struct_size(iw_qvlist, qv_info, rf->msix_count);
| 55b: imul $0x45c,%rdi,%rdi
|- 562: add $0x10,%rdi
|+ 562: add $0x4,%rdi
which is, of course, expected as it reflects the mistake made
while folding the patch I've mentioned above.
Worth mentioning is the fact that with this change we save 12 bytes
of memory, as can be inferred from the diff snapshot above. Notice
that:
$ pahole -C rdma_qv_info idrivers/infiniband/hw/irdma/hw.o
struct irdma_qv_info {
u32 v_idx; /* 0 4 */
u16 ceq_idx; /* 4 2 */
u16 aeq_idx; /* 6 2 */
u8 itr_idx; /* 8 1 */
/* size: 12, cachelines: 1, members: 4 */
/* padding: 3 */
/* last cacheline: 12 bytes */
};
Link: https://lore.kernel.org/linux-hardening/20210525230038.GA175516@embeddedor/
Link: https://lore.kernel.org/linux-hardening/bf46b428deef4e9e89b0ea1704b1f0e5@intel.com/
Link: https://lore.kernel.org/linux-rdma/20210520143809.819-1-shiraz.saleem@intel.com/T/#u
Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZMpsQrZadBaJGkt4@work
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Yue Haibing [Mon, 31 Jul 2023 13:59:16 +0000 (21:59 +0800)]
RDMA/hns: Remove unused function declarations
commit
b16f8188472e ("RDMA/hns: Refactor eq code for hip06") left behind
hns_roce_cleanup_eq_table().
commit
773f841ab1ae ("RDMA/hns: Avoid filling sgid index when modifying QP
to RTR") leave hns_get_gid_index() unused.
Remove both.
Link: https://lore.kernel.org/r/20230731135916.32392-1-yuehaibing@huawei.com
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 21 Jul 2023 20:07:49 +0000 (15:07 -0500)]
RDMA/rxe: Fix incomplete state save in rxe_requester
If a send packet is dropped by the IP layer in rxe_requester()
the call to rxe_xmit_packet() can fail with err == -EAGAIN.
To recover, the state of the wqe is restored to the state before
the packet was sent so it can be resent. However, the routines
that save and restore the state miss a significnt part of the
variable state in the wqe, the dma struct which is used to process
through the sge table. And, the state is not saved before the packet
is built which modifies the dma struct.
Under heavy stress testing with many QPs on a fast node sending
large messages to a slow node dropped packets are observed and
the resent packets are corrupted because the dma struct was not
restored. This patch fixes this behavior and allows the test cases
to succeed.
Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer")
Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Tue, 20 Jun 2023 14:01:43 +0000 (09:01 -0500)]
RDMA/rxe: Fix rxe_modify_srq
This patch corrects an error in rxe_modify_srq where if the
caller changes the srq size the actual new value is not returned
to the caller since it may be larger than what is requested.
Additionally it open codes the subroutine rcv_wqe_size() which
adds very little value, and makes some whitespace changes.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230620140142.9452-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Tue, 20 Jun 2023 13:55:21 +0000 (08:55 -0500)]
RDMA/rxe: Fix unsafe drain work queue code
If create_qp does not fully succeed it is possible for qp cleanup
code to attempt to drain the send or recv work queues before the
queues have been created causing a seg fault. This patch checks
to see if the queues exist before attempting to drain them.
Link: https://lore.kernel.org/r/20230620135519.9365-3-rpearsonhpe@gmail.com
Reported-by: syzbot+2da1965168e7dbcba136@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-rdma/00000000000012d89205fe7cfe00@google.com/raw
Fixes: 49dc9c1f0c7e ("RDMA/rxe: Cleanup reset state handling in rxe_resp.c")
Fixes: fbdeb828a21f ("RDMA/rxe: Cleanup error state handling in rxe_comp.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Tue, 20 Jun 2023 13:55:19 +0000 (08:55 -0500)]
RDMA/rxe: Move work queue code to subroutines
This patch:
- Moves code to initialize a qp send work queue to a
subroutine named rxe_init_sq.
- Moves code to initialize a qp recv work queue to a
subroutine named rxe_init_rq.
- Moves initialization of qp request and response packet
queues ahead of work queue initialization so that cleanup
of a qp if it is not fully completed can successfully
attempt to drain the packet queues without a seg fault.
- Makes minor whitespace cleanups.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230620135519.9365-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Ruan Jinjie [Mon, 31 Jul 2023 08:51:18 +0000 (16:51 +0800)]
RDMA: Remove unnecessary ternary operators
There are a little ternary operators, the true or false judgment
of which is unnecessary in C language semantics.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230731085118.394443-1-ruanjinjie@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Shetu Ayalew [Sun, 23 Jul 2023 14:21:14 +0000 (17:21 +0300)]
IB/mlx5: Add HW counter called rx_dct_connect
The rx_dct_connect counter shows the number of received connection
requests for the associated DCTs.
Signed-off-by: Shetu Ayalew <shetu@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/01cd24cd7f591734741309921fdc01fc770d84a8.1690121941.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Ruan Jinjie [Mon, 31 Jul 2023 06:55:43 +0000 (14:55 +0800)]
RDMA/mthca: Remove unnecessary NULL assignments
There are many pointers assigned first, which need not to be initialized, so
remove the NULL assignments.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230731065543.2285928-1-ruanjinjie@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Yang Li [Mon, 31 Jul 2023 01:59:15 +0000 (09:59 +0800)]
RDMA/irdma: Fix one kernel-doc comment
Remove description of @free_hwcqp in irdma_destroy_cqp().
to silence the warning:
drivers/infiniband/hw/irdma/hw.c:580: warning: Excess function parameter 'free_hwcqp' description in 'irdma_destroy_cqp'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6028
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230731015915.34867-1-yang.lee@linux.alibaba.com
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Bernard Metzler [Fri, 28 Jul 2023 11:44:18 +0000 (13:44 +0200)]
RDMA/siw: Fix tx thread initialization.
Immediately removing the siw module after insertion may
crash in siw_stop_tx_thread(), if the according thread did
not yet had a chance to initialize its wait queue and
siw_stop_tx_thread() tries to wakeup that thread. Initializing
the threads state before spwaning it fixes it.
Reported-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20230728114418.124328-1-bmt@zurich.ibm.com
Tested-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Ruan Jinjie [Fri, 28 Jul 2023 06:51:39 +0000 (14:51 +0800)]
RDMA/mlx: Remove unnecessary variable initializations
Remove unnecessary variable initializations.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230728065139.3411703-1-ruanjinjie@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Sindhu Devale [Tue, 25 Jul 2023 15:55:25 +0000 (10:55 -0500)]
RDMA/irdma: Use HW specific minimum WQ size
HW GEN1 and GEN2 have different min WQ sizes but they are
currently set to the same value.
Use a gen specific attribute min_hw_wq_size and extend ABI to
pass it to user-space.
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230725155525.1081-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Sindhu Devale [Tue, 25 Jul 2023 15:55:24 +0000 (10:55 -0500)]
RDMA/irdma: Allow accurate reporting on QP max send/recv WR
Currently the attribute cap.max_send_wr and cap.max_recv_wr
sent from user-space during create QP are the provider computed
SQ/RQ depth as opposed to raw values passed from application.
This inhibits computation of an accurate value for max_send_wr
and max_recv_wr for this QP in the kernel which matches the value
returned in user create QP. Also these capabilities needs to be
reported from the driver in query QP.
Add support by extending the ABI to allow the raw cap.max_send_wr and
cap.max_recv_wr to be passed from user-space, while keeping compatibility
for the older scheme.
The internal HW depth and shift needed for the WQs needs to be computed
now for both kernel and user-mode QPs. Add new helpers to assist with this:
irdma_uk_calc_depth_shift_sq, irdma_uk_calc_depth_shift_rq and
irdma_uk_calc_depth_shift_wq.
Consolidate all the user mode QP setup into a new function
irdma_setup_umode_qp which keeps it with its counterpart
irdma_setup_kmode_qp.
Signed-off-by: Youvaraj Sagar <youvaraj.sagar@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230725155525.1081-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Haoyue Xu [Fri, 21 Jul 2023 09:20:52 +0000 (17:20 +0800)]
RDMA/core: Get IB width and speed from netdev
Previously, there was no way to query the number of lanes for a network
card, so the same netdev_speed would result in a fixed pair of width and
speed. As network card specifications become more diverse, such fixed
mode is no longer suitable, so a method is needed to obtain the correct
width and speed based on the number of lanes.
This patch retrieves netdev lanes and speed from net_device and
translates them to IB width and speed.
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Luoyouming <luoyouming@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230721092052.2090449-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chandramohan Akula [Wed, 26 Jul 2023 14:51:21 +0000 (07:51 -0700)]
bnxt_re: Update the debug counters for doorbell pacing
Add debug counters to track the Doorbell pacing events and report the
doorbell pacing debug stats.
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1690383081-15033-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chandramohan Akula [Wed, 26 Jul 2023 14:51:20 +0000 (07:51 -0700)]
bnxt_re: Expose the missing hw counters
Add code to expose some of the HW counters related
to tx/rx data and Congestion control.
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1690383081-15033-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chandramohan Akula [Wed, 26 Jul 2023 14:51:19 +0000 (07:51 -0700)]
bnxt_re: Update the hw counters for resource stats
Report the additional resource counters which enables
better debugging. Includes active RC/UD QPs,
Watermark of the resources and a count that indicates the
resize cq operations after driver load.
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1690383081-15033-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chandramohan Akula [Wed, 26 Jul 2023 14:51:18 +0000 (07:51 -0700)]
bnxt_re: Reorganize the resource stats
Move the resource stats to a separate stats structure.
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1690383081-15033-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mustafa Ismail [Tue, 25 Jul 2023 15:55:05 +0000 (10:55 -0500)]
RDMA/irdma: Cleanup and rename irdma_netdev_vlan_ipv6()
The return value from irdma_netdev_vlan_ipv6() is not used. Rename
the functions and change to a void return.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230725155505.1069-5-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Krzysztof Czurylo [Tue, 25 Jul 2023 15:55:04 +0000 (10:55 -0500)]
RDMA/irdma: Add table based lookup for CQ pointer during an event
Add a CQ table based loookup to allow quick search
for CQ pointer having CQ ID in case of CQ related
asynchrononous event. The table is implemented in a
similar fashion to QP table.
Also add a reference counters for CQ. This is to prevent
destroying CQ while an asynchronous event is being processed.
The memory resource table size is sized higher with this update,
and this table doesn't need to be physically contiguous, so use
a vzalloc vs kzalloc to allocate the table.
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230725155505.1069-4-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Sindhu Devale [Tue, 25 Jul 2023 15:55:03 +0000 (10:55 -0500)]
RDMA/irdma: Refactor error handling in create CQP
In case of a failure in irdma_create_cqp, do not call
irdma_destroy_cqp, but cleanup all the allocated resources
in reverse order.
Drop the extra argument in irdma_destroy_cqp as its no longer needed.
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230725155505.1069-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Sindhu Devale [Tue, 25 Jul 2023 15:55:02 +0000 (10:55 -0500)]
RDMA/irdma: Drop a local in irdma_sc_get_next_aeqe
Drop the local wqe_idx in irdma_sc_get_next_aeqe and instead
store the wqe_idx in the info structure for all asynchronous events(AE)
received. There is no reason it should be tied to a specific AE source.
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230725155505.1069-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Christophe JAILLET [Sat, 22 Jul 2023 16:47:24 +0000 (18:47 +0200)]
IB/hfi1: Use struct_size()
Use struct_size() instead of hand-writing it, when allocating a structure
with a flex array.
This is less verbose, more robust and more informative.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/f4618a67d5ae0a30eb3f2b4558c8cc790feed79a.1690044376.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Junxian Huang [Fri, 21 Jul 2023 02:51:46 +0000 (10:51 +0800)]
RDMA/hns: Remove VF extend configuration
Remove VF extend configuration since the relative registers are
configured in firmware currently.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230721025146.450831-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Luoyouming [Fri, 21 Jul 2023 02:51:45 +0000 (10:51 +0800)]
RDMA/hns: Support get XRCD number from firmware
Support driver get the num of XRCD from firmware.
Signed-off-by: Luoyouming <luoyouming@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230721025146.450831-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Minjie Du [Wed, 5 Jul 2023 03:18:49 +0000 (11:18 +0800)]
RDMA/qedr: Remove duplicate assignments of va
Avoid double assignment of iwqp->ietf_mem.va.
Signed-off-by: Minjie Du <duminjie@vivo.com>
Link: https://lore.kernel.org/r/20230705031849.2443-1-duminjie@vivo.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Minjie Du [Wed, 5 Jul 2023 10:39:50 +0000 (18:39 +0800)]
RDMA/qedr: Remove a duplicate assignment in qedr_create_gsi_qp()
Delete a duplicate statement from this function implementation.
Signed-off-by: Minjie Du <duminjie@vivo.com>
Link: https://lore.kernel.org/r/20230705103950.15225-1-duminjie@vivo.com
Acked-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chandramohan Akula [Wed, 19 Jul 2023 05:02:57 +0000 (22:02 -0700)]
RDMA/bnxt_re: Add a new uapi for driver notification
Add driver notify uapi for application notifying
the driver about the doorbell FIFO congestion.
Link: https://lore.kernel.org/r/1689742977-9128-8-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chandramohan Akula [Wed, 19 Jul 2023 05:02:56 +0000 (22:02 -0700)]
RDMA/bnxt_re: Implement doorbell pacing algorithm
User applications alert the driver when the Doorbell FIFO
reaches the alarm threshold. The driver updates the pacing
parameters in the shared page to do the maximum pacing
by the application till the DB FIFO congestion reduces to
pacing threshold. Driver keeps checking the DB FIFO depth
at the pacing interval and gradually adjusts the pacing level.
Once the pacing level reaches default values (no congestion in
the FIFO) pacing gets completed.
Link: https://lore.kernel.org/r/1689742977-9128-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chandramohan Akula [Wed, 19 Jul 2023 05:02:55 +0000 (22:02 -0700)]
RDMA/bnxt_re: Update alloc_page uapi for pacing
Update the alloc_page uapi functionality for handling the
mapping of doorbell pacing shared page and bar address.
Link: https://lore.kernel.org/r/1689742977-9128-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chandramohan Akula [Wed, 19 Jul 2023 05:02:54 +0000 (22:02 -0700)]
RDMA/bnxt_re: Enable pacing support for the user apps
Report the pacing capability to the user applications.
Link: https://lore.kernel.org/r/1689742977-9128-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chandramohan Akula [Wed, 19 Jul 2023 05:02:53 +0000 (22:02 -0700)]
RDMA/bnxt_re: Initialize Doorbell pacing feature
Checks for pacing feature capability and get the doorbell pacing
configuration using FW commands. Allocate a page and initialize
the pacing parameters for the applications. Cleanup the page and
de-initialize the pacing during device removal.
Link: https://lore.kernel.org/r/1689742977-9128-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chandramohan Akula [Wed, 19 Jul 2023 05:02:52 +0000 (22:02 -0700)]
bnxt_en: Share the bar0 address with the RoCE driver
Add a parameter in the bnxt_en_dev structure to share the bar0 address
with RoCE driver.
Link: https://lore.kernel.org/r/1689742977-9128-3-git-send-email-selvin.xavier@broadcom.com
CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chandramohan Akula [Wed, 19 Jul 2023 05:02:51 +0000 (22:02 -0700)]
bnxt_en: Update HW interface headers
Updating the HW structures for the doorbell pacing related
information. Newly added interface structures will be used in
the followup patches.
Link: https://lore.kernel.org/r/1689742977-9128-2-git-send-email-selvin.xavier@broadcom.com
CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chuck Lever [Mon, 17 Jul 2023 15:12:32 +0000 (11:12 -0400)]
RDMA/cma: Avoid GID lookups on iWARP devices
We would like to enable the use of siw on top of a VPN that is
constructed and managed via a tun device. That hasn't worked up
until now because ARPHRD_NONE devices (such as tun devices) have
no GID for the RDMA/core to look up.
But it turns out that the egress device has already been picked for
us -- no GID is necessary. addr_handler() just has to do the right
thing with it.
Link: https://lore.kernel.org/r/168960675257.3007.4737911174148394395.stgit@manet.1015granger.net
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chuck Lever [Mon, 17 Jul 2023 15:12:25 +0000 (11:12 -0400)]
RDMA/cma: Deduplicate error flow in cma_validate_port()
Clean up to prepare for the addition of new logic.
Link: https://lore.kernel.org/r/168960674597.3007.6128252077812202526.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chuck Lever [Mon, 17 Jul 2023 15:12:19 +0000 (11:12 -0400)]
RDMA/core: Set gid_attr.ndev for iWARP devices
Have the iwarp side properly set the ndev in the device's sgid_attrs
so that address resolution can treat it more like a RoCE device.
Link: https://lore.kernel.org/r/168960673933.3007.8043081822081877578.stgit@manet.1015granger.net
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Chuck Lever [Mon, 17 Jul 2023 15:12:12 +0000 (11:12 -0400)]
RDMA/siw: Fabricate a GID on tun and loopback devices
LOOPBACK and NONE (tunnel) devices have all-zero MAC addresses.
Currently, siw_device_create() falls back to copying the IB device's
name in those cases, because an all-zero MAC address breaks the RDMA
core address resolution mechanism.
However, at the point when siw_device_create() constructs a GID, the
ib_device::name field is uninitialized, leaving the MAC address to
remain in an all-zero state.
Fabricate a random artificial GID for such devices, and ensure this
artificial GID is returned for all device query operations.
Link: https://lore.kernel.org/r/168960673260.3007.12378736853793339110.stgit@manet.1015granger.net
Reported-by: Tom Talpey <tom@talpey.com>
Fixes: a2d36b02c15d ("RDMA/siw: Enable siw on tunnel devices")
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Julia Lawall [Tue, 27 Jun 2023 14:43:34 +0000 (16:43 +0200)]
RDMA/bnxt_re: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Link: https://lore.kernel.org/r/20230627144339.144478-20-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Julia Lawall [Tue, 27 Jun 2023 14:43:29 +0000 (16:43 +0200)]
RDMA/siw: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Link: https://lore.kernel.org/r/20230627144339.144478-15-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Julia Lawall [Tue, 27 Jun 2023 14:43:20 +0000 (16:43 +0200)]
RDMA/erdma: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Link: https://lore.kernel.org/r/20230627144339.144478-6-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Arnd Bergmann [Tue, 18 Jul 2023 19:38:09 +0000 (21:38 +0200)]
RDMA/irdma: Fix building without IPv6
The new irdma_iw_get_vlan_prio() function requires IPv6 support to build:
x86_64-linux-ld: drivers/infiniband/hw/irdma/cm.o: in function `irdma_iw_get_vlan_prio':
cm.c:(.text+0x2832): undefined reference to `ipv6_chk_addr'
Add a compile-time check in the same way as elsewhere in this file to avoid
this by conditionally leaving out the ipv6 specific bits.
Fixes: f877f22ac1e9b ("RDMA/irdma: Implement egress VLAN priority")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230718193835.3546684-1-arnd@kernel.org
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mustafa Ismail [Tue, 11 Jul 2023 17:53:18 +0000 (12:53 -0500)]
RDMA/irdma: Implement egress VLAN priority
When a VLAN interface is in use, get and use the VLAN
egress mapping.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230711175318.1301-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Minjie Du [Thu, 6 Jul 2023 02:27:03 +0000 (10:27 +0800)]
RDMA/qedr: Remove a duplicate assignment in irdma_query_ah()
Delete a duplicate statement from this function implementation.
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Minjie Du <duminjie@vivo.com>
Acked-by: Alok Prasad <palok@marvell.com>
Link: https://lore.kernel.org/r/20230706022704.1260-1-duminjie@vivo.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Michael Margolin [Mon, 3 Jul 2023 15:34:04 +0000 (15:34 +0000)]
RDMA/efa: Add RDMA write HW statistics counters
Update device API and request RDMA write counters if RDMA write is
supported by device. Expose newly added counters through ib core
counters mechanism.
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
Link: https://lore.kernel.org/r/20230703153404.30877-1-mrgolin@amazon.com
Reviewed-by: Gal Pressman <gal.pressman@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Yuanyuan Zhong [Thu, 29 Jun 2023 21:32:48 +0000 (15:32 -0600)]
RDMA/mlx5: align MR mem allocation size to power-of-two
The MR memory allocation requests extra bytes to guarantee that there
is enough space to find the memory aligned to MLX5_UMR_ALIGN.
For power-of-two sizes, the alignment can be guaranteed by kmalloc()
according to commit
59bb47985c1d ("mm, sl[aou]b: guarantee natural
alignment for kmalloc(power-of-two)").
So if target alignment is power-of-two and adding the extra bytes
crosses a power-of-two boundary, use the next power-of-two as the
allocation size.
Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
Link: https://lore.kernel.org/r/20230629213248.3184245-2-yzhong@purestorage.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Linus Torvalds [Sun, 9 Jul 2023 20:53:13 +0000 (13:53 -0700)]
Linux 6.5-rc1
Linus Torvalds [Sun, 9 Jul 2023 17:29:53 +0000 (10:29 -0700)]
MAINTAINERS 2: Electric Boogaloo
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats:
80e62bc8487b ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 9 Jul 2023 17:24:22 +0000 (10:24 -0700)]
Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
Linus Torvalds [Sun, 9 Jul 2023 17:16:04 +0000 (10:16 -0700)]
Merge tag 'irq_urgent_for_v6.5_rc1' of git://git./linux/kernel/git/tip/tip
Pull irq update from Borislav Petkov:
- Optimize IRQ domain's name assignment
* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Use return value of strreplace()
Linus Torvalds [Sun, 9 Jul 2023 17:13:32 +0000 (10:13 -0700)]
Merge tag 'x86_urgent_for_v6.5_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
Linus Torvalds [Sun, 9 Jul 2023 17:08:38 +0000 (10:08 -0700)]
Merge tag 'x86-core-2023-07-09' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
Linus Torvalds [Sun, 9 Jul 2023 17:02:49 +0000 (10:02 -0700)]
Merge tag 'mips_6.5_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
Linus Torvalds [Sun, 9 Jul 2023 16:50:42 +0000 (09:50 -0700)]
Merge tag 'xfs-6.5-merge-6' of git://git./fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Nothing exciting here, just getting rid of a gcc warning that I got
tired of seeing when I turn on gcov"
* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix uninit warning in xfs_growfs_data
Linus Torvalds [Sun, 9 Jul 2023 16:45:32 +0000 (09:45 -0700)]
Merge tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Linus Torvalds [Sun, 9 Jul 2023 16:35:51 +0000 (09:35 -0700)]
Merge tag 'ntb-6.5' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Fixes for pci_clean_master, error handling in driver inits, and
various other issues/bugs"
* tag 'ntb-6.5' of https://github.com/jonmason/ntb:
ntb: hw: amd: Fix debugfs_create_dir error checking
ntb.rst: Fix copy and paste error
ntb_netdev: Fix module_init problem
ntb: intel: Remove redundant pci_clear_master
ntb: epf: Remove redundant pci_clear_master
ntb_hw_amd: Remove redundant pci_clear_master
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
MAINTAINERS: git://github -> https://github.com for jonmason
NTB: EPF: fix possible memory leak in pci_vntb_probe()
NTB: ntb_tool: Add check for devm_kcalloc
NTB: ntb_transport: fix possible memory leak while device_register() fails
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
ntb: idt: Fix error handling in idt_pci_driver_init()
Hugh Dickins [Sat, 8 Jul 2023 23:04:00 +0000 (16:04 -0700)]
mm: lock newly mapped VMA with corrected ordering
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 8 Jul 2023 21:30:25 +0000 (14:30 -0700)]
Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"16 hotfixes. Six are cc:stable and the remainder address post-6.4
issues"
The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.
* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: dhry: fix sleeping allocations inside non-preemptable section
kasan, slub: fix HW_TAGS zeroing with slub_debug
kasan: fix type cast in memory_is_poisoned_n
mailmap: add entries for Heiko Stuebner
mailmap: update manpage link
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
MAINTAINERS: add linux-next info
mailmap: add Markus Schneider-Pargmann
writeback: account the number of pages written back
mm: call arch_swap_restore() from do_swap_page()
squashfs: fix cache race with migration
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
docs: update ocfs2-devel mailing list address
MAINTAINERS: update ocfs2-devel mailing list address
mm: disable CONFIG_PER_VMA_LOCK until its fixed
fork: lock VMAs of the parent process when forking
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:12 +0000 (12:12 -0700)]
fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:11 +0000 (12:12 -0700)]
mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>