Akihiko Odaki [Thu, 27 Jun 2024 06:07:56 +0000 (15:07 +0900)]
pcie_sriov: Release VFs failed to realize
Release VFs failed to realize just as we do in unregister_vfs().
Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-reuse-v10-7-
7ca0b8ed3d9f@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Akihiko Odaki [Thu, 27 Jun 2024 06:07:55 +0000 (15:07 +0900)]
pcie_sriov: Reuse SR-IOV VF device instances
Disable SR-IOV VF devices by reusing code to power down PCI devices
instead of removing them when the guest requests to disable VFs. This
allows to realize devices and report VF realization errors at PF
realization time.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-reuse-v10-6-
7ca0b8ed3d9f@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Akihiko Odaki [Thu, 27 Jun 2024 06:07:54 +0000 (15:07 +0900)]
pcie_sriov: Ensure VF function number does not overflow
pci_new() aborts when creating a VF with a function number equals to or
is greater than PCI_DEVFN_MAX.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-reuse-v10-5-
7ca0b8ed3d9f@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Akihiko Odaki [Thu, 27 Jun 2024 06:07:53 +0000 (15:07 +0900)]
pcie_sriov: Do not manually unrealize
A device gets automatically unrealized when being unparented.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-reuse-v10-4-
7ca0b8ed3d9f@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Akihiko Odaki [Thu, 27 Jun 2024 06:07:52 +0000 (15:07 +0900)]
hw/ppc/spapr_pci: Do not reject VFs created after a PF
A PF may automatically create VFs and the PF may be function 0.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-reuse-v10-3-
7ca0b8ed3d9f@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Akihiko Odaki [Thu, 27 Jun 2024 06:07:51 +0000 (15:07 +0900)]
hw/ppc/spapr_pci: Do not create DT for disabled PCI device
Disabled means it is a disabled SR-IOV VF or it is powered off, and
hidden from the guest.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-reuse-v10-2-
7ca0b8ed3d9f@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Akihiko Odaki [Thu, 27 Jun 2024 06:07:50 +0000 (15:07 +0900)]
hw/pci: Rename has_power to enabled
The renamed state will not only represent powering state of PFs, but
also represent SR-IOV VF enablement in the future.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-reuse-v10-1-
7ca0b8ed3d9f@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cédric Le Goater [Mon, 1 Jul 2024 10:14:53 +0000 (12:14 +0200)]
virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged
When a VFIO device is hoplugged in a VM using virtio-iommu, IOMMUPciBus
and IOMMUDevice cache entries are created in the .get_address_space()
handler of the machine IOMMU device. However, these entries are never
destroyed, not even when the VFIO device is detached from the machine.
This can lead to an assert if the device is reattached again.
When reattached, the .get_address_space() handler reuses an
IOMMUDevice entry allocated when the VFIO device was first attached.
virtio_iommu_set_host_iova_ranges() is called later on from the
.set_iommu_device() handler an fails with an assert on 'probe_done'
because the device appears to have been already probed when this is
not the case.
The IOMMUDevice entry is allocated in pci_device_iommu_address_space()
called from under vfio_realize(), the VFIO PCI realize handler. Since
pci_device_unset_iommu_device() is called from vfio_exitfn(), a sub
function of the PCIDevice unrealize() handler, it seems that the
.unset_iommu_device() handler is the best place to release resources
allocated at realize time. Clear the IOMMUDevice cache entry there to
fix hotplug.
Fixes: 817ef10da23c ("virtio-iommu: Implement set|unset]_iommu_device() callbacks")
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <
20240701101453.203985-1-clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Mon, 1 Jul 2024 07:52:08 +0000 (09:52 +0200)]
virtio: remove virtio_tswap16s() call in vring_packed_event_read()
Commit
d152cdd6f6 ("virtio: use virtio accessor to access packed event")
switched using of address_space_read_cached() to virito_lduw_phys_cached()
to access packed descriptor event.
When we used address_space_read_cached(), we needed to call
virtio_tswap16s() to handle the endianess of the field, but
virito_lduw_phys_cached() already handles it internally, so we no longer
need to call virtio_tswap16s() (as the commit had done for `off_wrap`,
but forgot for `flags`).
Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event")
Cc: jasowang@redhat.com
Cc: qemu-stable@nongnu.org
Reported-by: Xoykie <xoykie@gmail.com>
Link: https://lore.kernel.org/qemu-devel/CAFU8RB_pjr77zMLsM0Unf9xPNxfr_--Tjr49F_eX32ZBc5o2zQ@mail.gmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240701075208.19634-1-sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jonathan Cameron [Tue, 25 Jun 2024 17:08:05 +0000 (18:08 +0100)]
hw/cxl/events: Mark cxl-add-dynamic-capacity and cxl-release-dynamic-capcity unstable
Markus suggested that we make the unstable. I don't expect these
interfaces to change because of their tight coupling to the Compute
Express Link (CXL) Specification, Revision 3.1 Fabric Management API
definitions which can only be extended in backwards compatible way.
However, there seems little disadvantage in taking a cautious path
for now and marking them as unstable interfaces.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <
20240625170805.359278-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jonathan Cameron [Tue, 25 Jun 2024 17:08:04 +0000 (18:08 +0100)]
hw/cxl/events: Improve QMP interfaces and documentation for add/release dynamic capacity.
New DCD command definitions updated in response to review comments
from Markus.
- Used CxlXXXX instead of CXLXXXXX for newly added types.
- Expanded some abreviations in type names to be easier to read.
- Additional documentation for some fields.
- Replace slightly vague cxl r3.1 references with
"Compute Express Link (CXL) Specification, Revision 3.1, XXXX"
to bring them inline with what it says on the specification cover.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <
20240625170805.359278-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:36 +0000 (20:38 +0530)]
tests/data/acpi/rebuild-expected-aml.sh: Add RISC-V
Update the list of supported architectures to include RISC-V.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-14-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:35 +0000 (20:38 +0530)]
pc-bios/meson.build: Add support for RISC-V in unpack_edk2_blobs
Update list of images supported in unpack_edk2_blobs to enable RISC-V
ACPI table testing.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-13-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:34 +0000 (20:38 +0530)]
meson.build: Add RISC-V to the edk2-target list
so that ACPI table test can be supported.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-12-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:33 +0000 (20:38 +0530)]
tests/data/acpi/virt: Move ARM64 ACPI tables under aarch64/${machine} path
Same machine name can be used by different architectures. Hence, create
aarch64 folder and move all aarch64 related AML files for virt machine
inside.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-11-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:32 +0000 (20:38 +0530)]
tests/data/acpi: Move x86 ACPI tables under x86/${machine} path
To support multiple architectures using same machine name, create x86
folder and move all x86 related AML files for each machine type inside.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-10-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:31 +0000 (20:38 +0530)]
tests/qtest/bios-tables-test.c: Set "arch" for x86 tests
To search for expected AML files under ${arch}/${machine} path, set this
field for X86 related test cases.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-9-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:30 +0000 (20:38 +0530)]
tests/qtest/bios-tables-test.c: Set "arch" for aarch64 tests
To search for expected AML files under ${arch}/${machine} path, set this
field for AARCH64 related test cases.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-8-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:29 +0000 (20:38 +0530)]
tests/qtest/bios-tables-test.c: Add support for arch in path
Since machine name can be common for multiple architectures (ex: virt),
add "arch" in the path to search for expected AML files. Since the AML
files are still under old path, add support for searching with and
without arch in the path.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-7-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:28 +0000 (20:38 +0530)]
qtest: bios-tables-test: Rename aarch64 tests with aarch64 in them
Existing AARCH64 virt test functions do not have AARCH64 in their name.
To add RISC-V virt related test cases, better to rename existing
functions to indicate they are ARM only.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-6-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:27 +0000 (20:38 +0530)]
tests/data/uefi-boot-images: Add RISC-V ISO image
To test ACPI tables, edk2 needs to be booted with a disk image having
EFI partition. This image is created using UefiTestToolsPkg.
The image is generated using tests/uefi-test-tools source.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Message-Id: <
20240625150839.
1358279-5-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:26 +0000 (20:38 +0530)]
uefi-test-tools: Add support for python based build script
edk2-funcs.sh which is used in this Makefile, was removed in the commit
c28a2891f3 ("edk2: update build script"). It is replaced with a python
based script. So, update the Makefile and add the configuration file as
required to support the python based build script.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-4-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sunil V L [Tue, 25 Jun 2024 15:08:25 +0000 (20:38 +0530)]
uefi-test-tools/UefiTestToolsPkg: Add RISC-V support
Enable building the test application for RISC-V with appropriate
dependencies updated.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <
20240625150839.
1358279-3-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Nicolin Chen [Wed, 19 Jun 2024 20:12:43 +0000 (13:12 -0700)]
hw/arm/virt-acpi-build: Fix id_count in build_iort_id_mapping
It's observed that Linux kernel booting with the VM reports a "conflicting
mapping for input ID" FW_BUG.
The IORT doc defines "Number of IDs" to be "the number of IDs in the range
minus one", while virt-acpi-build.c simply stores the number of IDs in the
id_count without the "minus one". Meanwhile, some of the callers pass in a
0xFFFF following the spec. So, this is a mismatch between the function and
its callers.
Fix build_iort_id_mapping() by internally subtracting one from the pass-in
@id_count. Accordingly make sure that all existing callers pass in a value
without the "minus one", i.e. change all 0xFFFFs to 0x10000s.
Also, add a few lines of comments to highlight this change along with the
referencing document for this build_iort_id_mapping().
Fixes: 42e0f050e3a5 ("hw/arm/virt-acpi-build: Add IORT support to bypass SMMUv3")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Message-Id: <
20240619201243.936819-1-nicolinc@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Woodhouse [Wed, 19 Jun 2024 13:03:08 +0000 (14:03 +0100)]
hw/i386/fw_cfg: Add etc/e820 to fw_cfg late
In e820_add_entry() the e820_table is reallocated with g_renew() to make
space for a new entry. However, fw_cfg_arch_create() just uses the
existing e820_table pointer. This leads to a use-after-free if anything
adds a new entry after fw_cfg is set up.
Shift the addition of the etc/e820 file to the machine done notifier, via
a new fw_cfg_add_e820() function.
Also make e820_table private and use an e820_get_table() accessor function
for it, which sets a flag that will trigger an assert() for any *later*
attempts to add to the table.
Make e820_add_entry() return void, as most callers don't check for error
anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
a2708734f004b224f33d3b4824e9a5a262431568.camel@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Nicolin Chen [Wed, 19 Jun 2024 00:17:08 +0000 (17:17 -0700)]
hw/arm/virt-acpi-build: Drop local iort_node_offset
Both the other two callers of build_iort_id_mapping() just directly pass
in the IORT_NODE_OFFSET macro. Keeping a "const uint32_t" local variable
storing the same value doesn't have any gain.
Simplify this by replacing the only place using this local variable with
the macro directly.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Message-Id: <
20240619001708.926511-1-nicolinc@nvidia.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Thomas Huth [Tue, 18 Jun 2024 12:19:58 +0000 (14:19 +0200)]
hw/virtio: Fix the de-initialization of vhost-user devices
The unrealize functions of the various vhost-user devices are
calling the corresponding vhost_*_set_status() functions with a
status of 0 to shut down the device correctly.
Now these vhost_*_set_status() functions all follow this scheme:
bool should_start = virtio_device_should_start(vdev, status);
if (vhost_dev_is_started(&vvc->vhost_dev) == should_start) {
return;
}
if (should_start) {
/* ... do the initialization stuff ... */
} else {
/* ... do the cleanup stuff ... */
}
The problem here is virtio_device_should_start(vdev, 0) currently
always returns "true" since it internally only looks at vdev->started
instead of looking at the "status" parameter. Thus once the device
got started once, virtio_device_should_start() always returns true
and thus the vhost_*_set_status() functions return early, without
ever doing any clean-up when being called with status == 0. This
causes e.g. problems when trying to hot-plug and hot-unplug a vhost
user devices multiple times since the de-initialization step is
completely skipped during the unplug operation.
This bug has been introduced in commit
9f6bcfd99f ("hw/virtio: move
vm_running check to virtio_device_started") which replaced
should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
with
should_start = virtio_device_started(vdev, status);
which later got replaced by virtio_device_should_start(). This blocked
the possibility to set should_start to false in case the status flag
VIRTIO_CONFIG_S_DRIVER_OK was not set.
Fix it by adjusting the virtio_device_should_start() function to
only consider the status flag instead of vdev->started. Since this
function is only used in the various vhost_*_set_status() functions
for exactly the same purpose, it should be fine to fix it in this
central place there without any risk to change the behavior of other
code.
Fixes: 9f6bcfd99f ("hw/virtio: move vm_running check to virtio_device_started")
Buglink: https://issues.redhat.com/browse/RHEL-40708
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <
20240618121958.88673-1-thuth@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:05:34 +0000 (12:05 +0200)]
tests/qtest/vhost-user-test: add a test case for memory-backend-shm
`memory-backend-shm` can be used with vhost-user devices, so let's
add a new test case for it.
Acked-by: Thomas Huth <thuth@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100534.145917-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:05:27 +0000 (12:05 +0200)]
tests/qtest/vhost-user-blk-test: use memory-backend-shm
`memory-backend-memfd` is available only on Linux while the new
`memory-backend-shm` can be used on any POSIX-compliant operating
system. Let's use it so we can run the test in multiple environments.
Since we are here, let`s remove `share=on` which is the default for shm
(and also for memfd).
Acked-by: Thomas Huth <thuth@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100527.145883-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:05:19 +0000 (12:05 +0200)]
hostmem: add a new memory backend based on POSIX shm_open()
shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).
The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.
This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.
In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com> (QAPI schema)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100519.145853-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:04:47 +0000 (12:04 +0200)]
contrib/vhost-user-*: use QEMU bswap helper functions
Let's replace the calls to le*toh() and htole*() with qemu/bswap.h
helpers to make the code more portable.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100447.145697-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:04:39 +0000 (12:04 +0200)]
contrib/vhost-user-blk: fix bind() using the right size of the address
On macOS passing `-s /tmp/vhost.socket` parameter to the vhost-user-blk
application, the bind was done on `/tmp/vhost.socke` pathname,
missing the last character.
This sounds like one of the portability problems described in the
unix(7) manpage:
Pathname sockets
When binding a socket to a pathname, a few rules should
be observed for maximum portability and ease of coding:
• The pathname in sun_path should be null-terminated.
• The length of the pathname, including the terminating
null byte, should not exceed the size of sun_path.
• The addrlen argument that describes the enclosing
sockaddr_un structure should have a value of at least:
offsetof(struct sockaddr_un, sun_path) +
strlen(addr.sun_path)+1
or, more simply, addrlen can be specified as
sizeof(struct sockaddr_un).
So let's follow the last advice and simplify the code as well.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100440.145664-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:00:35 +0000 (12:00 +0200)]
vhost-user-server: do not set memory fd non-blocking
In vhost-user-server we set all fd received from the other peer
in non-blocking mode. For some of them (e.g. memfd, shm_open, etc.)
it's not really needed, because we don't use these fd with blocking
operations, but only to map memory.
In addition, in some systems this operation can fail (e.g. in macOS
setting an fd returned by shm_open() non-blocking fails with errno
= ENOTTY).
So, let's avoid setting fd non-blocking for those messages that we
know carry memory fd (e.g. VHOST_USER_ADD_MEM_REG,
VHOST_USER_SET_MEM_TABLE).
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100043.144657-6-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:00:34 +0000 (12:00 +0200)]
libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD
message if MFD_ALLOW_SEALING is not defined, since it's not able
to create a memfd.
VHOST_USER_GET_INFLIGHT_FD is used only if
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask
that feature if the backend is not able to properly handle these
messages.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100043.144657-5-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:00:33 +0000 (12:00 +0200)]
libvhost-user: fail vu_message_write() if sendmsg() is failing
In vu_message_write() we use sendmsg() to send the message header,
then a write() to send the payload.
If sendmsg() fails we should avoid sending the payload, since we
were unable to send the header.
Discovered before fixing the issue with the previous patch, where
sendmsg() failed on macOS due to wrong parameters, but the frontend
still sent the payload which the backend incorrectly interpreted
as a wrong header.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100043.144657-4-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:00:32 +0000 (12:00 +0200)]
libvhost-user: set msg.msg_control to NULL when it is empty
On some OS (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if
the `struct msghdr` has the field `msg_controllen` set to 0, but
`msg_control` is not NULL.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100043.144657-3-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 18 Jun 2024 10:00:31 +0000 (12:00 +0200)]
qapi: clarify that the default is backend dependent
The default value of the @share option of the @MemoryBackendProperties
really depends on the backend type, so let's document the default
values in the same place where we define the option to avoid
dispersing the information.
Cc: David Hildenbrand <david@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240618100043.144657-2-sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Dmitry Frolov [Thu, 13 Jun 2024 14:35:30 +0000 (17:35 +0300)]
hw/net/virtio-net.c: fix crash in iov_copy()
A crash found while fuzzing device virtio-net-socket-check-used.
Assertion "offset == 0" in iov_copy() fails if less than guest_hdr_len bytes
were transmited.
Signed-off-by: Dmitry Frolov <frolov@swemel.ru>
Message-Id: <
20240613143529.602591-2-frolov@swemel.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
BillXiang [Thu, 13 Jun 2024 06:51:50 +0000 (14:51 +0800)]
vhost-user: Skip unnecessary duplicated VHOST_USER_SET_LOG_BASE requests
The VHOST_USER_SET_LOG_BASE requests should be categorized into
non-vring specific messages, and should be sent only once.
If send more than once, dpdk will munmap old log_addr which may has been used and cause segmentation fault.
Signed-off-by: BillXiang <xiangwencheng@dayudpu.com>
Message-Id: <
20240613065150.3100-1-xiangwencheng@dayudpu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Manos Pitsidianakis [Thu, 13 Jun 2024 05:49:12 +0000 (08:49 +0300)]
virtio-iommu: add error check before assert
A fuzzer case discovered by Zheyu Ma causes an assert failure.
Add a check before the assert, and respond with an error before moving
on to the next queue element.
To reproduce the failure:
cat << EOF | \
qemu-system-x86_64 \
-display none -machine accel=qtest -m 512M -machine q35 -nodefaults \
-device virtio-iommu -qtest stdio
outl 0xcf8 0x80000804
outw 0xcfc 0x06
outl 0xcf8 0x80000820
outl 0xcfc 0xe0004000
write 0x10000e 0x1 0x01
write 0xe0004020 0x4 0x00001000
write 0xe0004028 0x4 0x00101000
write 0xe000401c 0x1 0x01
write 0x106000 0x1 0x05
write 0x100001 0x1 0x60
write 0x100002 0x1 0x10
write 0x100009 0x1 0x04
write 0x10000c 0x1 0x01
write 0x100018 0x1 0x04
write 0x10001c 0x1 0x02
write 0x101003 0x1 0x01
write 0xe0007001 0x1 0x00
EOF
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2359
Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <
20240613-fuzz-2359-fix-v2-manos.pitsidianakis@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Akihiko Odaki [Thu, 27 Jun 2024 13:37:50 +0000 (22:37 +0900)]
hw/virtio: Free vqs after vhost_dev_cleanup()
This fixes LeakSanitizer warnings.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <
20240627-san-v2-7-
750bb0946dbd@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Zhao Liu [Thu, 6 Jun 2024 14:08:58 +0000 (22:08 +0800)]
i386/apic: Add hint on boot failure because of disabling x2APIC
Currently, the Q35 supports up to 4096 vCPUs (since v9.0), but for TCG
cases, if x2APIC is not actively enabled to boot more than 255 vCPUs (
e.g., qemu-system-i386 -M pc-q35-9.0 -smp 666), the following error is
reported:
Unexpected error in apic_common_set_id() at ../hw/intc/apic_common.c:449:
qemu-system-i386: APIC ID 255 requires x2APIC feature in CPU
Aborted (core dumped)
This error can be resolved by setting x2apic=on in -cpu. In order to
better help users deal with this scenario, add the error hint to
instruct users on how to enable the x2apic feature. Then, the error
report becomes the following:
Unexpected error in apic_common_set_id() at ../hw/intc/apic_common.c:448:
qemu-system-i386: APIC ID 255 requires x2APIC feature in CPU
Try x2apic=on in -cpu.
Aborted (core dumped)
Note since @errp is &error_abort, error_append_hint() can't be applied
on @errp. And in order to separate the exact error message from the
(perhaps effectively) hint, adding a hint via error_append_hint() is
also necessary. Therefore, introduce @local_error in
apic_common_set_id() to handle both the error message and the error
hint.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <
20240606140858.
2157106-1-zhao1.liu@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Yuxue Liu [Thu, 11 Apr 2024 07:35:55 +0000 (15:35 +0800)]
vhost-user-test: no set non-blocking for cal fd less than 0.
In the scenario where vhost-user sets eventfd to -1,
qemu_chr_fe_get_msgfds retrieves fd as -1. When vhost_user_read
receives, it does not perform blocking operations on the descriptor
with fd=-1, so non-blocking operations should not be performed here
either.This is a normal use case. Calling g_unix_set_fd_nonblocking
at this point will cause the test to interrupt.
When vhost_user_write sets the call fd to -1, it sets the number of
fds to 0, so the fds obtained by qemu_chr_fe_get_msgfds will also
be 0.
Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com>
Message-Id: <
20240411073555.1357-1-yuxue.liu@jaguarmicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jiqian Chen [Thu, 6 Jun 2024 10:22:05 +0000 (18:22 +0800)]
virtio-pci: implement No_Soft_Reset bit
In current code, when guest does S3, virtio-gpu are reset due to the
bit No_Soft_Reset is not set. After resetting, the display resources
of virtio-gpu are destroyed, then the display can't come back and only
show blank after resuming.
Implement No_Soft_Reset bit of PCI_PM_CTRL register, then guest can check
this bit, if this bit is set, the devices resetting will not be done, and
then the display can work after resuming.
No_Soft_Reset bit is implemented for all virtio devices, and was tested
only on virtio-gpu device. Set it false by default for safety.
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Message-Id: <
20240606102205.114671-3-Jiqian.Chen@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Ira Weiny [Fri, 31 May 2024 16:22:05 +0000 (11:22 -0500)]
hw/cxl: Fix read from bogus memory
Peter and coverity report:
We've passed '&data' to address_space_write(), which means "read
from the address on the stack where the function argument 'data'
lives", so instead of writing 64 bytes of data to the guest ,
we'll write 64 bytes which start with a host pointer value and
then continue with whatever happens to be on the host stack
after that.
Indeed the intention was to write 64 bytes of data at the address given.
Fix the parameter to address_space_write().
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/all/CAFEAcA-u4sytGwTKsb__Y+_+0O2-WwARntm3x8WNhvL1WfHOBg@mail.gmail.com/
Fixes: 6bda41a69bdc ("hw/cxl: Add clear poison mailbox command support.")
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Message-Id: <
20240531-fix-poison-set-cacheline-v1-1-
e3bc7e8f1158@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cindy Lu [Tue, 28 May 2024 08:48:15 +0000 (16:48 +0800)]
virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one()
In function kvm_virtio_pci_vector_use_one(), the function will only use
the irqfd/vector for itself. Therefore, in the undo label, the failing
process is incorrect.
To fix this, we can just remove this label.
Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
Cc: qemu-stable@nongnu.org
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <
20240528084840.194538-1-lulu@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Thomas Weißschuh [Mon, 27 May 2024 06:27:54 +0000 (08:27 +0200)]
Revert "docs/specs/pvpanic: mark shutdown event as not implemented"
The missing functionality has been implemented now.
This reverts commit
e739d1935c461d0668057e9dbba9d06f728d29ec.
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <
20240527-pvpanic-shutdown-v8-8-
5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Thomas Weißschuh [Mon, 27 May 2024 06:27:53 +0000 (08:27 +0200)]
tests/qtest/pvpanic: add tests for pvshutdown event
Validate that a shutdown via the pvpanic device emits the correct
QMP events.
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <
20240527-pvpanic-shutdown-v8-7-
5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Alejandro Jimenez [Mon, 27 May 2024 06:27:52 +0000 (08:27 +0200)]
pvpanic: Emit GUEST_PVSHUTDOWN QMP event on pvpanic shutdown signal
Emit a QMP event on receiving a PVPANIC_SHUTDOWN event. Even though a typical
SHUTDOWN event will be sent, it will be indistinguishable from a shutdown
originating from other cases (e.g. KVM exit due to KVM_SYSTEM_EVENT_SHUTDOWN)
that also issue the guest-shutdown cause.
A management layer application can detect the new GUEST_PVSHUTDOWN event to
determine if the guest is using the pvpanic interface to request shutdowns.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Message-Id: <
20240527-pvpanic-shutdown-v8-6-
5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Thomas Weißschuh [Mon, 27 May 2024 06:27:51 +0000 (08:27 +0200)]
hw/misc/pvpanic: add support for normal shutdowns
Shutdown requests are normally hardware dependent.
By extending pvpanic to also handle shutdown requests, guests can
submit such requests with an easily implementable and cross-platform
mechanism.
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <
20240527-pvpanic-shutdown-v8-5-
5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Thomas Weißschuh [Mon, 27 May 2024 06:27:50 +0000 (08:27 +0200)]
tests/qtest/pvpanic: use centralized definition of supported events
Avoid the necessity to update all tests when new events are added
to the device.
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <
20240527-pvpanic-shutdown-v8-4-
5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Thomas Weißschuh [Mon, 27 May 2024 06:27:49 +0000 (08:27 +0200)]
hw/misc/pvpanic: centralize definition of supported events
The different components of pvpanic duplicate the list of supported
events. Move it to the shared header file to minimize changes when new
events are added.
MST: tweak: keep header included in pvpanic.c to avoid header
dependency, rebase.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <
20240527-pvpanic-shutdown-v8-3-
5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Thomas Weißschuh [Mon, 27 May 2024 06:27:48 +0000 (08:27 +0200)]
linux-headers: update to 6.10-rc1
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <
20240527-pvpanic-shutdown-v8-2-
5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:54 +0000 (10:44 -0700)]
hw/mem/cxl_type3: Allow to release extent superset in QMP interface
Before the change, the QMP interface used for add/release DC extents
only allows to release an extent whose DPA range is contained by a single
accepted extent in the device.
With the change, we relax the constraints. As long as the DPA range of
the extent is covered by accepted extents, we allow the release.
Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-15-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:53 +0000 (10:44 -0700)]
hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
With the change, we extend the extent release mailbox command processing
to allow more flexible release. As long as the DPA range of the extent to
release is covered by accepted extent(s) in the device, the release can be
performed.
Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-14-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:52 +0000 (10:44 -0700)]
hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
All DPA ranges in the DC regions are invalid to access until an extent
covering the range has been successfully accepted by the host. A bitmap
is added to each region to record whether a DC block in the region has
been backed by a DC extent. Each bit in the bitmap represents a DC block.
When a DC extent is accepted, all the bits representing the blocks in the
extent are set, which will be cleared when the extent is released.
Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-13-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:51 +0000 (10:44 -0700)]
hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.
With the change, we allow to release an extent only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.
1. Add dynamic capacity extents:
For example, the command to add two continuous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:
{ "execute": "qmp_capabilities" }
{ "execute": "cxl-add-dynamic-capacity",
"arguments": {
"path": "/machine/peripheral/cxl-dcd0",
"host-id": 0,
"selection-policy": "prescriptive",
"region": 0,
"extents": [
{
"offset": 0,
"len":
134217728
},
{
"offset":
134217728,
"len":
134217728
}
]
}
}
2. Release dynamic capacity extents:
For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:
{ "execute": "cxl-release-dynamic-capacity",
"arguments": {
"path": "/machine/peripheral/cxl-dcd0",
"host-id": 0,
"removal-policy":"prescriptive",
"region": 0,
"extents": [
{
"offset":
134217728,
"len":
134217728
}
]
}
}
Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-12-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:50 +0000 (10:44 -0700)]
hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
Per CXL spec 3.1, two mailbox commands are implemented:
Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
For the process of the above two commands, we use two-pass approach.
Pass 1: Check whether the input payload is valid or not; if not, skip
Pass 2 and return mailbox process error.
Pass 2: Do the real work--add or release extents, respectively.
Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-11-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:49 +0000 (10:44 -0700)]
hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
Add dynamic capacity extent list representative to the definition of
CXLType3Dev and implement get DC extent list mailbox command per
CXL.spec.3.1:.8.2.9.9.9.2.
Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-10-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:48 +0000 (10:44 -0700)]
hw/mem/cxl_type3: Add host backend and address space handling for DC regions
Add (file/memory backed) host backend for DCD. All the dynamic capacity
regions will share a single, large enough host backend. Set up address
space for DC regions to support read/write operations to dynamic capacity
for DCD.
With the change, the following support is added:
1. Add a new property to type3 device "volatile-dc-memdev" to point to host
memory backend for dynamic capacity. Currently, all DC regions share one
host backend;
2. Add namespace for dynamic capacity for read/write support;
3. Create cdat entries for each dynamic capacity region.
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-9-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:47 +0000 (10:44 -0700)]
hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument
The function ct3_build_cdat_entries_for_mr only uses size of the passed
memory region argument, refactor the function definition to make the passed
arguments more specific.
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-8-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:46 +0000 (10:44 -0700)]
hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
With the change, when setting up memory for type3 memory device, we can
create DC regions.
A property 'num-dc-regions' is added to ct3_props to allow users to pass the
number of DC regions to create. To make it easier, other region parameters
like region base, length, and block size are hard coded. If needed,
these parameters can be added easily.
With the change, we can create DC regions with proper kernel side
support like below:
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
echo 0x40000000 > /sys/bus/cxl/devices/$region/size
echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-7-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Fan Ni [Thu, 23 May 2024 17:44:45 +0000 (10:44 -0700)]
include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
pmem capacity, preparing for the introduction of dynamic capacity to support
dynamic capacity devices.
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-6-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:44 +0000 (10:44 -0700)]
hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
Per cxl spec r3.1, add dynamic capacity (DC) region representative based on
Table 8-165 and extend the cxl type3 device definition to include DC region
information. Also, based on info in 8.2.9.9.9.1, add 'Get Dynamic Capacity
Configuration' mailbox support.
Note: we store region decode length as byte-wise length on the device, which
should be divided by 256 * MiB before being returned to the host
for "Get Dynamic Capacity Configuration" mailbox command per
specification.
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-5-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fan Ni [Thu, 23 May 2024 17:44:43 +0000 (10:44 -0700)]
hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
Payload), dynamic capacity event log size should be part of
output of the Identify command.
Add dc_event_log_size to the output payload for the host to get the info.
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-4-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Gregory Price [Thu, 23 May 2024 17:44:42 +0000 (10:44 -0700)]
hw/cxl/mailbox: interface to add CCI commands to an existing CCI
This enables wrapper devices to customize the base device's CCI
(for example, with custom commands outside the specification)
without the need to change the base device.
The also enabled the base device to dispatch those commands without
requiring additional driver support.
Heavily edited by Jonathan Cameron to increase code reuse
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-3-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Gregory Price [Thu, 23 May 2024 17:44:41 +0000 (10:44 -0700)]
hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference
This allows devices to have fully customized CCIs, along with complex
devices where wrapper devices can override or add additional CCI
commands without having to replicate full command structures or
pollute a base device with every command that might ever be used.
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <
20240523174651.
1089554-2-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Li Feng [Thu, 16 May 2024 02:57:46 +0000 (10:57 +0800)]
vhost-user: fix lost reconnect again
When the vhost-user is reconnecting to the backend, and if the vhost-user fails
at the get_features in vhost_dev_init(), then the reconnect will fail
and it will not be retriggered forever.
The reason is:
When the vhost-user fail at get_features, the vhost_dev_cleanup will be called
immediately.
vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
The reconnect path is:
vhost_user_blk_event
vhost_user_async_close(.. vhost_user_blk_disconnect ..)
qemu_chr_fe_set_handlers <----- clear the notifier callback
schedule vhost_user_async_close_bh
The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
called, then the event fd callback will not be reinstalled.
We need to ensure that even if vhost_dev_init initialization fails, the event
handler still needs to be reinstalled when s->connected is false.
All vhost-user devices have this issue, including vhost-user-blk/scsi.
Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <
20240516025753.130171-3-fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Li Feng [Thu, 16 May 2024 02:57:45 +0000 (10:57 +0800)]
Revert "vhost-user: fix lost reconnect"
This reverts commit
f02a4b8e6431598612466f76aac64ab492849abf.
Since the current patch cannot completely fix the lost reconnect
problem, there is a scenario that is not considered:
- When the virtio-blk driver is removed from the guest os,
s->connected has no chance to be set to false, resulting in
subsequent reconnection not being executed.
The next patch will completely fix this issue with a better approach.
Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <
20240516025753.130171-2-fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Marc-André Lureau [Wed, 15 May 2024 10:52:37 +0000 (14:52 +0400)]
vhost-user-gpu: fix import of DMABUF
When using vhost-user-gpu with GL, qemu -display gtk doesn't show output
and prints: qemu: eglCreateImageKHR failed
Since commit
9ac06df8b ("virtio-gpu-udmabuf: correct naming of
QemuDmaBuf size properties"), egl_dmabuf_import_texture() uses
backing_{width,height} for the texture dimension.
Fixes: 9ac06df8b ("virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <
20240515105237.
1074116-1-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jiqian Chen [Wed, 15 May 2024 07:35:25 +0000 (15:35 +0800)]
virtio-pci: only reset pm state during resetting
Fix bug imported by
27ce0f3afc9dd ("fix Power Management Control Register for PCI Express virtio devices"
After this change, observe that QEMU may erroneously clear the power status of the device,
or may erroneously clear non writable registers, such as NO_SOFT_RESET, etc.
Only state of PM_CTRL is writable.
Only when flag VIRTIO_PCI_FLAG_INIT_PM is set, need to reset state.
Fixes: 27ce0f3afc9dd ("fix Power Management Control Register for PCI Express virtio devices"
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Message-Id: <
20240515073526.17297-2-Jiqian.Chen@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Wafer [Fri, 10 May 2024 07:27:53 +0000 (15:27 +0800)]
hw/virtio: Fix obtain the buffer id from the last descriptor
The virtio-1.3 specification
<https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html> writes:
2.8.6 Next Flag: Descriptor Chaining
Buffer ID is included in the last descriptor in the list.
If the feature (_F_INDIRECT_DESC) has been negotiated, install only
one descriptor in the virtqueue.
Therefor the buffer id should be obtained from the first descriptor.
In descriptor chaining scenarios, the buffer id should be obtained
from the last descriptor.
Fixes: 86044b24e8 ("virtio: basic packed virtqueue support")
Signed-off-by: Wafer <wafer@jaguarmicro.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <
20240510072753.26158-2-wafer@jaguarmicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Halil Pasic [Mon, 29 Apr 2024 11:33:34 +0000 (13:33 +0200)]
vhost-vsock: add VIRTIO_F_RING_PACKED to feature_bits
Not having VIRTIO_F_RING_PACKED in feature_bits[] is a problem when the
vhost-vsock device does not offer the feature bit VIRTIO_F_RING_PACKED
but the in QEMU device is configured to try to use the packed layout
(the virtio property "packed" is on).
As of today, the Linux kernel vhost-vsock device does not support the
packed queue layout (as vhost does not support packed), and does not
offer VIRTIO_F_RING_PACKED. Thus when for example a vhost-vsock-ccw is
used with packed=on, VIRTIO_F_RING_PACKED ends up being negotiated,
despite the fact that the device does not actually support it, and
one gets to keep the pieces.
Fixes: 74b3e46630 ("virtio: add property to enable packed virtqueue")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <
20240429113334.
2454197-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Christian Pötzsch [Fri, 26 Apr 2024 08:33:13 +0000 (10:33 +0200)]
Fix vhost user assertion when sending more than one fd
If the client sends more than one region this assert triggers. The
reason is that two fd's are 8 bytes and VHOST_MEMORY_BASELINE_NREGIONS
is exactly 8.
The assert is wrong because it should not test for the size of the fd
array, but for the numbers of regions.
Signed-off-by: Christian Pötzsch <christian.poetzsch@kernkonzept.com>
Message-Id: <
20240426083313.
3081272-1-christian.poetzsch@kernkonzept.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jonah Palmer [Fri, 15 Mar 2024 16:55:56 +0000 (12:55 -0400)]
vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.
The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <
20240315165557.26942-6-jonah.palmer@oracle.com>
Acked-by: Srujana Challa <schalla@marvell.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jonah Palmer [Fri, 15 Mar 2024 16:55:55 +0000 (12:55 -0400)]
virtio-ccw: Handle extra notification data
Add support to virtio-ccw devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.
The extra data that's passed to the virtio-ccw device when this feature
is enabled varies depending on the device's virtqueue layout.
That data passed to the virtio-ccw device is in the same format as the
data passed to virtio-pci devices.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <
20240315165557.26942-5-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jonah Palmer [Fri, 15 Mar 2024 16:55:54 +0000 (12:55 -0400)]
virtio-mmio: Handle extra notification data
Add support to virtio-mmio devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.
The extra data that's passed to the virtio-mmio device when this feature
is enabled varies depending on the device's virtqueue layout.
The data passed to the virtio-mmio device is in the same format as the
data passed to virtio-pci devices.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <
20240315165557.26942-4-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jonah Palmer [Fri, 15 Mar 2024 16:55:53 +0000 (12:55 -0400)]
virtio: Prevent creation of device using notification-data with ioeventfd
Prevent the realization of a virtio device that attempts to use the
VIRTIO_F_NOTIFICATION_DATA transport feature without disabling
ioeventfd.
Due to ioeventfd not being able to carry the extra data associated with
this feature, having both enabled is a functional mismatch and therefore
Qemu should not continue the device's realization process.
Although the device does not yet know if the feature will be
successfully negotiated, many devices using this feature wont actually
work without this extra data and would fail FEATURES_OK anyway.
If ioeventfd is able to work with the extra notification data in the
future, this compatibility check can be removed.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <
20240315165557.26942-3-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jonah Palmer [Fri, 15 Mar 2024 16:55:52 +0000 (12:55 -0400)]
virtio/virtio-pci: Handle extra notification data
Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.
The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.
In a split virtqueue layout, this data includes:
- upper 16 bits: shadow_avail_idx
- lower 16 bits: virtqueue index
In a packed virtqueue layout, this data includes:
- upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
- lower 16 bits: virtqueue index
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <
20240315165557.26942-2-jonah.palmer@oracle.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Fri, 22 Mar 2024 09:23:15 +0000 (10:23 +0100)]
vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
vhost_vdpa_set_vring_ready() could already fail, but if Linux's
patch [1] will be merged, it may fail with more chance if
userspace does not activate virtqueues before DRIVER_OK when
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated.
So better check its return value anyway.
[1] https://lore.kernel.org/virtualization/
20240206145154.118044-1-sgarzare@redhat.com/T/#u
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <
20240322092315.31885-1-sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Si-Wei Liu [Thu, 14 Mar 2024 20:27:35 +0000 (13:27 -0700)]
vhost: Perform memory section dirty scans once per iteration
On setups with one or more virtio-net devices with vhost on,
dirty tracking iteration increases cost the bigger the number
amount of queues are set up e.g. on idle guests migration the
following is observed with virtio-net with vhost=on:
48 queues -> 78.11% [.] vhost_dev_sync_region.isra.13
8 queues -> 40.50% [.] vhost_dev_sync_region.isra.13
1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
2 devices, 1 queue -> 18.60% [.] vhost_dev_sync_region.isra.14
With high memory rates the symptom is lack of convergence as soon
as it has a vhost device with a sufficiently high number of queues,
the sufficient number of vhost devices.
On every migration iteration (every 100msecs) it will redundantly
query the *shared log* the number of queues configured with vhost
that exist in the guest. For the virtqueue data, this is necessary,
but not for the memory sections which are the same. So essentially
we end up scanning the dirty log too often.
To fix that, select a vhost device responsible for scanning the
log with regards to memory sections dirty tracking. It is selected
when we enable the logger (during migration) and cleared when we
disable the logger. If the vhost logger device goes away for some
reason, the logger will be re-selected from the rest of vhost
devices.
After making mem-section logger a singleton instance, constant cost
of 7%-9% (like the 1 queue report) will be seen, no matter how many
queues or how many vhost devices are configured:
48 queues -> 8.71% [.] vhost_dev_sync_region.isra.13
2 devices, 8 queues -> 7.97% [.] vhost_dev_sync_region.isra.14
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <
1710448055-11709-2-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Si-Wei Liu [Thu, 14 Mar 2024 20:27:34 +0000 (13:27 -0700)]
vhost: dirty log should be per backend type
There could be a mix of both vhost-user and vhost-kernel clients
in the same QEMU process, where separate vhost loggers for the
specific vhost type have to be used. Make the vhost logger per
backend type, and have them properly reference counted.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <
1710448055-11709-1-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Richard Henderson [Mon, 1 Jul 2024 16:06:25 +0000 (09:06 -0700)]
Merge tag 'pull-xen-
20240701' of https://xenbits.xen.org/git-http/people/aperard/qemu-dm into staging
Xen queue:
* Improvement for running QEMU in a stubdomain.
* Improve handling of buffered ioreqs.
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEE+AwAYwjiLP2KkueYDPVXL9f7Va8FAmaCq2IACgkQDPVXL9f7
# Va9PHQf+N4SGAo8rD6Nw7z73b9/Qd20Pz82Pm3BLnJtioxxOhVPU33HJsyjkQRSs
# dVckRZk6IFfiAKWTPDsQfeL+qDBjL15usuZCLeq7zRr5NwV5OOlSh6fW6yurY8IR
# zHoCJTjYcaXbMCVIzAXhM19rZjFZCLNFYb3ADRvDANaxbhSx60EAg69S8gQeQhgw
# BVC5inDxMGSl4X7i8eh+E39H8X1RKNg4GQyLWOVksdElQuKeGFMThaSCmA3OHkOV
# Ny70+PrCM3Z1sbUMI3lDHdT4f9JXcYqJbnCjCDHCZgOeF2Z5UEfFlPiVQIo4OA7o
# b48LbOuThEZew4SrJS9lx9RKafoFyw==
# =oLvq
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 01 Jul 2024 06:13:06 AM PDT
# gpg: using RSA key
F80C006308E22CFD8A92E7980CF5572FD7FB55AF
# gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>" [undefined]
# gpg: aka "Anthony PERARD <anthony.perard@citrix.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5379 2F71 024C 600F 778A 7161 D8D5 7199 DF83 42C8
# Subkey fingerprint: F80C 0063 08E2 2CFD 8A92 E798 0CF5 572F D7FB 55AF
* tag 'pull-xen-
20240701' of https://xenbits.xen.org/git-http/people/aperard/qemu-dm:
xen-hvm: Avoid livelock while handling buffered ioreqs
xen: fix stubdom PCI addr
hw/xen: detect when running inside stubdomain
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Ross Lagerwall [Thu, 4 Apr 2024 14:08:33 +0000 (15:08 +0100)]
xen-hvm: Avoid livelock while handling buffered ioreqs
A malicious or buggy guest may generated buffered ioreqs faster than
QEMU can process them in handle_buffered_iopage(). The result is a
livelock - QEMU continuously processes ioreqs on the main thread without
iterating through the main loop which prevents handling other events,
processing timers, etc. Without QEMU handling other events, it often
results in the guest becoming unsable and makes it difficult to stop the
source of buffered ioreqs.
To avoid this, if we process a full page of buffered ioreqs, stop and
reschedule an immediate timer to continue processing them. This lets
QEMU go back to the main loop and catch up.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <
20240404140833.
1557953-1-ross.lagerwall@citrix.com>
Signed-off-by: Anthony PERARD <anthony@xenproject.org>
Marek Marczykowski-Górecki [Wed, 27 Mar 2024 03:05:15 +0000 (04:05 +0100)]
xen: fix stubdom PCI addr
When running in a stubdomain, the config space access via sysfs needs to
use BDF as seen inside stubdomain (connected via xen-pcifront), which is
different from the real BDF. For other purposes (hypercall parameters
etc), the real BDF needs to be used.
Get the in-stubdomain BDF by looking up relevant PV PCI xenstore
entries.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <
35049e99da634a74578a1ff2cb3ae4cc436ede33.
1711506237.git-series.marmarek@invisiblethingslab.com>
Signed-off-by: Anthony PERARD <anthony@xenproject.org>
Marek Marczykowski-Górecki [Wed, 27 Mar 2024 03:05:14 +0000 (04:05 +0100)]
hw/xen: detect when running inside stubdomain
Introduce global xen_is_stubdomain variable when qemu is running inside
a stubdomain instead of dom0. This will be relevant for subsequent
patches, as few things like accessing PCI config space need to be done
differently.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <
e66aa97dca5120f22e015c19710b2ff04f525720.
1711506237.git-series.marmarek@invisiblethingslab.com>
Signed-off-by: Anthony PERARD <anthony@xenproject.org>
Richard Henderson [Sun, 30 Jun 2024 23:12:24 +0000 (16:12 -0700)]
Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging
trivial patches for 2024-06-30
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmaBjTkACgkQcBtPaxpp
# PlmhAAf+PZEsiBvffwwNH5n1q39Hilih35p/GCVpNYKcLsFB6bLmt9A/x062NqTS
# ob1Uj134ofHlSQtNjP1KxXdriwc40ZMahkTO+x6gYc+IpoRJGTGYEA0MWh4gPPYK
# S6l/nOI9JK1x+ot+bQzGOzOjz3/S7RJteXzwOPlWQ7GChz8NIUPWV3DkcVP0AeT0
# 7Lq7GtDBSV5Jbne2IrvOGadjPOpJiiLEmLawmw1c9qapIKAu2wxNBMlO98ufsg6L
# hDFEg6K0CKvM9fcdK8UXhnMa+58QwHhoJT+Q00aQcU1xzu+ifi/CrmgbRCK5ruTA
# o0I8q6ONbK33cTzyZ/ZmKtoA8b/Rzw==
# =N3GX
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 30 Jun 2024 09:52:09 AM PDT
# gpg: using RSA key
7B73BAD68BE7A2C289314B22701B4F6B1A693E59
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" [full]
# gpg: aka "Michael Tokarev <mjt@debian.org>" [full]
# gpg: aka "Michael Tokarev <mjt@corpit.ru>" [full]
* tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu:
hw/core/loader: gunzip(): fix memory leak on error path
vl.c: select_machine(): add selected machine type to error message
vl.c: select_machine(): use g_autoptr
vl.c: select_machine(): use ERRP_GUARD instead of error propagation
docs/system/devices/usb: Replace the non-existing "qemu" binary
docs/cxl: fix some typos
os-posix: Expand setrlimit() syscall compatibility
net/can: Remove unused struct 'CanBusState'
hw/arm/bcm2836: Remove unusued struct 'BCM283XClass'
linux-user: sparc: Remove unused struct 'target_mc_fq'
linux-user: cris: Remove unused struct 'rt_signal_frame'
monitor: Remove obsolete stubs
target/i386: Advertise MWAIT iff host supports
vl: Allow multiple -overcommit commands
cpu: fix memleak of 'halt_cond' and 'thread'
hmp-commands-info.hx: Add missing info command for stats subcommand
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Richard Henderson [Sun, 30 Jun 2024 19:41:57 +0000 (12:41 -0700)]
Merge tag 'pull-ufs-
20240630' of https://gitlab.com/jeuk20.kim/qemu into staging
hw/ufs: fix coverity issue
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEEUBfYMVl8eKPZB+73EuIgTA5dtgIFAmaA1MQACgkQEuIgTA5d
# tgIYSBAAul4qW0P6q0h3Dj/MLcGMPo4Y4kcWKe2AAkE/mBRvKbE7bLsA0y47WU5S
# MJJApw4lwCsM12ZcD0W3YNbNwGUclQAVhLU5TOMowwaEWjNwmcsBR+AVwya4M2jQ
# zSw6udIo5dfdy6KSe2EbRAuoDqBFJrcIH6EbXn/pBIhotlFzyUYYcpPBAq3rwh+V
# haEtt3DapAektx+QkswBNEWu002OHyNDQXqfHnFvNMAYN9T25Nr+REai3VhZj379
# F/p5bFxou9FnwuGXRrpS1Em1jT+gRJnYoxp6iML8Zb4eZLhFs7T3WWkXHhbq7Nbt
# oeg1CFdQeIt1iowk/dhtnSEQqnLe9dfPHj7pxU98dkYXHcN52Q5CRb+c0JnEyBLc
# lGIjLVWvqYitOwGmvIdSmStd5TCLtuYmQGaI3slZCvsJTSo4Tkx3eI504NTVQ4K2
# lNY0jb+0PIsEUlyssimlsDA0SCkbpe5yE1G2NDCP74MjG0mlUm/h/OU0etk7uhwv
# DNr1Lljr04FhcgVbMGX5sbMeK2QiCDuOlCF1T4zkzDFdWKIl414vH1wvjv1cBKlj
# RdAfAi8zIV5lOeSqX13E9B0tjwUALlWFApW8J7pefijSBOGxEfFQJ39Gd4eIEFgD
# Bj9Nc1ddDs30YaCZSMYsqcHU09srlobWmPqadba6hyJW4L1B9bU=
# =d0WA
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 29 Jun 2024 08:45:08 PM PDT
# gpg: using RSA key
5017D831597C78A3D907EEF712E2204C0E5DB602
# gpg: Good signature from "Jeuk Kim <jeuk20.kim@samsung.com>" [unknown]
# gpg: aka "Jeuk Kim <jeuk20.kim@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5017 D831 597C 78A3 D907 EEF7 12E2 204C 0E5D B602
* tag 'pull-ufs-
20240630' of https://gitlab.com/jeuk20.kim/qemu:
hw/ufs: Fix potential bugs in MMIO read|write
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Vladimir Sementsov-Ogievskiy [Thu, 27 Jun 2024 16:25:07 +0000 (19:25 +0300)]
hw/core/loader: gunzip(): fix memory leak on error path
We should call inflateEnd() like on success path to cleanup state in s
variable.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Vladimir Sementsov-Ogievskiy [Wed, 26 Jun 2024 13:43:05 +0000 (16:43 +0300)]
vl.c: select_machine(): add selected machine type to error message
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Vladimir Sementsov-Ogievskiy [Wed, 26 Jun 2024 13:43:04 +0000 (16:43 +0300)]
vl.c: select_machine(): use g_autoptr
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Vladimir Sementsov-Ogievskiy [Wed, 26 Jun 2024 13:43:03 +0000 (16:43 +0300)]
vl.c: select_machine(): use ERRP_GUARD instead of error propagation
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Wed, 26 Jun 2024 09:44:06 +0000 (11:44 +0200)]
docs/system/devices/usb: Replace the non-existing "qemu" binary
We don't ship a binary that is simply called "qemu", so we should
avoid this in the documentation. Use the configurable binary name
via "|qemu_system|" instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Hyeongtak Ji [Wed, 26 Jun 2024 04:34:58 +0000 (13:34 +0900)]
docs/cxl: fix some typos
This patch corrects minor typographical errors to ensure the ASCII art
aligns with the explanations provided. Specifically, it fixes an
incorrect root port reference and removes redundant words.
Signed-off-by: Hyeongtak Ji <hyeongtak.ji@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Trent Huber [Fri, 14 Jun 2024 21:06:38 +0000 (17:06 -0400)]
os-posix: Expand setrlimit() syscall compatibility
Darwin uses a subtly different version of the setrlimit() syscall as
described in the COMPATIBILITY section of the macOS man page. The value
of the rlim_cur member has been adjusted accordingly for Darwin-based
systems.
Signed-off-by: Trent Huber <trentmhuber@gmail.com>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Dr. David Alan Gilbert [Sun, 5 May 2024 17:14:44 +0000 (18:14 +0100)]
net/can: Remove unused struct 'CanBusState'
As far as I can tell this struct has never been used in this
file (it is used in can_core.c).
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Dr. David Alan Gilbert [Sun, 5 May 2024 17:14:42 +0000 (18:14 +0100)]
hw/arm/bcm2836: Remove unusued struct 'BCM283XClass'
This struct has been unused since
Commit
f932093ae165 ("hw/arm/bcm2836: Split out common part of BCM283X
classes")
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Dr. David Alan Gilbert [Sun, 5 May 2024 17:14:40 +0000 (18:14 +0100)]
linux-user: sparc: Remove unused struct 'target_mc_fq'
This struct is unused since Peter's
Commit
b8ae597f0e6d ("linux-user/sparc: Fix errors in target_ucontext
structures")
However, hmm, I'm a bit confused since that commit modifies the
structure and then removes it, was that intentional?
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Dr. David Alan Gilbert [Sun, 5 May 2024 17:14:38 +0000 (18:14 +0100)]
linux-user: cris: Remove unused struct 'rt_signal_frame'
Since 'setup_rt_frame' has never been implemented, this struct
is unused.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Philippe Mathieu-Daudé [Mon, 10 Jun 2024 06:39:24 +0000 (08:39 +0200)]
monitor: Remove obsolete stubs
hmp_info_roms() was removed in commit
dd98234c05 ("qapi:
introduce x-query-roms QMP command"),
hmp_info_numa() in commit
1b8ae799d8 ("qapi: introduce
x-query-numa QMP command"),
hmp_info_ramblock() in commit
ca411b7c8a ("qapi: introduce
x-query-ramblock QMP command")
and hmp_info_irq() in commit
91f2fa7045 ("qapi: introduce
x-query-irq QMP command").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>