git.maquefel.me Git - qemu.git/log

Revert "chardev/char-socket: Fix TLS io channels sending too much data to the backend"

This commit results in unexpected termination of the TLS connection.
When 'fd_can_read' returns 0, the code goes on to pass a zero length
buffer to qio_channel_read. The TLS impl calls into gnutls_recv()
with this zero length buffer, at which point GNUTLS returns an error
GNUTLS_E_INVALID_REQUEST. This is treated as fatal by QEMU's TLS code
resulting in the connection being torn down by the chardev.

Simply skipping the qio_channel_read when the buffer length is zero
is also not satisfactory, as it results in a high CPU burn busy loop
massively slowing QEMU's functionality.

The proper solution is to avoid tcp_chr_read being called at all
unless the frontend is able to accept more data. This will be done
in a followup commit.

This reverts commit 462945cd22d2bcd233401ed3aa167d83a8e35b05

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

chardev: lower priority of the HUP GSource in socket chardev

The socket chardev often has 2 GSource object registered against the
same FD. One is registered all the time and is just intended to handle
POLLHUP events, while the other gets registered & unregistered on the
fly as the frontend is ready to receive more data or not.

It is very common for poll() to signal a POLLHUP event at the same time
as there is pending incoming data from the disconnected client. It is
therefore essential to process incoming data prior to processing HUP.
The problem with having 2 GSource on the same FD is that there is no
guaranteed ordering of execution between them, so the chardev code may
process HUP first and thus discard data.

This failure scenario is non-deterministic but can be seen fairly
reliably by reverting a7077b8e354d90fec26c2921aa2dea85b90dff90, and
then running 'tests/unit/test-char', which will sometimes fail with
missing data.

Ideally QEMU would only have 1 GSource, but that's a complex code
refactoring job. The next best solution is to try to ensure ordering
between the 2 GSource objects. This can be achieved by lowering the
priority of the HUP GSource, so that it is never dispatched if the
main GSource is also ready to dispatch. Counter-intuitively, lowering
the priority of a GSource is done by raising its priority number.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

seccomp: report EPERM instead of killing process for spawn set

When something tries to run one of the spawn syscalls (eg clone),
our seccomp deny filter is set to cause a fatal trap which kills
the process.

This is found to be unhelpful when QEMU has loaded the nvidia
GL library. This tries to spawn a process to modprobe the nvidia
kmod. This is a dubious thing to do, but at the same time, the
code will gracefully continue if this fails. Our seccomp filter
rightly blocks the spawning, but prevent the graceful continue.

Switching to reporting EPERM will make QEMU behave more gracefully
without impacting the level of protect we have.

https://gitlab.com/qemu-project/qemu/-/issues/2116
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Update version for v9.0.0-rc0 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

This fix solves the "failed to set up stack guard page" error that has been
reported on Linux hosts where the QEMU coroutine pool exceeds the
vm.max_map_count limit.

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmX5qq0ACgkQnKSrs4Gr
# c8ginQf8DRKzA7K8OivEegKpf0TgGcAcw9/xKc6zJH3X0/GXi1my61tzz+XUkbNy
# /R9HRrjBUb4MhSmJzP9kxuPFcBD5fZeipg4eTqtJCdi+DQ57+YypShVpsDrD7eNv
# X5dxeeONdWwP+k9JiOj9NtSOMmTKExn/Q/w45G2eeBlJh4yRA+56XN/dDXTFlidm
# NEpOGrKbyFKuAf/ZwYmeBr4aqIGTN3UgOVco/rqkGPYPTYpKlCoE5rSTEnQrbR7/
# C9KojlrGawJXlKjxfu/6i7yGHrv0eJ2N1VauvR/DHhQvdRhojVVt3NFGG/WJi+cL
# CMbxNyYeQJLNFtfPWzokjKEudxkshg==
# =lznr
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Mar 2024 15:09:33 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  coroutine: cap per-thread local pool size

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

coroutine: cap per-thread local pool size

The coroutine pool implementation can hit the Linux vm.max_map_count
limit, causing QEMU to abort with "failed to allocate memory for stack"
or "failed to set up stack guard page" during coroutine creation.

This happens because per-thread pools can grow to tens of thousands of
coroutines. Each coroutine causes 2 virtual memory areas to be created.
Eventually vm.max_map_count is reached and memory-related syscalls fail.
The per-thread pool sizes are non-uniform and depend on past coroutine
usage in each thread, so it's possible for one thread to have a large
pool while another thread's pool is empty.

Switch to a new coroutine pool implementation with a global pool that
grows to a maximum number of coroutines and per-thread local pools that
are capped at hardcoded small number of coroutines.

This approach does not leave large numbers of coroutines pooled in a
thread that may not use them again. In order to perform well it
amortizes the cost of global pool accesses by working in batches of
coroutines instead of individual coroutines.

The global pool is a list. Threads donate batches of coroutines to when
they have too many and take batches from when they have too few:

.-----------------------------------.
| Batch 1 | Batch 2 | Batch 3 | ... | global_pool
`-----------------------------------'

Each thread has up to 2 batches of coroutines:

.-------------------.
| Batch 1 | Batch 2 | per-thread local_pool (maximum 2 batches)
`-------------------'

The goal of this change is to reduce the excessive number of pooled
coroutines that cause QEMU to abort when vm.max_map_count is reached
without losing the performance of an adequately sized coroutine pool.

Here are virtio-blk disk I/O benchmark results:

      RW BLKSIZE IODEPTH    OLD    NEW CHANGE
randread      4k       1 113725 117451 +3.3%
randread      4k       8 192968 198510 +2.9%
randread      4k      16 207138 209429 +1.1%
randread      4k      32 212399 215145 +1.3%
randread      4k      64 218319 221277 +1.4%
randread    128k       1  17587  17535 -0.3%
randread    128k       8  17614  17616 +0.0%
randread    128k      16  17608  17609 +0.0%
randread    128k      32  17552  17553 +0.0%
randread    128k      64  17484  17484 +0.0%

See files/{fio.sh,test.xml.j2} for the benchmark configuration:
https://gitlab.com/stefanha/virt-playbooks/-/tree/coroutine-pool-fix-sizing

Buglink: https://issues.redhat.com/browse/RHEL-28947
Reported-by: Sanjay Rao <srao@redhat.com>
Reported-by: Boaz Ben Shabat <bbenshab@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240318183429.1039340-1-stefanha@redhat.com>

Merge tag 'pull-for-9.0-20240319' of https://github.com/legoater/qemu into staging

aspeed, pnv, vfio queue:

* user device fixes for Aspeed and PowerNV machines
* coverity fix for iommufd

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmX5mm0ACgkQUaNDx8/7
# 7KE/MQ/9GeX4yNBxY2iTATdmPXwjMw8AtKyfIQb605nIO0ch1Z98ywl5VMwCNohn
# ppY9L5bFpEASgRlFVm73X4DGxKyRGpRPqylsvINh0hKciRpmRkELHY3llhnXsd7P
# Q197pDtFr54FeX8j4+hSAu4paT97fPENlKn0J6lto2I1cXGcD1LYNDFhysoXdGme
# brJgo7KjQJZPZ560ZewskL5FWf3G9EkRjpqd8y0G5OtNmAPgAaahOMHhDCXan182
# J89I9CHI5xN45MRfAs8JamSaj/GyNsr4h04WhPa0+VZQ5vsaeW2Ekt4ypj+oAV+p
# wykhYzQk4ALZcmmph2flSAtLa7uheI+imyqubMthQCDj3G8onSQBMd5/4WRK6O49
# 0oE1DpPDEfhlJEQYxaYhOeqeA9iaP+w6V+yE+L5oGlMO66cR7GZsPu0x7kXailbH
# IoHw9mO+vMkpuyeP7M3hA8WRFCdFpf1Nn1Ao5Jz3KoiTyJWlIvX5VSaj12sjddQ2
# fU9SKu2Q5QqS5uQGakkY64EyUy7RkGIX6zY2NIscVe2lfAfKf3mZwu7OIuLjEy5O
# lRn35vWV8fOdRooKoDPTNcdBCaNPi+RApin8chOv5P+F+ie7+Twf9sb1AgH/pIcv
# HptvTXbvSFNbbdb+OE8a5qsqTvnrN8d31IXzrWRYsJB07x2IyoA=
# =zR3v
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Mar 2024 14:00:13 GMT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-for-9.0-20240319' of https://github.com/legoater/qemu:
  aspeed/smc: Only wire flash devices at reset
  ppc/pnv: I2C controller is not user creatable
  vfio/iommufd: Fix memory leak

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tests: Raise timeouts for bufferiszero and crypto-tlscredsx509

On our gcov CI job, the bufferiszero and crypto-tlscredsx509
tests time out occasionally, making the job flaky. Double the
timeout on these two tests.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2221
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20240312110815.116992-1-peter.maydell@linaro.org

aspeed/smc: Only wire flash devices at reset

The Aspeed machines have many Static Memory Controllers (SMC), up to
8, which can only drive flash memory devices. Commit 27a2c66c92ec
("aspeed/smc: Wire CS lines at reset") tried to ease the definitions
of these devices by allowing flash devices from the command line to be
attached to a SSI bus. For that, the wiring of the CS lines of the
Aspeed SMC controller was moved at reset. Two assumptions are made
though, first that the device has a SSI_GPIO_CS GPIO line, which is
not always the case, and second that it is a flash device.

Correct this problem by ensuring that the devices attached to the bus
are of the correct flash type. This fixes a QEMU abort when devices
without a CS line, such as the max111x, are passed on the command
line.

While at it, export TYPE_M25P80 used in the Xilinx Versal Virtual
machine.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2228
Fixes: 27a2c66c92ec ("aspeed/smc: Wire CS lines at reset")
Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
[ clg: minor fixes in the commit log ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>

ppc/pnv: I2C controller is not user creatable

The I2C controller is a subunit of the processor. Make it so and avoid
QEMU crashes.

  $ build/qemu-system-ppc64 -S -machine powernv9 -device pnv-i2c
  qemu-system-ppc64: ../hw/ppc/pnv_i2c.c:521: pnv_i2c_realize: Assertion `i2c->chip' failed.
  Aborted (core dumped)

Fixes: 263b81ee15af ("ppc/pnv: Add an I2C controller model")
Cc: Glenn Miles <milesg@linux.vnet.ibm.com>
Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/iommufd: Fix memory leak

Coverity reported a memory leak on variable 'contents' in routine
iommufd_cdev_getfd(). Use g_autofree variables to simplify the exit
path and get rid of g_free() calls.

Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Liu <yi.l.liu@intel.com>
Fixes: CID 1540007
Fixes: 5ee3dc7af785 ("vfio/iommufd: Implement the iommufd backend")
Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

Merge tag 'pull-request-2024-03-18' of https://gitlab.com/thuth/qemu into staging

* Clarify s390x CPU topology docs and CPU compatibility error messages
* Improve the Sparc CPU help text
* Rename SOFTMMU to SYSTEM in the travis.yml file

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmX4f/gRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbULnBAAgAPw1tonqEyt0kEw+i088do3yprrcoA+
# vTWB1Qk8ieL7nmBaFtsKYXVeoY+KICSGY4UvN3+jFot8uwzSz3vEYOpC5Nd+m0ct
# CqLUtVeq6wpql1PLswobiPdxdLznkgrXchvXY5LwURTtr1Gtq1JjAU+HdJ2UyRyZ
# WFe2HW2kriWswaprsyu6rNlmXzDTaNo/Gn6c0d//J0XYhg1qoxWsN95pzp7gMkb/
# YKx//Ss/lN4joRsqQGBQPCF43gFJwnmXdmwhyS4EcsCJ7DfqQ9UHgx42ypOgY497
# rVY7wTQeHSDOaQxkp+Vha0IvotIKll110J7bMpDL01++li1AiCMFjSl92dA6mHxL
# ZYGIjiUgTyjOuhuhkdLXbQLCUMST4VD8GOxxajil9jqBTwehUrUrNW/SOmP0az/p
# fq0Y8XxdynY8PKuBRPAM4f5hKIVtjzkz9m9XMu4bstYhIJNkfOQSiz1XzxS0T5/8
# 4VxaNF5we/l50HTnB4rJ0FGTzXiWO8BO3zSeD1caF+7ctHQWsypNBJYyKW52ITt3
# r6K17klsoNlmh8XjOt7wCVvNgsHj8SlsmtpN3GiTivDP0FVDY7DDA92teCRZB4TZ
# EhubWrQGERAPzG6Ud+bujUpwdgJ91MVvIuBjotAgNMT2Peayfc0V9PA4+7Xg5jW2
# 1wyyU3lr8y4=
# =4Ivl
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 18 Mar 2024 17:55:04 GMT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2024-03-18' of https://gitlab.com/thuth/qemu:
  travis-ci: Rename SOFTMMU -> SYSTEM
  target/sparc/cpu: Improve the CPU help text
  target/s390x: improve cpu compatibility check error message
  docs/s390: clarify even more that cpu-topology is KVM-only

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'for_upstream' of https://git./virt/kvm/mst/qemu into staging

virtio,pc,pci: bugfixes

Some minor fixes plus a big patchset from Igor fixing
a regression with windows.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmX4NzsPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpkp0H/1foAaDYrApMiIkji4aI94bq/fwTnu5CshNP
# +YEzwJCS4qbl67/Ix2Z+xVz7twjQbgGdLd6hb9ZypAQfclUk5tDoKyCmqHtQMakX
# T080FayOvWmUEostAw7MXvuz0HpJlgnJaJBn29l1hHjA/XXahKqcc705cup+W8hv
# F7xb6AoFcbdETMzNaoqekNaHiiYyQPITY9p/UYPLzj2zyLsspR9kBebIeA1yhtXw
# Tmc3+FMquoM2fMNxpwfhCBswg662MlOXhLN3dmyLqeJRl09x1GvaeJIGMY2MbefM
# RMMv0/jqwAyii5HXew2rPIbLdULGq+hSjZo2NOlx3EOjTCaOkXc=
# =XGMp
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 18 Mar 2024 12:44:43 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (24 commits)
  smbios: add extra comments to smbios_get_table_legacy()
  tests: acpi: update expected SSDT.dimmpxm blob
  pc/q35: set SMBIOS entry point type to 'auto' by default
  tests: acpi/smbios: whitelist expected blobs
  smbios: error out when building type 4 table is not possible
  smbios: in case of entry point is 'auto' try to build v2 tables 1st
  smbios: extend smbios-entry-point-type with 'auto' value
  smbios: clear smbios_type4_count before building tables
  smbios: get rid of global smbios_ep_type
  smbios: handle errors consistently
  smbios: build legacy mode code only for 'pc' machine
  smbios: rename/expose structures/bitmaps used by both legacy and modern code
  smbios: add smbios_add_usr_blob_size() helper
  smbios: don't check type4 structures in legacy mode
  smbios: avoid mangling user provided tables
  smbios: get rid of smbios_legacy global
  smbios: get rid of smbios_smp_sockets global
  smbios: cleanup smbios_get_tables() from legacy handling
  tests: smbios: add test for legacy mode CLI options
  tests: smbios: add test for -smbios type=11 option
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- mirror: Fix deadlock
- nbd/server: Fix race in draining the export
- qemu-img snapshot: Fix formatting with large values
- Fix blockdev-snapshot-sync error reporting for no medium
- iotests fixes

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmX4OG8RHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9YdiQ//faXfGmbK6rBW4AkpwfrRM8SDHvm6hz7L
# 043ujAi3ziSXXoiec2/RK5wZ27nMJkfIrRHXpH41hgQvC6/3a4eIW6KSTaFV1PdG
# JtHCeopmVmgu7TZQ+kt/J6eLUTTLovoO94HgEfmxpr4CGZfx9RJftf2kCKILcYkh
# 9r04zSZLByVd4FJ5ZrqsFulWif5mXoGKdT/YisY3tKiCwFRWQDOoTymvJA012VtO
# MVmID593zwem3O3qtlGiGlK9qodBR4yof66xa/0gaYP98BZgv+LWnwLKha+OzSpX
# bQlxT26LY4JnSQkTdjF0QYnQiH4Q1kveUcNRZrGpA4iZxVDq1aks5DisThDwqoGG
# rhaPOWyJwJsonM1Enzim5Jd60JqvGdpTLjSA5oSyTjw62lAulnYihInERYSAFyyz
# UhQaO7qSog1//RpPEXEsiVkJBq8BE9l5I+L7+l5SCBhNr/UwZAOer/4m4X6d0SKN
# GEPRx0kH1voikzx7gIQs+Oldqvb0sg+zAvOynBxzpd+Ac6s8bFtWe+eSyWYL/ZGr
# Jg9+PL1xir/Uh7KmOnzt/iVBAmfSRpAo1O72xQXvHFYYtIP7hTkPO/vzqF206WMc
# WQFHHjfp5gVcMZ5AYg6txw+Bbtzu8g0AfB054lgnhihuShpf0E923TTDQFdV755s
# NUlrzuGu2fs=
# =+JIK
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 18 Mar 2024 12:49:51 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin:
  iotests: adapt to output change for recently introduced 'detached header' field
  tests/qemu-iotests: Restrict tests using "--blockdev file" to the file protocol
  tests/qemu-iotests: Fix some tests that use --image-opts for other protocols
  tests/qemu-iotests: Restrict tests that use --image-opts to the 'file' protocol
  tests/qemu-iotests: Restrict test 156 to the 'file' protocol
  tests/qemu-iotests: Restrict test 134 and 158 to the 'file' protocol
  tests/qemu-iotests: Restrict test 130 to the 'file' protocol
  tests/qemu-iotests: Restrict test 114 to the 'file' protocol
  tests/qemu-iotests: Restrict test 066 to the 'file' protocol
  tests/qemu-iotests: Fix test 033 for running with non-file protocols
  qemu-img: Fix Column Width and Improve Formatting in snapshot list
  blockdev: Fix blockdev-snapshot-sync error reporting for no medium
  iotests: Add test for reset/AioContext switches with NBD exports
  nbd/server: Fix race in draining the export
  mirror: Don't call job_pause_point() under graph lock

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'migration-20240317-pull-request' of https://gitlab.com/peterx/qemu into staging

Migration pull for 9.0-rc0

- Nicholas/Phil's fix on migration corruption / inconsistent for tcg
- Cedric's fix on block migration over n_sectors==0
- Steve's CPR reboot documentation page
- Fabiano's misc fixes on mapped-ram (IOC leak, dup() errors, fd checks, fd
  use race, etc.)

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZfdZEhIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wa+1AEA0+f7nCssvsILvCY9KifYO+OUJsLodUuQ
# JW0JBz+1iPMA+wSiyIVl2Xg78Q97nJxv71UJf+1cDJENA5EMmXMnxmYK
# =SLnA
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 17 Mar 2024 20:56:50 GMT
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-20240317-pull-request' of https://gitlab.com/peterx/qemu:
  migration/multifd: Duplicate the fd for the outgoing_args
  migration/multifd: Ensure we're not given a socket for file migration
  migration: Fix iocs leaks during file and fd migration
  migration: cpr-reboot documentation
  migration: Skip only empty block devices
  physmem: Fix migration dirty bitmap coherency with TCG memory access
  physmem: Factor cpu_physical_memory_dirty_bits_cleared() out
  physmem: Expose tlb_reset_dirty_range_all()
  migration: Fix error handling after dup in file migration
  io: Introduce qio_channel_file_new_dupfd

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

travis-ci: Rename SOFTMMU -> SYSTEM

Since we *might* have user emulation with softmmu,
rename MAIN_SOFTMMU_TARGETS as MAIN_SYSTEM_TARGETS
to express 'system emulation targets'.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240313213339.82071-3-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

target/sparc/cpu: Improve the CPU help text

Remove the unnecessary "Sparc" at the beginning of the line and
put the chip information into parentheses so that it is clearer
which part of the line have to be passed to "-cpu" to specify a
different CPU.

Message-ID: <20240307174334.130407-4-thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

target/s390x: improve cpu compatibility check error message

some users were confused by this message showing under TCG:

Selected CPU generation is too new. Maximum supported model
in the configuration: 'xyz'

Clarify that the maximum can depend on the accel, and add a
hint to try a different one.

Also add a hint for features mismatch to suggest trying
different accel, QEMU and kernel versions.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-ID: <20240314213746.27163-1-cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

smbios: add extra comments to smbios_get_table_legacy()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240314152302.2324164-22-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: acpi: update expected SSDT.dimmpxm blob

address shift is caused by switch to 32-bit SMBIOS entry point
which has slightly different size from 64-bit one and happens
to trigger a bit different memory layout.

Expected diff:

- Name (MEMA, 0x07FFE000)
+ Name (MEMA, 0x07FFF000)

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240314152302.2324164-21-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pc/q35: set SMBIOS entry point type to 'auto' by default

Use smbios-entry-point-type='auto' for newer machine types as a workaround
for Windows not detecting SMBIOS tables. Which makes QEMU pick SMBIOS tables
based on configuration (with 2.x preferred and fallback to 3.x if the former
isn't compatible with configuration)

Default compat setting of smbios-entry-point-type after series
for pc/q35 machines:
  * 9.0-newer: 'auto'
  * 8.1-8.2: '64'
  * 8.0-older: '32'

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2008
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-20-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: acpi/smbios: whitelist expected blobs

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240314152302.2324164-19-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: error out when building type 4 table is not possible

If SMBIOS v2 version is requested but number of cores/threads
are more than it's possible to describe with v2, error out
instead of silently ignoring the fact and filling core/thread
count with bogus values.

This will help caller to decide if it should fallback to
SMBIOSv3 when smbios-entry-point-type='auto'

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-18-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: in case of entry point is 'auto' try to build v2 tables 1st

QEMU for some time now uses SMBIOS 3.0 for PC/Q35 machines by
default, however Windows has a bug in locating SMBIOS 3.0
entrypoint and fails to find tables when booted on SeaBIOS
(on UEFI SMBIOS 3.0 tables work fine since firmware hands
over tables in another way)

Missing SMBIOS tables may lead to some issues for guest
though (worst are: possible reactiveation, inability to
get virtio drivers from 'Windows Update')

It's unclear at this point if MS will fix the issue on their
side. So instead of it (or rather in addition) this patch
will try to workaround the issue.

aka, use smbios-entry-point-type=auto to make QEMU try
generating conservative SMBIOS 2.0 tables and if that
fails (due to limits/requested configuration) fallback
to SMBIOS 3.0 tables.

With this in place majority of users will use SMBIOS 2.0
tables which work fine with (Windows + legacy BIOS).
The configurations that is not to possible to describe
with SMBIOS 2.0 will switch automatically to SMBIOS 3.0
(which will trigger Windows bug but there is nothing
QEMU can do here, so go and aks Microsoft to real fix).

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-17-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: extend smbios-entry-point-type with 'auto' value

later patches will use it to pick SMBIOS version at runtime
depending on configuration.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-16-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: clear smbios_type4_count before building tables

it will help to keep type 4 tables accounting correct in case
SMBIOS tables are built multiple times.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-15-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: get rid of global smbios_ep_type

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-14-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: handle errors consistently

Current code uses mix of error_report()+exit(1)
and error_setg() to handle errors.
Use newer error_setg() everywhere, beside consistency
it will allow to detect error condition without killing
QEMU and attempt switch-over to SMBIOS3.x tables/entrypoint
in follow up patch.

while at it, clear smbios_tables pointer after freeing.
that will avoid double free if smbios_get_tables() is called
multiple times.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240314152302.2324164-13-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: build legacy mode code only for 'pc' machine

basically moving code around without functional change.
And exposing some symbols so that they could be shared
between smbbios.c and new smbios_legacy.c

plus some meson magic to build smbios_legacy.c only
for 'pc' machine and otherwise replace it with stub
if not selected.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240314152302.2324164-12-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: rename/expose structures/bitmaps used by both legacy and modern code

As a preparation to move legacy handling into a separate file,
add prefix 'smbios_' to type0/type1/have_binfile_bitmap/have_fields_bitmap
and expose them in smbios.h so that they can be reused in
legacy and modern code.

Doing it as a separate patch to avoid rename cluttering follow-up
patch which will move legacy code into a separate file.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240314152302.2324164-11-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: add smbios_add_usr_blob_size() helper

it will be used by follow up patch when legacy handling
is moved out into a separate file.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240314152302.2324164-10-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: don't check type4 structures in legacy mode

legacy mode doesn't support structures of type 2 and more,
and CLI has a check for '-smbios type' option, however it's
still possible to sneak in type4 as a blob with '-smbios file'
option. However doing the later makes SMBIOS tables broken
since SeaBIOS doesn't expect that.

Rather than trying to add support for type4 to legacy code
(both QEMU and SeaBIOS), simplify smbios_get_table_legacy()
by dropping not relevant check in legacy code and error out
on type4 blob.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-9-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: avoid mangling user provided tables

currently smbios_entry_add() preserves internally '-smbios type='
options but tables provided with '-smbios file=' are stored directly
into blob that eventually will be exposed to VM. And then later
QEMU adds default/'-smbios type' entries on top into the same blob.

It makes impossible to generate tables more than once, hence
'immutable' guard was used.
Make it possible to regenerate final blob by storing user provided
blobs into a dedicated area (usr_blobs) and then copy it when
composing final blob. Which also makes handling of -smbios
options consistent.

As side effect of this and previous commits there is no need to
generate legacy smbios_entries at the time options are parsed.
Instead compose smbios_entries on demand from usr_blobs like
it is done for non-legacy SMBIOS tables.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240314152302.2324164-8-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: get rid of smbios_legacy global

clean up smbios_set_defaults() which is reused by legacy
and non legacy machines from being aware of 'legacy' notion
and need to turn it off. And push legacy handling up to
PC machine code where it's relevant.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Acked-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-7-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: get rid of smbios_smp_sockets global

it makes smbios_validate_table() independent from
smbios_smp_sockets global, which in turn lets
smbios_get_tables() avoid using not related legacy code.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-6-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

smbios: cleanup smbios_get_tables() from legacy handling

smbios_get_tables() bails out right away if leagacy mode is enabled
and won't generate any SMBIOS tables. At the same time x86 specific
fw_cfg_build_smbios() will genarate legacy tables and then proceed
to preparing temporary mem_array for useless call to
smbios_get_tables() and then discard it.

Drop legacy related check in smbios_get_tables() and return from
fw_cfg_build_smbios() early if legacy tables where built without
proceeding to non legacy part of the function.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-5-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: smbios: add test for legacy mode CLI options

Unfortunately having 2.0 machine type deprecated is not enough
to get rid of legacy SMBIOS handling since 'isapc' also uses
that and it's staying around.

Hence add test for CLI options handling to be sure that it
ain't broken during SMBIOS code refactoring.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: smbios: add test for -smbios type=11 option

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: smbios: make it possible to write SMBIOS only test

Cureently it not possible to run SMBIOS test without ACPI one,
which gets into the way when testing ACPI-less configs.

Extract SMBIOS testing into separate routines that could also
be run without ACPI dependency and use that for testing SMBIOS.

As the 1st user add "acpi/piix4/smbios-options" test case.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240314152302.2324164-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

docs/specs/pvpanic: mark shutdown event as not implemented

Mention the fact that this event is not yet implemented
to avoid confusion.
As requested by Michael.

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240313-pvpanic-note-v1-1-7f2571cdaedc@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

iotests: adapt to output change for recently introduced 'detached header' field

Failure was noticed when running the tests for the qcow2 image format.

Fixes: 0bd779e27e ("crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20240216101415.293769-1-f.ebner@proxmox.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Restrict tests using "--blockdev file" to the file protocol

Tests that use "--blockdev" with the "file" driver cannot work with
other protocols, so we should mark them accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-10-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Fix some tests that use --image-opts for other protocols

Tests 263, 284 and detect-zeroes-registered-buf use qemu-io
with --image-opts so we have to enforce IMGOPTSSYNTAX=true here
to get $TEST_IMG in shape for other protocols than "file".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-9-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Restrict tests that use --image-opts to the 'file' protocol

These tests 188, 189 and 198 use qemu-io with --image-opts with additional
hard-coded parameters for the file protocol, so they cannot work for other
protocols. Thus we have to limit these tests to the file protocol only.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-8-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Restrict test 156 to the 'file' protocol

The test fails completely when you try to use it with a different
protocol, e.g. with "./check -ssh -qcow2 156".
The test uses some hand-crafted JSON statements which cannot work with other
protocols, thus let's change this test to only support the 'file' protocol.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-7-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Restrict test 134 and 158 to the 'file' protocol

Commit b25b387fa592 updated the iotests 134 and 158 to use the --image-opts
parameter for qemu-io with file protocol related options, but forgot to
update the _supported_proto line accordingly. So let's do that now.

Fixes: b25b387fa5 ("qcow2: convert QCow2 to use QCryptoBlock for encryption")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-6-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Restrict test 130 to the 'file' protocol

Using "-drive ...,backing.file.filename=..." only works with the
file protocol, but not with URIs, so mark this test accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-5-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Restrict test 114 to the 'file' protocol

iotest 114 uses "truncate" and the qcow2.py script on the destination file,
which both cannot deal with URIs. Thus this test needs the "file" protocol,
otherwise it fails with an error message like this:

truncate: cannot open 'ssh://127.0.0.1/tmp/qemu-build/tests/qemu-iotests/scratch/qcow2-ssh-114/t.qcow2.orig'
for writing: No such file or directory

Thus mark this test for "file protocol only" accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-4-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Restrict test 066 to the 'file' protocol

The hand-crafted json statement in this test only works if the test
is run with the "file" protocol, so mark this test accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-3-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Fix test 033 for running with non-file protocols

When running iotest 033 with the ssh protocol, it fails with:

033   fail       [14:48:31] [14:48:41]   10.2s                output mismatch
--- /.../tests/qemu-iotests/033.out
+++ /.../tests/qemu-iotests/scratch/qcow2-ssh-033/033.out.bad
@@ -174,6 +174,7 @@
  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 512/512 bytes at offset 2097152
  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-io: warning: Failed to truncate the tail of the image: ssh driver does not support shrinking files
  read 512/512 bytes at offset 0
  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

We already check for the qcow2 format here, so let's simply also
add a check for the protocol here, too, to only test the truncation
with the file protocol.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240315111108.153201-2-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-img: Fix Column Width and Improve Formatting in snapshot list

When running the command `qemu-img snapshot -l SNAPSHOT` the output of
VM_CLOCK (measures the offset between host and VM clock) cannot to
accommodate values in the order of thousands (4-digit).

This line [1] hints on the problem. Additionally, the column width for
the VM_CLOCK field was reduced from 15 to 13 spaces in commit b39847a5
in line [2], resulting in a shortage of space.

[1]:
https://gitlab.com/qemu-project/qemu/-/blob/master/block/qapi.c?ref_type=heads#L753
[2]:
https://gitlab.com/qemu-project/qemu/-/blob/master/block/qapi.c?ref_type=heads#L763

This patch restores the column width to 15 spaces and makes adjustments
to the affected iotests accordingly. Furthermore, addresses a potential
source
of confusion by removing whitespace in column headers. Example, VM CLOCK
is modified to VM_CLOCK. Additionally a '--' symbol is introduced when
ICOUNT returns no output for clarity.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2062
Fixes: b39847a50553 ("migration: introduce icount field for snapshots")
Signed-off-by: Abhiram Tilak <atp.exp@gmail.com>
Message-ID: <20240123050354.22152-2-atp.exp@gmail.com>
[kwolf: Fixed up qemu-iotests 261 and 286]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockdev: Fix blockdev-snapshot-sync error reporting for no medium

When external_snapshot_abort() rejects a BlockDriverState without a
medium, it creates an error like this:

        error_setg(errp, "Device '%s' has no medium", device);

Trouble is @device can be null.  My system formats null as "(null)",
but other systems might crash.  Reproducer:

1. Create a block device without a medium

    -> {"execute": "blockdev-add", "arguments": {"driver": "host_cdrom", "node-name": "blk0", "filename": "/dev/sr0"}}
    <- {"return": {}}

3. Attempt to snapshot it

    -> {"execute":"blockdev-snapshot-sync", "arguments": { "node-name": "blk0", "snapshot-file":"/tmp/foo.qcow2","format":"qcow2"}}
    <- {"error": {"class": "GenericError", "desc": "Device '(null)' has no medium"}}

Broken when commit 0901f67ecdb made @device optional.

Use bdrv_get_device_or_node_name() instead.  Now it fails as it
should:

    <- {"error": {"class": "GenericError", "desc": "Device 'blk0' has no medium"}}

Fixes: 0901f67ecdb7 ("qmp: Allow to take external snapshots on bs graphs node.")
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240306142831.2514431-1-armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Add test for reset/AioContext switches with NBD exports

This replicates the scenario in which the bug was reported.
Unfortunately this relies on actually executing a guest (so that the
firmware initialises the virtio-blk device and moves it to its
configured iothread), so this can't make use of the qtest accelerator
like most other test cases. I tried to find a different easy way to
trigger the bug, but couldn't find one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240314165825.40261-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

nbd/server: Fix race in draining the export

When draining an NBD export, nbd_drained_begin() first sets
client->quiescing so that nbd_client_receive_next_request() won't start
any new request coroutines. Then nbd_drained_poll() tries to makes sure
that we wait for any existing request coroutines by checking that
client->nb_requests has become 0.

However, there is a small window between creating a new request
coroutine and increasing client->nb_requests. If a coroutine is in this
state, it won't be waited for and drain returns too early.

In the context of switching to a different AioContext, this means that
blk_aio_attached() will see client->recv_coroutine != NULL and fail its
assertion.

Fix this by increasing client->nb_requests immediately when starting the
coroutine. Doing this after the checks if we should create a new
coroutine is okay because client->lock is held.

Cc: qemu-stable@nongnu.org
Fixes: fd6afc501a01 ("nbd/server: Use drained block ops to quiesce the server")
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240314165825.40261-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

mirror: Don't call job_pause_point() under graph lock

Calling job_pause_point() while holding the graph reader lock
potentially results in a deadlock: bdrv_graph_wrlock() first drains
everything, including the mirror job, which pauses it. The job is only
unpaused at the end of the drain section, which is when the graph writer
lock has been successfully taken. However, if the job happens to be
paused at a pause point where it still holds the reader lock, the writer
lock can't be taken as long as the job is still paused.

Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly.

Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-28125
Fixes: 004915a96a7a ("block: Protect bs->backing with graph_lock")
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240313153000.33121-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qapi: document PCIe Gen5/Gen6 speeds since 9.0

Document that PCIe Gen5/Gen6 speeds are only in QAPI
since 9.0 - the rest is since 4.0.

Cc: Lukas Stockner <lstockner@genesiscloud.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Fixes: c08da86dc4 ("pcie: Support PCIe Gen5/Gen6 link speeds")
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

SMBIOS: fix long lines

Break up long lines to fit under 80/90 char limit.

Fixes: 04f143d828 ("Implement SMBIOS type 9 v2.6")
Fixes: 735eee07d1 ("Implement base of SMBIOS type 9 descriptor.")
Cc: "Felix Wu" <flwu@google.com>
Cc: Nabih Estefan <nabihestefan@google.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

migration/multifd: Duplicate the fd for the outgoing_args

We currently store the file descriptor used during the main outgoing
channel creation to use it again when creating the multifd
channels.

Since this fd is used for the first iochannel, there's risk that the
QIOChannel gets freed and the fd closed while outgoing_args.fd still
has it available. This could lead to an fd-reuse bug.

Duplicate the outgoing_args fd to avoid this issue.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240315032040.7974-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

migration/multifd: Ensure we're not given a socket for file migration

When doing migration using the fd: URI, QEMU will fetch the file
descriptor passed in via the monitor at
fd_start_outgoing|incoming_migration(), which means the checks at
migration_channels_and_transport_compatible() happen too soon and we
don't know at that point whether the FD refers to a plain file or a
socket.

For this reason, we've been allowing a migration channel of type
SOCKET_ADDRESS_TYPE_FD to pass the initial verifications in scenarios
where the socket migration is not supported, such as with fd + multifd.

The commit decdc76772 ("migration/multifd: Add mapped-ram support to
fd: URI") was supposed to add a second check prior to starting
migration to make sure a socket fd is not passed instead of a file fd,
but failed to do so.

Add the missing verification and update the comment explaining this
situation which is currently incorrect.

Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240315032040.7974-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

docs/s390: clarify even more that cpu-topology is KVM-only

At least for now cpu-topology is implemented only for KVM.

We already say this, but this tries to be more explicit,
and also show it in the examples.

This adds a new reference in the introduction that we can point to,
whenever we need to reference accelerators and how to select them.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-ID: <20240314172218.16478-1-cfontana@suse.de>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Tested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

migration: Fix iocs leaks during file and fd migration

The memory for the io channels is being leaked in three different ways
during file migration:

1) if the offset check fails we never drop the ioc reference;

2) we allocate an extra channel for no reason;

3) if multifd is enabled but channel creation fails when calling
dup(), we leave the previous channels around along with the glib
polling;

Fix all issues by restructuring the code to first allocate the
channels and only register the watches when all channels have been
created.

For multifd, the file and fd migrations can share code because both
are backed by a QIOChannelFile. For the non-multifd case, the fd needs
to be separate because it is backed by a QIOChannelSocket.

Fixes: 2dd7ee7a51 ("migration/multifd: Add incoming QIOChannelFile support")
Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Reported-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240313212824.16974-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

Merge tag 'pull-maintainer-final-130324-1' of https://gitlab.com/stsquad/qemu into staging

final updates for 9.0 (testing, gdbstub):

  - fix the over rebuilding of test VMs
  - support Xfer:siginfo:read in gdbstub
  - fix double close() in gdbstub

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmXxkb0ACgkQ+9DbCVqe
# KkSw9wf+K+3kJYaZ2unEFku3Y6f4Z9XkrZCsFQFVNIJQgpYVc6peQyLUB1pZwzZc
# yoQhmTIgej16iRZc7gEcJhFl2zlX2vulE/m+wiaR0Chv3E2r510AGn4aWl+GLB9+
# /WduHaz1NobPW4JWaarxespa84Re8QZQgqkHX4nwYd++FW63E4uxydL4F1nmSNca
# eTA6RwS48h4wqPzHBX72hYTRUnYrDUSSGCGUDzK3NHumuPi+AQ77GLRMO0MTYFfy
# hWriapogCmghY+Xtn++eUIwDyh1CCnUT6Ntf5Qj06bZ+f6eaTwINM8QWhj9mxYX+
# 5/F5Q4JJDqRPYw/hF4wYXRsiZxTYFw==
# =BOWW
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 13 Mar 2024 11:45:01 GMT
# gpg:                using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* tag 'pull-maintainer-final-130324-1' of https://gitlab.com/stsquad/qemu:
  gdbstub: Fix double close() of the follow-fork-mode socket
  tests/tcg: Add multiarch test for Xfer:siginfo:read stub
  gdbstub: Add Xfer:siginfo:read stub
  gdbstub: Save target's siginfo
  linux-user: Move tswap_siginfo out of target code
  gdbstub: Rename back gdb_handlesig
  tests/vm: ensure we build everything by default

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'for_upstream' of https://git./virt/kvm/mst/qemu into staging

virtio,pc,pci: features, cleanups, fixes

more memslots support in libvhost-user
support PCIe Gen5/Gen6 link speeds in pcie
more traces in vdpa
network simulation devices support in vdpa
SMBIOS type 9 descriptor implementation
Bump max_cpus to 4096 vcpus in q35
aw-bits and granule options in VIRTIO-IOMMU
Support report NUMA nodes for device memory using GI in acpi
Beginning of shutdown event support in pvpanic

fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmXw0TMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp8x4H+gLMoGwaGAX7gDGPgn2Ix4j/3kO77ZJ9X9k/
# 1KqZu/9eMS1j2Ei+vZqf05w7qRjxxhwDq3ilEXF/+UFqgAehLqpRRB8j5inqvzYt
# +jv0DbL11PBp/oFjWcytm5CbiVsvq8KlqCF29VNzc162XdtcduUOWagL96y8lJfZ
# uPrOoyeR7SMH9lp3LLLHWgu+9W4nOS03RroZ6Umj40y5B7yR0Rrppz8lMw5AoQtr
# 0gMRnFhYXeiW6CXdz+Tzcr7XfvkkYDi/j7ibiNSURLBfOpZa6Y8+kJGKxz5H1K1G
# 6ZY4PBcOpQzl+NMrktPHogczgJgOK10t+1i/R3bGZYw2Qn/93Eg=
# =C0UU
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 12 Mar 2024 22:03:31 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (68 commits)
  docs/specs/pvpanic: document shutdown event
  hw/cxl: Fix missing reserved data in CXL Device DVSEC
  hmat acpi: Fix out of bounds access due to missing use of indirection
  hmat acpi: Do not add Memory Proximity Domain Attributes Structure targetting non existent memory.
  qemu-options.hx: Document the virtio-iommu-pci aw-bits option
  hw/arm/virt: Set virtio-iommu aw-bits default value to 48
  hw/i386/q35: Set virtio-iommu aw-bits default value to 39
  virtio-iommu: Add an option to define the input range width
  virtio-iommu: Trace domain range limits as unsigned int
  qemu-options.hx: Document the virtio-iommu-pci granule option
  virtio-iommu: Change the default granule to the host page size
  virtio-iommu: Add a granule property
  hw/i386/acpi-build: Add support for SRAT Generic Initiator structures
  hw/acpi: Implement the SRAT GI affinity structure
  qom: new object to associate device to NUMA node
  hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove it
  hw/i386/pc: Set "normal" boot device order in pc_basic_device_init()
  hw/i386/pc: Avoid one use of the current_machine global
  hw/i386/pc: Remove "rtc_state" link again
  Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/core/machine.c

migration: cpr-reboot documentation

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1710338119-330923-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>

Merge tag 'pull-ppc-for-9.0-2-20240313' of https://gitlab.com/npiggin/qemu into staging

* PAPR nested hypervisor host implementation for spapr TCG
* excp_helper.c code cleanups and improvements
* Move more ops to decodetree
* Deprecate pseries-2.12 machines and P9 and P10 DD1.0 CPUs
* Document running Linux on AmigaNG
* Update dt feature advertising POWER CPUs.
* Add P10 PMU SPRs
* Improve pnv topology calculation for SMT8 CPUs.
* Various bug fixes.

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEETkN92lZhb0MpsKeVZ7MCdqhiHK4FAmXwiT8ACgkQZ7MCdqhi
# HK7C/w//XxEO2bQTFPLFDTrP/voq7pcX8XeQNVyXCkXYjvsbu05oQow50k+Y5UAE
# US4MFjt8jFz0vuIKuKyoA3kG41zDSOzoX4TQXMM+tyTWbuFF3KAyfizb1xE6SYAN
# xJEGvmiXv/EgoSBD7BTKQp1tMPdIGZLwSdYiA0lmOo7YaMCgYAXaujW5hnNjQecT
# 873sN+10pHtQY++mINtD9Nfb6AcDGMWw0b+bykqIXhNRkI8IGOS4WF4vAuMBrwfe
# UM00wDnNRb86Dk14bv2XVNDr6/i0VRtUMwM4yiptrQ1TQx18LZaPSQFYjQfPaan7
# LwN4QkMFnBX54yJ7Npvjvu8BCBF47kwOVu4CIAFJ4sIm0WfTmozDpPttwcZ5w7Ve
# iXDOB9ECAB4pQ2rCgbSNG8MYUZgoHHOuThqolOP0Vh9NHRRJxpdw6CyAbmCGftc0
# lvRDPFiKp8xmCNJ/j3XzoUdHoG7NMwpUmHv9ruGU18SdQ8hyJN9AcQGWYrB4v0RV
# /hs2RAbwntG7ahkcwd8uy5aFw88Wph/uGXPXc49EWj7i49vHeIV2y5+gtthMywje
# qqjFXkistXuF+JHVnyoYmqqCyXaHX5CEwtawMv4EQeaJs76bLhMeMTKKl9rRp8qB
# DtbIZphO8iMsocrBnje48sA5HR0PM+H4HTjw10i8R0fLlWitaIY=
# =XnY5
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 12 Mar 2024 16:56:31 GMT
# gpg:                using RSA key 4E437DDA56616F4329B0A79567B30276A8621CAE
# gpg: Good signature from "Nicholas Piggin <npiggin@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4E43 7DDA 5661 6F43 29B0  A795 67B3 0276 A862 1CAE

* tag 'pull-ppc-for-9.0-2-20240313' of https://gitlab.com/npiggin/qemu: (38 commits)
  spapr: nested: Introduce cap-nested-papr for Nested PAPR API
  spapr: nested: Introduce H_GUEST_RUN_VCPU hcall.
  spapr: nested: Use correct source for parttbl info for nested PAPR API.
  spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls.
  spapr: nested: Initialize the GSB elements lookup table.
  spapr: nested: Extend nested_ppc_state for nested PAPR API
  spapr: nested: Introduce H_GUEST_CREATE_VCPU hcall.
  spapr: nested: Introduce H_GUEST_[CREATE|DELETE] hcalls.
  spapr: nested: Introduce H_GUEST_[GET|SET]_CAPABILITIES hcalls.
  spapr: nested: Document Nested PAPR API
  spapr: nested: keep nested-hv related code restricted to its API.
  spapr: nested: Introduce SpaprMachineStateNested to store related info.
  spapr: nested: move nested part of spapr_get_pate into spapr_nested.c
  spapr: nested: register nested-hv api hcalls only for cap-nested-hv
  target/ppc: Remove interrupt handler wrapper functions
  target/ppc: Clean up ifdefs in excp_helper.c, part 3
  target/ppc: Clean up ifdefs in excp_helper.c, part 2
  target/ppc: Clean up ifdefs in excp_helper.c, part 1
  target/ppc: Add gen_exception_err_nip() function
  target/ppc: Readability improvements in exception handlers
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmXwpoYACgkQnKSrs4Gr
# c8gE0wf/c0hNDKoV01N8IwfJdmIBySNeCYRQiwcR84iiPoGGAwYdKuLa7wHaQKiO
# iM0EV/ltJiiOGCHxlffVqLBzJurJHsHG6m429KBLRBXWc6gVzhCN9TjD8DwHxiTU
# qzczoev8NJ2y5mrxzPPPjMxSSJEe3Ynas6ngeHeYBUtu0PRNp79zceWdtS0sPzia
# sCI8EH/oCZQgVcwI/UkIOXjzbKK1lZWa2805//KIqvG27i9zHzLJ0l5eeLtbpZpy
# LnFGRyQGGf+jEKAJuT6598q6T+jCkLCMN6zpyKWGvcYleNvBnlw6+N8Il8zV7KSc
# TE5BNk+C7I9aimrRyaz3WrFCZW5DbQ==
# =q9Im
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 12 Mar 2024 19:01:26 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu:
  meson: generate .stp files for tools too
  tracetool: remove redundant --target-type / --target-name args

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

gdbstub: Fix double close() of the follow-fork-mode socket

When the terminal GDB_FORK_ENABLED state is reached, the coordination
socket is not needed anymore and is therefore closed. However, if there
is a communication error between QEMU gdbstub and GDB, the generic
error handling code attempts to close it again.

Fix by closing it later - before returning - instead.

Fixes: Coverity CID 1539966
Fixes: d547e711a8a5 ("gdbstub: Implement follow-fork-mode child")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240312001813.13720-1-iii@linux.ibm.com>

tests/tcg: Add multiarch test for Xfer:siginfo:read stub

Add multiarch test for testing if Xfer:siginfo:read query is properly
handled by gdbstub.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240309030901.1726211-6-gustavo.romero@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

gdbstub: Add Xfer:siginfo:read stub

Add stub to handle Xfer:siginfo:read packet query that requests the
machine's siginfo data.

This is used when GDB user executes 'print $_siginfo' and when the
machine stops due to a signal, for instance, on SIGSEGV. The information
in siginfo allows GDB to determiner further details on the signal, like
the fault address/insn when the SIGSEGV is caught.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Message-Id: <20240309030901.1726211-5-gustavo.romero@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

gdbstub: Save target's siginfo

Save target's siginfo into gdbserver_state so it can be used later, for
example, in any stub that requires the target's si_signo and si_code.

This change affects only linux-user mode.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240309030901.1726211-4-gustavo.romero@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

linux-user: Move tswap_siginfo out of target code

Move tswap_siginfo from target code to handle_pending_signal. This will
allow some cleanups and having the siginfo ready to be used in gdbstub.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240309030901.1726211-3-gustavo.romero@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

gdbstub: Rename back gdb_handlesig

Rename gdb_handlesig_reason back to gdb_handlesig. There is no need to
add a wrapper for gdb_handlesig and rename it when a new parameter is
added.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240309030901.1726211-2-gustavo.romero@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

tests/vm: ensure we build everything by default

The "check" target by itself is not enough to ensure we build the user
mode binaries. While we can't test them with check-tcg we can at least
include them in the build.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Gustavo Romero <gustavo.romero@linaro.org>

migration: Skip only empty block devices

The block .save_setup() handler calls a helper routine
init_blk_migration() which builds a list of block devices to take into
account for migration. When one device is found to be empty (sectors
== 0), the loop exits and all the remaining devices are ignored. This
is a regression introduced when bdrv_iterate() was removed.

Change that by skipping only empty devices.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Fixes: fea68bb6e9fa ("block: Eliminate bdrv_iterate(), use bdrv_next()")
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Link: https://lore.kernel.org/r/20240312120431.550054-1-clg@redhat.com
[peterx: fix "Suggested-by:"]
Signed-off-by: Peter Xu <peterx@redhat.com>

docs/specs/pvpanic: document shutdown event

Shutdown requests are normally hardware dependent.
By extending pvpanic to also handle shutdown requests, guests can
submit such requests with an easily implementable and cross-platform
mechanism.

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240310-pvpanic-shutdown-spec-v1-1-b258e182ce55@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl: Fix missing reserved data in CXL Device DVSEC

The r3.1 specification introduced a new 2 byte field, but
to maintain DWORD alignment, a additional 2 reserved bytes
were added. Forgot those in updating the structure definition
but did include them in the size define leading to a buffer
overrun.

Also use the define so that we don't duplicate the value.

Fixes: Coverity ID 1534095 buffer overrun
Fixes: 8700ee15de ("hw/cxl: Standardize all references on CXL r3.1 and minor updates")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240308143831.6256-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hmat acpi: Fix out of bounds access due to missing use of indirection

With a numa set up such as

-numa nodeid=0,cpus=0 \
-numa nodeid=1,memdev=mem \
-numa nodeid=2,cpus=1

and appropriate hmat_lb entries the initiator list is correctly
computed and writen to HMAT as 0,2 but then the LB data is accessed
using the node id (here 2), landing outside the entry_list array.

Stash the reverse lookup when writing the initiator list and use
it to get the correct array index index.

Fixes: 4586a2cb83 ("hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240307160326.31570-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hmat acpi: Do not add Memory Proximity Domain Attributes Structure targetting non existent memory.

If qemu is started with a proximity node containing CPUs alone,
it will provide one of these structures to say memory in this
node is directly connected to itself.

This description is arguably pointless even if there is memory
in the node. If there is no memory present, and hence no SRAT
entry it breaks Linux HMAT passing and the table is rejected.

https://elixir.bootlin.com/linux/v6.7/source/drivers/acpi/numa/hmat.c#L444

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240307160326.31570-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

qemu-options.hx: Document the virtio-iommu-pci aw-bits option

Document the new aw-bits option.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240307134445.92296-10-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/arm/virt: Set virtio-iommu aw-bits default value to 48

On ARM we set 48b as a default (matching SMMUv3 SMMU_IDR5.VAX == 0).

hw_compat_8_2 is used to handle the compatibility for machine types
before 9.0 (default was 64 bits).

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <Zhenzhong.duan@intel.com>
Message-Id: <20240307134445.92296-9-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/i386/q35: Set virtio-iommu aw-bits default value to 39

Currently the default input range can extend to 64 bits. On x86,
when the virtio-iommu protects vfio devices, the physical iommu
may support only 39 bits. Let's set the default to 39, as done
for the intel-iommu.

We use hw_compat_8_2 to handle the compatibility for machines
before 9.0 which used to have a virtio-iommu default input range
of 64 bits.

Of course if aw-bits is set from the command line, the default
is overriden.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240307134445.92296-8-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

virtio-iommu: Add an option to define the input range width

aw-bits is a new option that allows to set the bit width of
the input address range. This value will be used as a default for
the device config input_range.end. By default it is set to 64 bits
which is the current value.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20240307134445.92296-7-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Trace domain range limits as unsigned int

Use %u format to trace domain_range limits.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20240307134445.92296-6-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

qemu-options.hx: Document the virtio-iommu-pci granule option

We are missing an entry for the virtio-iommu-pci device. Add the
information on which machine it is currently supported and document
the new granule option.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240307134445.92296-5-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

virtio-iommu: Change the default granule to the host page size

We used to set the default granule to 4KB but with VFIO assignment
it makes more sense to use the actual host page size.

Indeed when hotplugging a VFIO device protected by a virtio-iommu
on a 64kB/64kB host/guest config, we current get a qemu crash:

"vfio: DMA mapping failed, unable to continue"

This is due to the hot-attached VFIO device calling
memory_region_iommu_set_page_size_mask() with 64kB granule
whereas the virtio-iommu granule was already frozen to 4KB on
machine init done.

Set the granule property to "host" and introduce a new compat.
The page size mask used before 9.0 was qemu_target_page_mask().
Since the virtio-iommu currently only supports x86_64 and aarch64,
this matched a 4KB granule.

Note that the new default will prevent 4kB guest on 64kB host
because the granule will be set to 64kB which would be larger
than the guest page size. In that situation, the virtio-iommu
driver fails on viommu_domain_finalise() with
"granule 0x10000 larger than system page size 0x1000".

In that case the workaround is to request 4K granule.

The current limitation of global granule in the virtio-iommu
should be removed and turned into per domain granule. But
until we get this upgraded, this new default is probably
better because I don't think anyone is currently interested in
running a 4KB page size guest with virtio-iommu on a 64KB host.
However supporting 64kB guest on 64kB host with virtio-iommu and
VFIO looks a more important feature.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20240307134445.92296-4-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Add a granule property

This allows to choose which granule will be used by
default by the virtio-iommu. Current page size mask
default is qemu_target_page_mask so this translates
into a 4k granule on ARM and x86_64 where virtio-iommu
is supported.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20240307134445.92296-3-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/i386/acpi-build: Add support for SRAT Generic Initiator structures

The acpi-generic-initiator object is added to allow a host device
to be linked with a NUMA node. Qemu use it to build the SRAT
Generic Initiator Affinity structure [1]. Add support for i386.

[1] ACPI Spec 6.3, Section 5.2.16.6

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-4-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

hw/acpi: Implement the SRAT GI affinity structure

ACPI spec provides a scheme to associate "Generic Initiators" [1]
(e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines GPUs) with Proximity Domains. This is
achieved using Generic Initiator Affinity Structure in SRAT. During bootup,
Linux kernel parse the ACPI SRAT to determine the PXM ids and create a NUMA
node for each unique PXM ID encountered. Qemu currently do not implement
these structures while building SRAT.

Add GI structures while building VM ACPI SRAT. The association between
device and node are stored using acpi-generic-initiator object. Lookup
presence of all such objects and use them to build these structures.

The structure needs a PCI device handle [2] that consists of the device BDF.
The vfio-pci device corresponding to the acpi-generic-initiator object is
located to determine the BDF.

[1] ACPI Spec 6.3, Section 5.2.16.6
[2] ACPI Spec 6.3, Table 5.80

Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cedric Le Goater <clg@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-3-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

qom: new object to associate device to NUMA node

NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (upto 8) isolated instances. Each of the partitioned memory needs
a dedicated NUMA node to operate. The partitions are not fixed and they
can be created/deleted at runtime.

Unfortunately Linux OS does not provide a means to dynamically create/destroy
NUMA nodes and such feature implementation is not expected to be trivial. The
nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
we utilize the Generic Initiator (GI) Affinity structures that allows
association between nodes and devices. Multiple GI structures per BDF is
possible, allowing creation of multiple nodes by exposing unique PXM in each
of these structures.

Implement the mechanism to build the GI affinity structures as Qemu currently
does not. Introduce a new acpi-generic-initiator object to allow host admin
link a device with an associated NUMA node. Qemu maintains this association
and use this object to build the requisite GI Affinity Structure.

When multiple NUMA nodes are associated with a device, it is required to
create those many number of acpi-generic-initiator objects, each representing
a unique device:node association.

Following is one of a decoded GI affinity structure in VM ACPI SRAT.
[0C8h 0200   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201   1]                       Length : 20

[0CAh 0202   1]                    Reserved1 : 00
[0CBh 0203   1]           Device Handle Type : 01
[0CCh 0204   4]             Proximity Domain : 00000007
[0D0h 0208  16]                Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224   4]        Flags (decoded below) : 00000001
                                     Enabled : 1
[0E4h 0228   4]                    Reserved2 : 00000000

[0E8h 0232   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233   1]                       Length : 20

An admin can provide a range of acpi-generic-initiator objects, each
associating a device (by providing the id through pci-dev argument)
to the desired NUMA node (using the node argument). Currently, only PCI
device is supported.

For the grace hopper system, create a range of 8 nodes and associate that
with the device using the acpi-generic-initiator object. While a configuration
of less than 8 nodes per device is allowed, such configuration will prevent
utilization of the feature to the fullest. The following sample creates 8
nodes per PCI device for a VM with 2 PCI devices and link them to the
respecitve PCI device using acpi-generic-initiator objects:

-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \

-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \

Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove it

Now that pc_cmos_init() doesn't populate the X86MachineState::rtc attribute any
longer, its duties can be merged into pc_cmos_init_late() which is called within
machine_done notifier. This frees pc_piix and pc_q35 from explicit CMOS
initialization.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240303185332.1408-5-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/i386/pc: Set "normal" boot device order in pc_basic_device_init()

The boot device order may change during the lifetime of a VM. Usually, the
"normal" order is set once during machine init(). However, if a user specifies
`-boot once=...`, the "normal" order is overwritten by the "once" order just
before machine_done, and a reset handler is registered which restores the
"normal" order during the next reset.

In the next patch, pc_cmos_init() will be inlined into pc_cmos_init_late() which
runs during machine_done. This means that the "once" boot order would be
overwritten again with the "normal" boot order -- which renders the user's
choice ineffective. Fix this by setting the "normal" boot order in
pc_basic_device_init() which already registers the boot_set() handler.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240303185332.1408-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/i386/pc: Avoid one use of the current_machine global

The RTC can be accessed through the X86 machine instance, so rather than passing
the RTC it's possible to pass the machine state instead. This avoids
pc_boot_set() from having to access the current_machine global.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240303185332.1408-3-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/i386/pc: Remove "rtc_state" link again

Commit 99e1c1137b6f "hw/i386/pc: Populate RTC attribute directly" made linking
the "rtc_state" property unnecessary and removed it. Commit 84e945aad2d0 "vl,
pc: turn -no-fd-bootchk into a machine property" accidently reintroduced the
link. Remove it again since it is not needed.

Fixes: 84e945aad2d0 "vl, pc: turn -no-fd-bootchk into a machine property"
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240303185332.1408-2-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"

Specifying the property `-M pflash0` results in a regression:
qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found
Revert the change for now until a solution is found.

This reverts commit 6f6ad2b24582593d8feb00434ce2396840666227.

Reported-by: Volker Rümelin <vr_qemu@t-online.de>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240226215909.30884-3-shentey@gmail.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Revert "hw/i386/pc_sysfw: Inline pc_system_flash_create() and remove it"

Commit 6f6ad2b24582 "hw/i386/pc: Confine system flash handling to pc_sysfw"
causes a regression when specifying the property `-M pflash0` in the PCI PC
machines:
qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found
In order to revert the commit, the commit below must be reverted first.

This reverts commit cb05cc16029bb0a61ac5279ab7b3b90dcf2aa69f.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240226215909.30884-2-shentey@gmail.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pc: q35: Bump max_cpus to 4096 vcpus

Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUs")
Linux kernel can support upto a maximum number of 4096 vcpus when MAXSMP is
enabled in the kernel. At present, QEMU has been tested to correctly boot a
linux guest with 4096 vcpus using the current edk2 upstream master branch that
has the fixes corresponding to the following two PRs:

https://github.com/tianocore/edk2/pull/5410
https://github.com/tianocore/edk2/pull/5418

The changes merged into edk2 with the above PRs will be in the upcoming 2024-05
release. With current seabios firmware, it boots fine with 4096 vcpus already.
So bump up the value max_cpus to 4096 for q35 machines versions 9 and newer.
Q35 machines versions 8.2 and older continue to support 1024 maximum vcpus
as before for compatibility reasons.

If KVM is not able to support the specified number of vcpus, QEMU would
return the following error messages:

$ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728
qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024)

Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Julia Suvorova <jusual@redhat.com>
Cc: kraxel@redhat.com
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20240228143351.3967-1-anisinha@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pci: Always call pcie_sriov_pf_reset()

Call pcie_sriov_pf_reset() from pci_do_device_reset() just as we do
for msi_reset() and msix_reset() to prevent duplicating code for each
SR-IOV PF.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-5-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>

pcie_sriov: Do not reset NumVFs after disabling VFs

The spec does not NumVFs is reset after disabling VFs except when
resetting the PF. Clearing it is guest visible and out of spec, even
though Linux doesn't rely on this value being preserved, so we never
noticed.

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-4-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pcie_sriov: Reset SR-IOV extended capability

pcie_sriov_pf_disable_vfs() is called when resetting the PF, but it only
disables VFs and does not reset SR-IOV extended capability, leaking the
state and making the VF Enable register inconsistent with the actual
state.

Replace pcie_sriov_pf_disable_vfs() with pcie_sriov_pf_reset(), which
does not only disable VFs but also resets the capability.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-3-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>

pcie_sriov: Validate NumVFs

The guest may write NumVFs greater than TotalVFs and that can lead
to buffer overflow in VF implementations.

Cc: qemu-stable@nongnu.org
Fixes: CVE-2024-26327
Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-2-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>