Peter Maydell [Mon, 4 Nov 2024 12:31:45 +0000 (12:31 +0000)]
Merge tag 'migration-
20241030-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration pull request for softfreeze
v2:
- Patch "migration: Move cpu-throttle.c from system to migration",
fix build on MacOS, and subject spelling
NOTE: checkpatch.pl could report a false positive on this branch:
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21:
{include/sysemu => migration}/cpu-throttle.h | 0
That's covered by "F: migration/" entry.
Changelog:
- Peter's cleanup patch on migrate_fd_cleanup()
- Peter's cleanup patch to introduce thread name macros
- Hanna's error path fix for vmstate subsection save()s
- Hyman's auto converge enhancement on background dirty sync
- Peter's additional tracepoints for save state entries
- Thomas's build fix for OpenBSD in dirtyrate.c
- Peter's deprecation of query-migrationthreads command
- Peter's cleanup/fixes from the "export misc.h" series
- Maciej's two small patches from multifd+vfio series
# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZyTbVRIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wan3wD+L4TVNDc34Hy4mvWu7u1lCOePX0GBdUEc
# oEeBGblwbrcBAIR8d+5z9O5YcWH1coozG1aUC4qCtSHHk5TGbJk4/UUD
# =XB5Q
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 01 Nov 2024 13:44:53 GMT
# gpg: using EDDSA key
B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg: issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal]
# gpg: aka "Peter Xu <peterx@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-
20241030-pull-request' of https://gitlab.com/peterx/qemu:
migration/multifd: Zero p->flags before starting filling a packet
migration/ram: Add load start trace event
migration: Drop migration_is_idle()
migration: Drop migration_is_setup_or_active()
migration: Unexport ram_mig_init()
migration: Unexport dirty_bitmap_mig_init()
migration: Take migration object refcount earlier for threads
migration: Deprecate query-migrationthreads command
migration/dirtyrate: Silence warning about strcpy() on OpenBSD
tests/migration: Add case for periodic ramblock dirty sync
migration: Support periodic RAMBlock dirty bitmap sync
migration: Remove "rs" parameter in migration_bitmap_sync_precopy
migration: Move cpu-throttle.c from system to migration
migration: Stop CPU throttling conditionally
accel/tcg/icount-common: Remove the reference to the unused header file
migration: Ensure vmstate_save() sets errp
migration: Put thread names together with macros
migration: Cleanup migrate_fd_cleanup() on accessing to_dst_file
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Sat, 2 Nov 2024 16:21:38 +0000 (16:21 +0000)]
Merge tag 'for-upstream-i386' of https://gitlab.com/bonzini/qemu into staging
* target/i386: new feature bits for AMD processors
* target/i386/tcg: improvements around flag handling
* target/i386: add AVX10 support
* target/i386: add GraniteRapids-v2 model
* dockerfiles: add libcbor
* New nitro-enclave machine type
* qom: cleanups to object_new
* configure: detect 64-bit MIPS for rust
* configure: deprecate 32-bit MIPS
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmcjvkQUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroPIKgf/etNpO2T+eLFtWN/Qd5eopBXqNd9k
# KmeK9EgW9lqx2IPGNen33O+uKpb/TsMmubSsSF+YxTp7pmkc8+71f3rBMaIAD02r
# /paHSMVw0+f12DAFQz1jdvGihR7Mew0wcF/UdEt737y6vEmPxLTyYG3Gfa4NSZwT
# /V5jTOIcfUN/UEjNgIp6NTuOEESKmlqt22pfMapgkwMlAJYeeJU2X9eGYE86wJbq
# ZSXNgK3jL9wGT2XKa3e+OKzHfFpSkrB0JbQbdico9pefnBokN/hTeeUJ81wBAc7u
# i00W1CEQVJ5lhBc121d4AWMp83ME6HijJUOTMmJbFIONPsITFPHK1CAkng==
# =D4nR
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 31 Oct 2024 17:28:36 GMT
# gpg: using RSA key
F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream-i386' of https://gitlab.com/bonzini/qemu: (49 commits)
target/i386: Introduce GraniteRapids-v2 model
target/i386: Add AVX512 state when AVX10 is supported
target/i386: Add feature dependencies for AVX10
target/i386: add CPUID.24 features for AVX10
target/i386: add AVX10 feature and AVX10 version property
target/i386: return bool from x86_cpu_filter_features
target/i386: do not rely on ExtSaveArea for accelerator-supported XCR0 bits
target/i386: cpu: set correct supported XCR0 features for TCG
target/i386: use + to put flags together
target/i386: use higher-precision arithmetic to compute CF
target/i386: use compiler builtin to compute PF
target/i386: make flag variables unsigned
target/i386: add a note about gen_jcc1
target/i386: add a few more trivial CCPrepare cases
target/i386: optimize TEST+Jxx sequences
target/i386: optimize computation of ZF from CC_OP_DYNAMIC
target/i386: Wrap cc_op_live with a validity check
target/i386: Introduce cc_op_size
target/i386: Rearrange CCOp
target/i386: remove CC_OP_CLR
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Maciej S. Szmigiero [Tue, 29 Oct 2024 14:58:15 +0000 (15:58 +0100)]
migration/multifd: Zero p->flags before starting filling a packet
This way there aren't stale flags there.
p->flags can't contain SYNC to be sent at the next RAM packet since syncs
are now handled separately in multifd_send_thread.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/1c96b6cdb797e6f035eb1a4ad9bfc24f4c7f5df8.1730203967.git.maciej.szmigiero@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Maciej S. Szmigiero [Tue, 29 Oct 2024 14:58:14 +0000 (15:58 +0100)]
migration/ram: Add load start trace event
There's a RAM load complete trace event but there wasn't its start equivalent.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/94ddfa7ecb83a78f73b82867dd30c8767592d257.1730203967.git.maciej.szmigiero@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 24 Oct 2024 21:30:53 +0000 (17:30 -0400)]
migration: Drop migration_is_idle()
Now with the current migration_is_running(), it will report exactly the
opposite of what will be reported by migration_is_idle().
Drop migration_is_idle(), instead use "!migration_is_running()" which
should be identical on functionality.
In reality, most of the idle check is inverted, so it's even easier to
write with "migrate_is_running()" check.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 24 Oct 2024 21:30:52 +0000 (17:30 -0400)]
migration: Drop migration_is_setup_or_active()
This helper is mostly the same as migration_is_running(), except that one
has COLO reported as true, the other has CANCELLING reported as true.
Per my past years experience on the state changes, none of them should
matter.
To make it slightly safer, report both COLO || CANCELLING to be true in
migration_is_running(), then drop the other one. We kept the 1st only
because the name is simpler, and clear enough.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 24 Oct 2024 21:30:51 +0000 (17:30 -0400)]
migration: Unexport ram_mig_init()
It's only used within migration/.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 24 Oct 2024 21:30:50 +0000 (17:30 -0400)]
migration: Unexport dirty_bitmap_mig_init()
It's only used within migration/, so it shouldn't be exported.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 24 Oct 2024 21:30:49 +0000 (17:30 -0400)]
migration: Take migration object refcount earlier for threads
Both migration thread or background snapshot thread will take a refcount of
the migration object at the entrace of the thread function.
That makes sense, because it protects the object from being freed by the
main thread in migration_shutdown() later, but it might still race with it
if the thread is scheduled too late. Consider the case right after
pthread_create() happened, VM shuts down with the object released, but
right after that the migration thread finally got created, referencing
MigrationState* in the opaque pointer which is already freed.
The only 100% safe way to make sure it won't get freed is taking the
refcount right before the thread is created, meanwhile when BQL is held.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Tue, 22 Oct 2024 19:45:01 +0000 (15:45 -0400)]
migration: Deprecate query-migrationthreads command
Per previous discussion [1,2], this patch deprecates query-migrationthreads
command.
To summarize, the major reason of the deprecation is due to no sensible way
to consume the API properly:
(1) The reported list of threads are incomplete (ignoring destination
threads and non-multifd threads).
(2) For CPU pinning, there's no way to properly pin the threads with
the API if the threads will start running right away after migration
threads can be queried, so the threads will always run on the default
cores for a short window.
(3) For VM debugging, one can use "-name $VM,debug-threads=on" instead,
which will provide proper names for all migration threads.
[1] https://lore.kernel.org/r/
20240930195837.825728-1-peterx@redhat.com
[2] https://lore.kernel.org/r/
20241011153417.516715-1-peterx@redhat.com
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20241022194501.1022443-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Thomas Huth [Tue, 22 Oct 2024 06:34:02 +0000 (08:34 +0200)]
migration/dirtyrate: Silence warning about strcpy() on OpenBSD
The linker on OpenBSD complains:
ld: warning: dirtyrate.c:447 (../src/migration/dirtyrate.c:447)(...):
warning: strcpy() is almost always misused, please use strlcpy()
It's currently not a real problem in this case since both arrays
have the same size (256 bytes). But just in case somebody changes
the size of the source array in the future, let's better play safe
and use g_strlcpy() here instead, with an additional check that the
string has been copied as a whole.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Link: https://lore.kernel.org/r/20241022063402.184213-1-thuth@redhat.com
[peterx: Fix over-80 chars]
Signed-off-by: Peter Xu <peterx@redhat.com>
Hyman Huang [Thu, 17 Oct 2024 06:42:55 +0000 (14:42 +0800)]
tests/migration: Add case for periodic ramblock dirty sync
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/cb61504f1a1e9d5f2ca4dac12e518deb076ce9f3.1729146786.git.yong.huang@smartx.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Hyman Huang [Thu, 17 Oct 2024 06:42:54 +0000 (14:42 +0800)]
migration: Support periodic RAMBlock dirty bitmap sync
When VM is configured with huge memory, the current throttle logic
doesn't look like to scale, because migration_trigger_throttle()
is only called for each iteration, so it won't be invoked for a long
time if one iteration can take a long time.
The periodic dirty sync aims to fix the above issue by synchronizing
the ramblock from remote dirty bitmap and, when necessary, triggering
the CPU throttle multiple times during a long iteration.
This is a trade-off between synchronization overhead and CPU throttle
impact.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com
[peterx: make prev_cnt global, and reset for each migration]
Signed-off-by: Peter Xu <peterx@redhat.com>
Hyman Huang [Thu, 17 Oct 2024 06:42:53 +0000 (14:42 +0800)]
migration: Remove "rs" parameter in migration_bitmap_sync_precopy
The global static variable ram_state in fact is referred to by the
"rs" parameter in migration_bitmap_sync_precopy. For ease of calling
by the callees, use the global variable directly in
migration_bitmap_sync_precopy and remove "rs" parameter.
The migration_bitmap_sync_precopy will be exported in the next commit.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/283c335d61463bf477160da91b24da45cdaf3e43.1729146786.git.yong.huang@smartx.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Hyman Huang [Thu, 17 Oct 2024 06:42:52 +0000 (14:42 +0800)]
migration: Move cpu-throttle.c from system to migration
Move cpu-throttle.c from system to migration since it's
only used for migration; this makes us avoid exporting the
util functions and variables in misc.h but export them in
migration.h when implementing the periodic ramblock dirty
sync feature in the upcoming commits.
Since CPU throttle timers are only used in migration, move
their registry to migration_object_init.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/c1b3efaa0cb49e03d422e9da97bdb65cc3d234d1.1729146786.git.yong.huang@smartx.com
[peterx: Fix build on MacOS on cocoa.m, not move cpu-throttle.h yet]
[peterx: Fix subject spelling, per pm215]
Signed-off-by: Peter Xu <peterx@redhat.com>
Hyman Huang [Thu, 17 Oct 2024 06:42:51 +0000 (14:42 +0800)]
migration: Stop CPU throttling conditionally
Since CPU throttling only occurs when auto-converge
is on, stop it conditionally.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/f0c787080bb9ab0c37952f0ca5bfaa525d5ddd14.1729146786.git.yong.huang@smartx.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Hyman Huang [Thu, 17 Oct 2024 06:42:50 +0000 (14:42 +0800)]
accel/tcg/icount-common: Remove the reference to the unused header file
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/5e33b423d0b8506e5cb33fff42b50aa301b7731b.1729146786.git.yong.huang@smartx.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Hanna Czenczek [Tue, 15 Oct 2024 17:04:37 +0000 (19:04 +0200)]
migration: Ensure vmstate_save() sets errp
migration/savevm.c contains some calls to vmstate_save() that are
followed by migrate_set_error() if the integer return value indicates an
error. migrate_set_error() requires that the `Error *` object passed to
it is set. Therefore, vmstate_save() is assumed to always set *errp on
error.
Right now, that assumption is not met: vmstate_save_state_v() (called
internally by vmstate_save()) will not set *errp if
vmstate_subsection_save() or vmsd->post_save() fail. Fix that by adding
an *errp parameter to vmstate_subsection_save(), and by generating a
generic error in case post_save() fails (as is already done for
pre_save()).
Without this patch, qemu will crash after vmstate_subsection_save() or
post_save() have failed inside of a vmstate_save() call (unless
migrate_set_error() then happen to discard the new error because
s->error is already set). This happens e.g. when receiving the state
from a virtio-fs back-end (virtiofsd) fails.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Link: https://lore.kernel.org/r/20241015170437.310358-1-hreitz@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Fri, 11 Oct 2024 15:36:52 +0000 (11:36 -0400)]
migration: Put thread names together with macros
Keep migration thread names together, so it's easier to see a list of all
possible migration threads.
Still two functional changes below besides the macro defintions:
- There's one dirty rate thread that we overlooked before, now we add
that too and name it as "mig/dirtyrate" following the old rules.
- The old name "mig/src/rp-thr" has "-thr" but it may not be useful if
it's a thread name anyway, while "rp" can be slightly hard to read.
Taking this chance to rename it to "mig/src/return", hopefully a better
name.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 19 Sep 2024 16:30:42 +0000 (12:30 -0400)]
migration: Cleanup migrate_fd_cleanup() on accessing to_dst_file
The cleanup function can in many cases needs cleanup on its own.
The major thing we want to do here is not referencing to_dst_file when
without the file mutex. When at it, touch things elsewhere too to make it
look slightly better in general.
One thing to mention is, migration_thread has its own "running" boolean, so
it doesn't need to rely on to_dst_file being non-NULL. Multifd has a
dependency so it needs to be skipped if to_dst_file is not yet set; add a
richer comment for such reason.
Resolves: Coverity CID
1527402
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240919163042.116767-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Tao Su [Thu, 31 Oct 2024 08:52:33 +0000 (16:52 +0800)]
target/i386: Introduce GraniteRapids-v2 model
Update GraniteRapids CPU model to add AVX10 and the missing features(ss,
tsc-adjust, cldemote, movdiri, movdir64b).
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-7-tao1.su@linux.intel.com
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241031085233.425388-9-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tao Su [Thu, 31 Oct 2024 08:52:32 +0000 (16:52 +0800)]
target/i386: Add AVX512 state when AVX10 is supported
AVX10 state enumeration in CPUID leaf D and enabling in XCR0 register
are identical to AVX512 state regardless of the supported vector lengths.
Given that some E-cores will support AVX10 but not support AVX512, add
AVX512 state components to guest when AVX10 is enabled.
Based on a patch by Tao Su <tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-8-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tao Su [Thu, 31 Oct 2024 08:52:31 +0000 (16:52 +0800)]
target/i386: Add feature dependencies for AVX10
Since the highest supported vector length for a processor implies that
all lesser vector lengths are also supported, add the dependencies of
the supported vector lengths. If all vector lengths aren't supported,
clear AVX10 enable bit as well.
Note that the order of AVX10 related dependencies should be kept as:
CPUID_24_0_EBX_AVX10_128 -> CPUID_24_0_EBX_AVX10_256,
CPUID_24_0_EBX_AVX10_256 -> CPUID_24_0_EBX_AVX10_512,
CPUID_24_0_EBX_AVX10_VL_MASK -> CPUID_7_1_EDX_AVX10,
CPUID_7_1_EDX_AVX10 -> CPUID_24_0_EBX,
so that prevent user from setting weird CPUID combinations, e.g. 256-bits
and 512-bits are supported but 128-bits is not, no vector lengths are
supported but AVX10 enable bit is still set.
Since AVX10_128 will be reserved as 1, adding these dependencies has the
bonus that when user sets -cpu host,-avx10-128, CPUID_7_1_EDX_AVX10 and
CPUID_24_0_EBX will be disabled automatically.
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-5-tao1.su@linux.intel.com
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241031085233.425388-7-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tao Su [Thu, 31 Oct 2024 08:52:30 +0000 (16:52 +0800)]
target/i386: add CPUID.24 features for AVX10
Introduce features for the supported vector bit lengths.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-3-tao1.su@linux.intel.com
Link: https://lore.kernel.org/r/20241028024512.156724-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-6-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tao Su [Thu, 31 Oct 2024 08:52:29 +0000 (16:52 +0800)]
target/i386: add AVX10 feature and AVX10 version property
When AVX10 enable bit is set, the 0x24 leaf will be present as "AVX10
Converged Vector ISA leaf" containing fields for the version number and
the supported vector bit lengths.
Introduce avx10-version property so that avx10 version can be controlled
by user and cpu model. Per spec, avx10 version can never be 0, the default
value of avx10-version is set to 0 to determine whether it is specified by
user. The default can come from the device model or, for the max model,
from KVM's reported value.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-3-tao1.su@linux.intel.com
Link: https://lore.kernel.org/r/20241028024512.156724-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-5-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 31 Oct 2024 08:52:28 +0000 (16:52 +0800)]
target/i386: return bool from x86_cpu_filter_features
Prepare for filtering non-boolean features such as AVX10 version.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 31 Oct 2024 08:52:27 +0000 (16:52 +0800)]
target/i386: do not rely on ExtSaveArea for accelerator-supported XCR0 bits
Right now, QEMU is using the "feature" and "bits" fields of ExtSaveArea
to query the accelerator for the support status of extended save areas.
This is a problem for AVX10, which attaches two feature bits (AVX512F
and AVX10) to the same extended save states.
To keep the AVX10 hacks to the minimum, limit usage of esa->features
and esa->bits. Instead, just query the accelerator for the 0xD leaf.
Do it in common code and clear esa->size if an extended save state is
unsupported.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-3-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 31 Oct 2024 08:52:26 +0000 (16:52 +0800)]
target/i386: cpu: set correct supported XCR0 features for TCG
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-2-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 31 May 2024 09:32:19 +0000 (11:32 +0200)]
target/i386: use + to put flags together
This gives greater opportunity for reassociation on x86 targets,
since addition can use the LEA instruction.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 31 May 2024 09:00:33 +0000 (11:00 +0200)]
target/i386: use higher-precision arithmetic to compute CF
If the operands of the arithmetic instruction fit within a half-register,
it's easiest to use a comparison instruction to compute the carry.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 31 May 2024 08:52:42 +0000 (10:52 +0200)]
target/i386: use compiler builtin to compute PF
This removes the 256 byte parity table from the executable.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 31 May 2024 09:41:50 +0000 (11:41 +0200)]
target/i386: make flag variables unsigned
This makes it easier for the compiler to understand which bits are set,
and it also removes "cltq" instructions to canonicalize the output value
as 32-bit signed.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 14 Aug 2024 11:44:47 +0000 (13:44 +0200)]
target/i386: add a note about gen_jcc1
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 1 Jul 2024 19:11:07 +0000 (21:11 +0200)]
target/i386: add a few more trivial CCPrepare cases
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 20 Jun 2024 09:31:33 +0000 (11:31 +0200)]
target/i386: optimize TEST+Jxx sequences
Mostly used for TEST+JG and TEST+JLE, but it is easy to cover
also JBE/JA and JL/JGE; shaves about 0.5% TCG ops.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 20 Jun 2024 08:34:28 +0000 (10:34 +0200)]
target/i386: optimize computation of ZF from CC_OP_DYNAMIC
Most uses of CC_OP_DYNAMIC are for CMP/JB/JE or similar sequences.
We can optimize many of them to avoid computation of the flags.
This eliminates both TCG ops to set up the new cc_op, and helper
instructions because evaluating just ZF is much cheaper.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Richard Henderson [Mon, 1 Jul 2024 09:08:50 +0000 (11:08 +0200)]
target/i386: Wrap cc_op_live with a validity check
Assert that op is known and that cc_op_live_ is populated.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Richard Henderson [Mon, 15 Jul 2024 12:34:29 +0000 (14:34 +0200)]
target/i386: Introduce cc_op_size
Replace arithmetic on cc_op with a helper function.
Assert that the op has a size and that it is valid
for the configuration.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-6-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Richard Henderson [Mon, 15 Jul 2024 12:31:56 +0000 (14:31 +0200)]
target/i386: Rearrange CCOp
Give the first few enumerators explicit integer constants,
align the BWLQ enumerators.
This will be used to simplify ((op - CC_OP_*B) & 3).
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-4-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 20 Jun 2024 08:16:47 +0000 (10:16 +0200)]
target/i386: remove CC_OP_CLR
Just use CC_OP_EFLAGS; it is not that likely that the flags computed by
CC_OP_CLR survive the end of the basic block, in which case there is no
need to spill cc_op_src.
cc_op_src now does need spilling if the XOR is followed by a memory
operation, but this only costs 0.2% extra TCG ops. They will be recouped
by simplifications in how QEMU evaluates ZF at runtime, which are even
greater with this change.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Richard Henderson [Mon, 1 Jul 2024 02:51:11 +0000 (19:51 -0700)]
target/i386: Tidy cc_op_str usage
Make const. Use the read-only strings directly; do not copy
them into an on-stack buffer with snprintf. Allow for holes
in the cc_op_str array, now present with CC_OP_POPCNT.
Fixes: 460231ad369 ("target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-2-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 3 Sep 2024 07:50:00 +0000 (09:50 +0200)]
target/i386: use tcg_gen_ext_tl when applicable
Prefer it to gen_ext_tl in the common case where the destination is known.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 31 Oct 2024 14:09:52 +0000 (15:09 +0100)]
ci: always invoke meson through pyvenv
Do not assume that the distro-installed meson is compatible with the one
in the virtual environment.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dorjoy Chowdhury [Tue, 8 Oct 2024 21:17:27 +0000 (03:17 +0600)]
docs/nitro-enclave: Documentation for nitro-enclave machine type
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-7-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dorjoy Chowdhury [Tue, 8 Oct 2024 21:17:26 +0000 (03:17 +0600)]
machine/nitro-enclave: New machine type for AWS Nitro Enclaves
AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating
isolated execution environments, called enclaves, from Amazon EC2
instances which are used for processing highly sensitive data. Enclaves
have no persistent storage and no external networking. The enclave VMs
are based on the Firecracker microvm with a vhost-vsock device for
communication with the parent EC2 instance that spawned it and a Nitro
Secure Module (NSM) device for cryptographic attestation. The parent
instance VM always has CID 3 while the enclave VM gets a dynamic CID.
An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro enclave
virtual machine. This commit adds support for AWS nitro enclave emulation
using a new machine type option '-M nitro-enclave'. This new machine type
is based on the 'microvm' machine type, similar to how real nitro enclave
VMs are based on Firecracker microvm. For nitro-enclave to boot from an
EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel
and a temporary initrd file which are then hooked into the regular x86
boot mechanism along with the extracted cmdline. The EIF file path should
be provided using the '-kernel' QEMU option.
In QEMU, the vsock emulation for nitro enclave is added using vhost-user-
vsock as opposed to vhost-vsock. vhost-vsock doesn't support sibling VM
communication which is needed for nitro enclaves. So for the vsock
communication to CID 3 to work, another process that does the vsock
emulation in userspace must be run, for example, vhost-device-vsock[4]
from rust-vmm, with necessary vsock communication support in another
guest VM with CID 3. Using vhost-user-vsock also enables the possibility
to implement some proxying support in the vhost-user-vsock daemon that
will forward all the packets to the host machine instead of CID 3 so
that users of nitro-enclave can run the necessary applications in their
host machine instead of running another whole VM with CID 3. The following
mandatory nitro-enclave machine option has been added related to the
vhost-user-vsock device.
- 'vsock': The chardev id from the '-chardev' option for the
vhost-user-vsock device.
AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
has been added using the virtio-nsm device added in a previous commit.
In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the VM
for validation. The following optional nitro-enclave machine options
have been added related to the NSM device.
- 'id': Enclave identifier, reflected in the module-id of the NSM
device. If not provided, a default id will be set.
- 'parent-role': Parent instance IAM role ARN, reflected in PCR3
of the NSM device.
- 'parent-id': Parent instance identifier, reflected in PCR4 of the
NSM device.
[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://aws.amazon.com/ec2/
[3] https://github.com/aws/aws-nitro-enclaves-image-format
[4] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-6-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dorjoy Chowdhury [Tue, 8 Oct 2024 21:17:25 +0000 (03:17 +0600)]
core/machine: Make create_default_memdev machine a virtual method
This is in preparation for the next commit where the nitro-enclave
machine type will need to instead use a memfd backend, for the built-in
vhost-user-vsock device to work.
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-5-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dorjoy Chowdhury [Tue, 8 Oct 2024 21:17:24 +0000 (03:17 +0600)]
hw/core: Add Enclave Image Format (EIF) related helpers
An EIF (Enclave Image Format)[1] file is used to boot an AWS nitro
enclave[2] virtual machine. The EIF file contains the necessary kernel,
cmdline, ramdisk(s) sections to boot.
Some helper functions have been introduced for extracting the necessary
sections from an EIF file and then writing them to temporary files as
well as computing SHA384 hashes from the section data. These will be
used in the following commit to add support for nitro-enclave machine
type in QEMU.
The files added in this commit are not compiled yet but will be added
to the hw/core/meson.build file in the following commit where
CONFIG_NITRO_ENCLAVE will be introduced.
[1] https://github.com/aws/aws-nitro-enclaves-image-format
[2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-4-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dorjoy Chowdhury [Tue, 8 Oct 2024 21:17:23 +0000 (03:17 +0600)]
device/virtio-nsm: Support for Nitro Secure Module device
Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves[2]
for stripped down TPM functionality like cryptographic attestation.
The requests to and responses from NSM device are CBOR[3] encoded.
This commit adds support for NSM device in QEMU. Although related to
AWS Nitro Enclaves, the virito-nsm device is independent and can be
used in other machine types as well. The libcbor[4] library has been
used for the CBOR encoding and decoding functionalities.
[1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
[2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[3] http://cbor.io/
[4] https://libcbor.readthedocs.io/en/latest/
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-3-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dorjoy Chowdhury [Tue, 8 Oct 2024 21:17:22 +0000 (03:17 +0600)]
tests/lcitool: Update libvirt-ci and add libcbor dependency
libcbor dependecy is necessary for adding virtio-nsm and nitro-enclave
machine support in the following commits. libvirt-ci has already been
updated with the dependency upstream and this commit updates libvirt-ci
submodule in QEMU to latest upstream. Also the libcbor dependency has
been added to tests/lcitool/projects/qemu.yml.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-2-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 30 Oct 2024 15:31:26 +0000 (16:31 +0100)]
target/i386/hvf: fix handling of XSAVE-related CPUID bits
The call to xgetbv() is passing the ecx value for cpuid function 0xD,
index 0. The xgetbv call thus returns false (OSXSAVE is bit 27, which is
well out of the range of CPUID[0xD,0].ECX) and eax is not modified. While
fixing it, cache the whole computation of supported XCR0 bits since it
will be used for more than just CPUID leaf 0xD.
Furthermore, unsupported subleafs of CPUID 0xD (including all those
corresponding to zero bits in host's XCR0) must be hidden; if OSXSAVE
is not set at all, the whole of CPUID leaf 0xD plus the XSAVE bit must
be hidden.
Finally, unconditionally drop XSTATE_BNDREGS_MASK and XSTATE_BNDCSR_MASK;
real hardware will only show them if the MPX bit is set in CPUID;
this is never the case for hvf_get_supported_cpuid() because QEMU's
Hypervisor.framework support does not handle the VMX fields related to
MPX (even in the unlikely possibility that the host has MPX enabled).
So hide those bits in the new cache_host_xcr0().
Cc: Phil Dennis-Jordan <lists@philjordan.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Babu Moger [Thu, 24 Oct 2024 22:18:24 +0000 (17:18 -0500)]
target/i386: Expose new feature bits in CPUID 8000_0021_EAX/EBX
Newer AMD CPUs support ERAPS (Enhanced Return Address Prediction Security)
feature that enables the auto-clear of RSB entries on a TLB flush, context
switches and VMEXITs. The number of default RSP entries is reflected in
RapSize.
Add the feature bit and feature word to support these features.
CPUID_Fn80000021_EAX
Bits Feature Description
24 ERAPS:
Indicates support for enhanced return address predictor security.
CPUID_Fn80000021_EBX
Bits Feature Description
31-24 Reserved
23:16 RapSize:
Return Address Predictor size. RapSize x 8 is the minimum number
of CALL instructions software needs to execute to flush the RAP.
15-00 MicrocodePatchSize. Read-only.
Reports the size of the Microcode patch in 16-byte multiples.
If 0, the size of the patch is at most 5568 (15C0h) bytes.
Link: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/7c62371fe60af1e9bbd853f5f8e949bf2d908bd0.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Babu Moger [Thu, 24 Oct 2024 22:18:23 +0000 (17:18 -0500)]
target/i386: Expose bits related to SRSO vulnerability
Add following bits related Speculative Return Stack Overflow (SRSO).
Guests can make use of these bits if supported.
These bits are reported via CPUID Fn8000_0021_EAX.
===================================================================
Bit Feature Description
===================================================================
27 SBPB Indicates support for the Selective Branch Predictor Barrier.
28 IBPB_BRTYPE MSR_PRED_CMD[IBPB] flushes all branch type predictions.
29 SRSO_NO Not vulnerable to SRSO.
30 SRSO_USER_KERNEL_NO Not vulnerable to SRSO at the user-kernel boundary.
===================================================================
Link: https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
Link: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/dadbd70c38f4e165418d193918a3747bd715c5f4.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sandipan Das [Thu, 24 Oct 2024 22:18:21 +0000 (17:18 -0500)]
target/i386: Add PerfMonV2 feature bit
CPUID leaf 0x80000022, i.e. ExtPerfMonAndDbg, advertises new performance
monitoring features for AMD processors. Bit 0 of EAX indicates support
for Performance Monitoring Version 2 (PerfMonV2) features. If found to
be set during PMU initialization, the EBX bits can be used to determine
the number of available counters for different PMUs. It also denotes the
availability of global control and status registers.
Add the required CPUID feature word and feature bit to allow guests to
make use of the PerfMonV2 features.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/a96f00ee2637674c63c61e9fc4dee343ea818053.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Babu Moger [Thu, 24 Oct 2024 22:18:19 +0000 (17:18 -0500)]
target/i386: Fix minor typo in NO_NESTED_DATA_BP feature bit
Rename CPUID_8000_0021_EAX_No_NESTED_DATA_BP to
CPUID_8000_0021_EAX_NO_NESTED_DATA_BP.
No functional change intended.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/a6749acd125670d3930f4ca31736a91b1d965f2f.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 29 Oct 2024 09:46:38 +0000 (10:46 +0100)]
qom: allow user-creatable classes to be in modules
There is no real reason to make user-creatable classes different
from other backends in this respect. This also allows modularized
character devices to be treated by qom-list-properties just like
builtin ones.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 29 Oct 2024 09:41:58 +0000 (10:41 +0100)]
qom: let object_new use a module if the type is not present
object_initialize() can use modules (it was added there because
virtio-gpu-device is a child device of virtio-gpu-pci; commit
64f7aece8ea, "object_initialize: try module load", 2020-09-15).
object_new() cannot; make things consistent.
qdev_new() is now just a simple wrapper that returns DeviceState.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 29 Oct 2024 09:31:30 +0000 (10:31 +0100)]
qom: centralize module-loading functionality
Put together the common code of object_initialize() and
module_object_class_by_name() into a function that supports
Error **. Rename the existing function type_get_by_name() to
clarify that it will only look at defined types; this is often
okay within object.c to look at the parents, but not outside it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 29 Oct 2024 09:45:51 +0000 (10:45 +0100)]
qom: use object_new_with_class when possible
A small optimization/code simplification, that also makes it clear that
we won't look for a type in a not-loaded-yet module---the module will
have been loaded by a call to module_object_class_by_name(), if present.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 29 Oct 2024 09:20:46 +0000 (10:20 +0100)]
qom: remove unused function
The function has been unused since commit
4fa28f23906 ("ppc/pnv:
Instantiate cores separately", 2019-12-17). The idea was that
you could use it to build an array of objects via pointer
arithmetic, but no one is doing it anymore.
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiaoyao Li [Sun, 29 Sep 2024 08:57:47 +0000 (04:57 -0400)]
i386/cpu: Drop the check of phys_bits in host_cpu_realizefn()
The check of cpu->phys_bits to be in range between
[32, TARGET_PHYS_ADDR_SPACE_BITS] in host_cpu_realizefn()
is duplicated with check in x86_cpu_realizefn().
Since the ckeck in x86_cpu_realizefn() is called later and can cover all
the x86 cases. Remove the one in host_cpu_realizefn().
Opportunistically adjust cpu->phys_bits directly in
host_cpu_adjust_phys_bits(), which matches more with the function name.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20240929085747.2023198-1-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 29 Oct 2024 09:43:16 +0000 (10:43 +0100)]
accel: remove dead statement and useless assertion
ops is assigned again just below, and the result of the assignment must
be non-NULL.
Originally, the check for NULL was meant to be a check for the existence
of the ops class:
ops = ACCEL_OPS_CLASS(object_class_by_name(ops_name));
...
g_assert(ops != NULL);
(where the ops assignment begot the one that I am removing); but this is
meaningless now that oc is checked to be non-NULL before ops is assigned
(commit
5141e9a23fc, "accel: abort if we fail to load the accelerator
plugin", 2022-11-06).
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zhao Liu [Tue, 22 Oct 2024 02:36:28 +0000 (10:36 +0800)]
MAINTAINERS: Add myself as a reviewer of x86 general architecture support
X86 architecture has always been a focus of my work. I would like to
help to review more related patches.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241022023628.1743686-1-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 27 Oct 2024 13:07:01 +0000 (14:07 +0100)]
configure, meson: deprecate 32-bit MIPS
The mipsel architecture is not available in Debian Trixie, and it will
likely be a hard failure as soon as we drop support for the old Rust
toolchain in Debian Bookworm. Prepare by deprecating 32-bit little
endian MIPS in QEMU 9.2.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 27 Oct 2024 13:04:10 +0000 (14:04 +0100)]
configure: detect 64-bit MIPS
While right now 64-bit MIPS and 32-bit MIPS share the code in QEMU,
Rust uses different rules for the target. Set $cpu correctly to
either mips or mips64 (--cpu=mips64* is already accepted in the case
statement that canonicalizes cpu/host_arch/linux_arch), and adjust
the checks to account for the different between $cpu (which handles
mips/mips64 separately) and $host_arch (which does not).
Fixes: 1a6ef6ff624 ("configure, meson: detect Rust toolchain", 2024-10-11)
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Thu, 31 Oct 2024 16:34:25 +0000 (16:34 +0000)]
Merge tag 'pull-riscv-to-apply-
20241031-1' of https://github.com/alistair23/qemu into staging
RISC-V PR for 9.2
* Fix an access to VXSAT
* Expose RV32 cpu to RV64 QEMU
* Don't clear PLIC pending bits on IRQ lowering
* Make PLIC zeroth priority register read-only
* Set vtype.vill on CPU reset
* Check and update APLIC pending when write sourcecfg
* Avoid dropping charecters with HTIF
* Apply FIFO backpressure to guests using SiFive UART
* Support for control flow integrity extensions
* Support for the IOMMU with the virt machine
* set 'aia_mode' to default in error path
* clarify how 'riscv-aia' default works
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmci/tQACgkQr3yVEwxT
# gBNPAQ//dZKjjJm4Sh+UFdUslivBJYtL1rl2UUG2UqiNn/UoYh/vcHoSArljHTjt
# 8riEStnaQqXziOpMIJjIMLJ4KoiIk2SMvjNfFtcmPiPZEDEpjsTxfUxBFsBee+fI
# 4KNQKKFeljq4pa+VzVvXEqzCNJIzCThFXTZhZmer00M91HPA8ZQIHpv2JL1sWlgZ
# /HW24XEDFLGc/JsR55fxpPftlAqP+BfOrqMmbWy7x2Y+G8WI05hM2zTP/W8pnIz3
# z0GCRYSBlADtrp+3RqzTwQfK5pXoFc0iDktWVYlhoXaeEmOwo8IYxTjrvBGhnBq+
# ySX1DzTa23QmOIxSYYvCRuOxyOK9ziNn+EQ9FiFBt1h1o251CYMil1bwmYXMCMNJ
# rZwF1HfUx0g2GQW1ZOqh1eeyLO29JiOdV3hxlDO7X4bbISNgU6il5MXmnvf0/XVW
# Af3YhALeeDbHgHL1iVfjafzaviQc9+YrEX13eX6N2AjcgE5a3F7XNmGfFpFJ+mfQ
# CPgiwVBXat6UpBUGAt14UM+6wzp+crSgQR5IEGth+mKMKdkWoykvo7A2oHdu39zn
# 2cdzsshg2qcLLUPTFy06OOTXX382kCWXuykhHOjZ4uu2SJJ7R0W3PlYV8HSde2Vu
# Rj+89ZlUSICJNXXweQB39r87hNbtRuDIO22V0B9XrApQbJj6/yE=
# =rPaa
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 31 Oct 2024 03:51:48 GMT
# gpg: using RSA key
6AE902B6A7CA877D6D659296AF7C95130C538013
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65 9296 AF7C 9513 0C53 8013
* tag 'pull-riscv-to-apply-
20241031-1' of https://github.com/alistair23/qemu: (50 commits)
target/riscv: Fix vcompress with rvv_ta_all_1s
target/riscv/kvm: clarify how 'riscv-aia' default works
target/riscv/kvm: set 'aia_mode' to default in error path
docs/specs: add riscv-iommu
qtest/riscv-iommu-test: add init queues test
hw/riscv/riscv-iommu: add DBG support
hw/riscv/riscv-iommu: add ATS support
hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
test/qtest: add riscv-iommu-pci tests
hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
hw/riscv: add riscv-iommu-pci reference device
pci-ids.rst: add Red Hat pci-id for RISC-V IOMMU device
hw/riscv: add RISC-V IOMMU base emulation
hw/riscv: add riscv-iommu-bits.h
exec/memtxattr: add process identifier to the transaction attributes
target/riscv: Expose zicfiss extension as a cpu property
disas/riscv: enable disassembly for compressed sspush/sspopchk
disas/riscv: enable disassembly for zicfiss instructions
target/riscv: compressed encodings for sspush and sspopchk
target/riscv: implement zicfiss instructions
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 31 Oct 2024 13:28:57 +0000 (13:28 +0000)]
Merge tag 'pull-target-arm-
20241029' of https://git.linaro.org/people/pmaydell/qemu-arm into staging
target-arm queue:
* arm/kvm: add support for MTE
* docs/system/cpu-hotplug: Update example's socket-id/core-id
* target/arm: Store FPSR cumulative exception bits in env->vfp.fpsr
* target/arm: Don't assert in regime_is_user() for E10 mmuidx values
* hw/sd/omap_mmc: Fix breakage of OMAP MMC controller
* tests/functional: Add functional tests for collie, sx1
* scripts/symlink-install-tree.py: Fix MESONINTROSPECT parsing
* docs/system/arm: Document remaining undocumented boards
* target/arm: Fix arithmetic underflow in SETM instruction
* docs/devel/reset: Fix minor grammatical error
* target/arm: kvm: require KVM_CAP_DEVICE_CTRL
# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmcg+oYZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3g/KD/4tzAD2zkWpnIPhY5ht4wBz
# Kioy+pnXJW5I6pAS4ljnI41pOFnPr6Ln1NfGkP+9pTND8lIQNY0Te2a/NjgEiYJc
# rYJ/A6UUuCqQ8+/oWWMPETcbbiKcSS2mzCJ/pNXeIquK5Co0Qk7mzdfObudwZpbw
# o3Cc9YrGZc64XAl2Rb83Oy2UHo1xjmV67wtEmcj+hmWC+tFc7pQpAKwIKcBMgns8
# ZILexX18RYZMDqQZQ5tvwTccJeFmljj9PyScou787RXK93BlF3sL/ypq1xMykRru
# JpMwAI6jD5LG9NO2zNr3FpBef8sJXqNF+O0DcYmhrKBwRkztuEU6DXF6xzdz/HRa
# c14hWK1jHku+HvKBXx3c5wibTbTU71Jv36Gw5VjOBQe/5cdKJAbZw8OH+IK8ozk9
# GwLVQ/JzrIi5m8FwXPwmkOPLX/CY8Wot6IWdJKKGTN8bY+9Cu2gTduFJIvi96HWU
# xkG1ySN61wKUR8Z26mizim2nBvQjybjqKEhrtQ21K548j4pWFVBgXJQX0Menca/v
# ziSLCd84Pmh9+DtElPCUyau/nX/jyUJ1gCScvcJjF5jAMPBREpAh53j/GL9JEgX6
# 9cX2WG6o+9R4Qcrh1O3Vy1bAUcJ27Tr2NitD+g5XObZ+vC6YgqfN2/M53so4rwws
# N4KCRdV6GcU70bQAul3mLQ==
# =KWM2
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 29 Oct 2024 15:08:54 GMT
# gpg: using RSA key
E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# gpg: aka "Peter Maydell <peter@archaic.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* tag 'pull-target-arm-
20241029' of https://git.linaro.org/people/pmaydell/qemu-arm:
target/arm: kvm: require KVM_CAP_DEVICE_CTRL
docs/devel/reset: Fix minor grammatical error
target/arm: Fix arithmetic underflow in SETM instruction
docs/system/target-arm.rst: Remove "many boards are undocumented" note
docs/system/arm: Add placeholder docs for mcimx6ul-evk and mcimx7d-sabre
docs/system/arm: Add placeholder doc for xlnx-zcu102 board
docs/system/arm: Add placeholder doc for exynos4 boards
docs/system/arm: Split fby35 out from aspeed.rst
docs/system/arm: Don't use wildcard '*-bmc' in doc titles
docs/system/arm/stm32: List olimex-stm32-h405 in document title
scripts/symlink-install-tree.py: Fix MESONINTROSPECT parsing
tests/functional: Add a functional test for the sx1 board
tests/functional: Add a functional test for the collie board
hw/sd/omap_mmc: Don't use sd_cmd_type_t
target/arm: Don't assert in regime_is_user() for E10 mmuidx values
target/arm: Store FPSR cumulative exception bits in env->vfp.fpsr
docs/system/cpu-hotplug: Update example's socket-id/core-id
arm/kvm: add support for MTE
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Anton Blanchard [Wed, 30 Oct 2024 04:35:38 +0000 (15:35 +1100)]
target/riscv: Fix vcompress with rvv_ta_all_1s
vcompress packs vl or less fields into vd, so the tail starts after the
last packed field. This could be more clearly expressed in the ISA,
but for now this thread helps to explain it:
https://github.com/riscv/riscv-v-spec/issues/796
Signed-off-by: Anton Blanchard <antonb@tenstorrent.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241030043538.939712-1-antonb@tenstorrent.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Daniel Henrique Barboza [Mon, 28 Oct 2024 18:20:37 +0000 (15:20 -0300)]
target/riscv/kvm: clarify how 'riscv-aia' default works
We do not have control in the default 'riscv-aia' default value. We can
try to set it to a specific value, in this case 'auto', but there's no
guarantee that the host will accept it.
Couple with this we're always doing a 'qemu_log' to inform whether we're
ended up using the host default or if we managed to set the AIA mode to
the QEMU default we wanted to set.
Change the 'riscv-aia' description to better reflect how the option
works, and remove the two informative 'qemu_log' that are now unneeded:
if no message shows, riscv-aia was set to the default or uset-set value.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241028182037.290171-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Daniel Henrique Barboza [Mon, 28 Oct 2024 18:20:36 +0000 (15:20 -0300)]
target/riscv/kvm: set 'aia_mode' to default in error path
When failing to set the selected AIA mode, 'aia_mode' is left untouched.
This means that 'aia_mode' will not reflect the actual AIA mode,
retrieved in 'default_aia_mode',
This is benign for now, but it will impact QMP query commands that will
expose the 'aia_mode' value, retrieving the wrong value.
Set 'aia_mode' to 'default_aia_mode' if we fail to change the AIA mode
in KVM.
While we're at it, rework the log/warning messages to be a bit less
verbose. Instead of:
KVM AIA: default mode is emul
qemu-system-riscv64: warning: KVM AIA: failed to set KVM AIA mode
We can use a single warning message:
qemu-system-riscv64: warning: KVM AIA: failed to set KVM AIA mode 'auto', using default host mode 'emul'
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241028182037.290171-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Daniel Henrique Barboza [Wed, 16 Oct 2024 20:40:36 +0000 (17:40 -0300)]
docs/specs: add riscv-iommu
Add a simple guideline to use the existing RISC-V IOMMU support we just
added.
This doc will be updated once we add the riscv-iommu-sys device.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-13-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Daniel Henrique Barboza [Wed, 16 Oct 2024 20:40:35 +0000 (17:40 -0300)]
qtest/riscv-iommu-test: add init queues test
Add an additional test to further exercise the IOMMU where we attempt to
initialize the command, fault and page-request queues.
These steps are taken from chapter 6.2 of the RISC-V IOMMU spec,
"Guidelines for initialization". It emulates what we expect from the
software/OS when initializing the IOMMU.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-12-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:34 +0000 (17:40 -0300)]
hw/riscv/riscv-iommu: add DBG support
DBG support adds three additional registers: tr_req_iova, tr_req_ctl and
tr_response.
The DBG cap is always enabled. No on/off toggle is provided for it.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-11-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:33 +0000 (17:40 -0300)]
hw/riscv/riscv-iommu: add ATS support
Add PCIe Address Translation Services (ATS) capabilities to the IOMMU.
This will add support for ATS translation requests in Fault/Event
queues, Page-request queue and IOATC invalidations.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-10-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:32 +0000 (17:40 -0300)]
hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
The RISC-V IOMMU spec predicts that the IOMMU can use translation caches
to hold entries from the DDT. This includes implementation for all cache
commands that are marked as 'not implemented'.
There are some artifacts included in the cache that predicts s-stage and
g-stage elements, although we don't support it yet. We'll introduce them
next.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-9-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Daniel Henrique Barboza [Wed, 16 Oct 2024 20:40:31 +0000 (17:40 -0300)]
test/qtest: add riscv-iommu-pci tests
To test the RISC-V IOMMU emulation we'll use its PCI representation.
Create a new 'riscv-iommu-pci' libqos device that will be present with
CONFIG_RISCV_IOMMU. This config is only available for RISC-V, so this
device will only be consumed by the RISC-V libqos machine.
Start with basic tests: a PCI sanity check and a reset state register
test. The reset test was taken from the RISC-V IOMMU spec chapter 5.2,
"Reset behavior".
More tests will be added later.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-8-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:30 +0000 (17:40 -0300)]
hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
Generate device tree entry for riscv-iommu PCI device, along with
mapping all PCI device identifiers to the single IOMMU device instance.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-7-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:29 +0000 (17:40 -0300)]
hw/riscv: add riscv-iommu-pci reference device
The RISC-V IOMMU can be modelled as a PCIe device following the
guidelines of the RISC-V IOMMU spec, chapter 7.1, "Integrating an IOMMU
as a PCIe device".
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-6-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Daniel Henrique Barboza [Wed, 16 Oct 2024 20:40:28 +0000 (17:40 -0300)]
pci-ids.rst: add Red Hat pci-id for RISC-V IOMMU device
The RISC-V IOMMU PCI device we're going to add next is a reference
implementation of the riscv-iommu spec [1], which predicts that the
IOMMU can be implemented as a PCIe device.
However, RISC-V International (RVI), the entity that ratified the
riscv-iommu spec, didn't bother assigning a PCI ID for this IOMMU PCIe
implementation that the spec predicts. This puts us in an uncommon
situation because we want to add the reference IOMMU PCIe implementation
but we don't have a PCI ID for it.
Given that RVI doesn't provide a PCI ID for it we reached out to Red Hat
and Gerd Hoffman, and they were kind enough to give us a PCI ID for the
RISC-V IOMMU PCI reference device.
Thanks Red Hat and Gerd for this RISC-V IOMMU PCIe device ID.
[1] https://github.com/riscv-non-isa/riscv-iommu/releases/tag/v1.0.0
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <
20241016204038.649340-5-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:27 +0000 (17:40 -0300)]
hw/riscv: add RISC-V IOMMU base emulation
The RISC-V IOMMU specification is now ratified as-per the RISC-V
international process. The latest frozen specifcation can be found at:
https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
Add the foundation of the device emulation for RISC-V IOMMU. It includes
support for s-stage (sv32, sv39, sv48, sv57 caps) and g-stage (sv32x4,
sv39x4, sv48x4, sv57x4 caps).
Other capabilities like ATS and DBG support will be added incrementally
in the next patches.
Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Jason Chien <jason.chien@sifive.com>
Message-ID: <
20241016204038.649340-4-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:26 +0000 (17:40 -0300)]
hw/riscv: add riscv-iommu-bits.h
This header will be used by the RISC-V IOMMU emulation to be added
in the next patch. Due to its size it's being sent in separate for
an easier review.
One thing to notice is that this header can be replaced by the future
Linux RISC-V IOMMU driver header, which would become a linux-header we
would import instead of keeping our own. The Linux implementation isn't
upstream yet so for now we'll have to manage riscv-iommu-bits.h.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Jason Chien <jason.chien@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241016204038.649340-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Tomasz Jeznach [Wed, 16 Oct 2024 20:40:25 +0000 (17:40 -0300)]
exec/memtxattr: add process identifier to the transaction attributes
Extend memory transaction attributes with process identifier to allow
per-request address translation logic to use requester_id / process_id
to identify memory mapping (e.g. enabling IOMMU w/ PASID translations).
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Jason Chien <jason.chien@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <
20241016204038.649340-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:10 +0000 (15:50 -0700)]
target/riscv: Expose zicfiss extension as a cpu property
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-21-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:09 +0000 (15:50 -0700)]
disas/riscv: enable disassembly for compressed sspush/sspopchk
sspush and sspopchk have equivalent compressed encoding taken from zcmop.
cmop.1 is sspush x1 while cmop.5 is sspopchk x5. Due to unusual encoding
for both rs1 and rs2 from space bitfield, this required a new codec.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-20-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:08 +0000 (15:50 -0700)]
disas/riscv: enable disassembly for zicfiss instructions
Enable disassembly for sspush, sspopchk, ssrdp & ssamoswap.
Disasembly is only enabled if zimop and zicfiss ext is set to true.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-19-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Pierrick Bouvier [Wed, 23 Oct 2024 07:39:14 +0000 (00:39 -0700)]
scripts: remove erroneous file that breaks git clone on Windows
This file was created by mistake in recent
ed7667188 (9p: remove
'proxy' filesystem backend driver).
When cloning the repository using native git for windows, we see this:
Error: error: invalid path 'scripts/meson-buildoptions.'
Error: The process 'C:\Program Files\Git\bin\git.exe' failed with exit code 128
Link: https://lore.kernel.org/r/20241023073914.895438-1-pierrick.bouvier@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 21 Oct 2024 06:59:03 +0000 (08:59 +0200)]
target/i386: fix CPUID check for LFENCE and SFENCE
LFENCE and SFENCE were introduced with the original SSE instruction set;
marking them incorrectly as cpuid(SSE2) causes failures for CPU models
that lack SSE2, for example pentium3.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Daniel P. Berrangé [Tue, 15 Oct 2024 13:39:25 +0000 (14:39 +0100)]
ci: enable rust in the Fedora system build job
We previously added a new job running Fedora with nightly rust
toolchain.
The standard rust toolchain distributed by Fedora is new enough,
however, to let us enable a CI build with that too.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/20241015133925.311587-3-berrange@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Daniel P. Berrangé [Tue, 15 Oct 2024 13:39:24 +0000 (14:39 +0100)]
tests: add 'rust' and 'bindgen' to CI package list
Although we're not enabling rust by default yet, we can still add
rust and bindgen to the CI package list.
This demonstrates that we're not accidentally triggering unexpected
build behaviour merely from Rust being present. When we do dev work
to enable rust by default, this will show we're building correctly
on all platforms we target.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/20241015133925.311587-2-berrange@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 21 Oct 2024 16:07:16 +0000 (18:07 +0200)]
stubs: avoid duplicate symbols in libqemuutil.a
qapi_event_send_device_deleted is always included (together with the
rest of QAPI) in libqemuutil.a if either system-mode emulation or tools
are being built, and in that case the stub causes a duplicate symbol
to appear in libqemuutil.a.
Add the symbol only if events are not being requested.
Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:07 +0000 (15:50 -0700)]
target/riscv: compressed encodings for sspush and sspopchk
sspush/sspopchk have compressed encodings carved out of zcmops.
compressed sspush is designated as c.mop.1 while compressed sspopchk
is designated as c.mop.5.
Note that c.sspush x1 exists while c.sspush x5 doesn't. Similarly
c.sspopchk x5 exists while c.sspopchk x1 doesn't.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Jim Shu <jim.shu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-18-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:06 +0000 (15:50 -0700)]
target/riscv: implement zicfiss instructions
zicfiss has following instructions
- sspopchk: pops a value from shadow stack and compares with x1/x5.
If they dont match, reports a sw check exception with tval = 3.
- sspush: pushes value in x1/x5 on shadow stack
- ssrdp: reads current shadow stack
- ssamoswap: swaps contents of shadow stack atomically
sspopchk/sspush/ssrdp default to zimop if zimop implemented and SSE=0
If SSE=0, ssamoswap is illegal instruction exception.
This patch implements shadow stack operations for qemu-user and shadow
stack is not protected.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Jim Shu <jim.shu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-17-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:05 +0000 (15:50 -0700)]
target/riscv: update `decode_save_opc` to store extra word2
Extra word 2 is stored during tcg compile and `decode_save_opc` needs
additional argument in order to pass the value. This will be used during
unwind to get extra information about instruction like how to massage
exceptions. Updated all callsites as well.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/594
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-16-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:04 +0000 (15:50 -0700)]
target/riscv: AMO operations always raise store/AMO fault
This patch adds one more word for tcg compile which can be obtained during
unwind time to determine fault type for original operation (example AMO).
Depending on that, fault can be promoted to store/AMO fault.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-15-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:03 +0000 (15:50 -0700)]
target/riscv: mmu changes for zicfiss shadow stack protection
zicfiss protects shadow stack using new page table encodings PTE.W=1,
PTE.R=0 and PTE.X=0. This encoding is reserved if zicfiss is not
implemented or if shadow stack are not enabled.
Loads on shadow stack memory are allowed while stores to shadow stack
memory leads to access faults. Shadow stack accesses to RO memory
leads to store page fault.
To implement special nature of shadow stack memory where only selected
stores (shadow stack stores from sspush) have to be allowed while rest
of regular stores disallowed, new MMU TLB index is created for shadow
stack.
Furthermore, `check_zicbom_access` (`cbo.clean/flush/inval`) may probe
shadow stack memory and must always raise store/AMO access fault because
it has store semantics. For non-shadow stack memory even though
`cbo.clean/flush/inval` have store semantics, it will not fault if read
is allowed (probably to follow `clflush` on x86). Although if read is not
allowed, eventually `probe_write` will do store page (or access) fault (if
permissions don't allow it). cbo operations on shadow stack memory must
always raise store access fault. Thus extending `get_physical_address` to
recieve `probe` parameter as well.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-14-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:02 +0000 (15:50 -0700)]
target/riscv: tb flag for shadow stack instructions
Shadow stack instructions can be decoded as zimop / zcmop or shadow stack
instructions depending on whether shadow stack are enabled at current
privilege. This requires a TB flag so that correct TB generation and correct
TB lookup happens. `DisasContext` gets a field indicating whether bcfi is
enabled or not.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Jim Shu <jim.shu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-13-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:01 +0000 (15:50 -0700)]
target/riscv: introduce ssp and enabling controls for zicfiss
zicfiss introduces a new state ssp ("shadow stack register") in cpu.
ssp is expressed as a new unprivileged csr (CSR_SSP=0x11) and holds
virtual address for shadow stack as programmed by software.
Shadow stack (for each mode) is enabled via bit3 in *envcfg CSRs.
Shadow stack can be enabled for a mode only if it's higher privileged
mode had it enabled for itself. M mode doesn't need enabling control,
it's always available if extension is available on cpu.
This patch also implements helper bcfi function which determines if bcfi
is enabled at current privilege or not.
Adds ssp to migration state as well.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Jim Shu <jim.shu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-12-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:50:00 +0000 (15:50 -0700)]
target/riscv: Add zicfiss extension
zicfiss [1] riscv cpu extension enables backward control flow integrity.
This patch sets up space for zicfiss extension in cpuconfig. And imple-
ments dependency on A, zicsr, zimop and zcmop extensions.
[1] - https://github.com/riscv/riscv-cfi
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Jim Shu <jim.shu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-11-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:49:59 +0000 (15:49 -0700)]
target/riscv: Expose zicfilp extension as a cpu property
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-10-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:49:58 +0000 (15:49 -0700)]
disas/riscv: enable `lpad` disassembly
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Jim Shu <jim.shu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-9-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Deepak Gupta [Tue, 8 Oct 2024 22:49:57 +0000 (15:49 -0700)]
target/riscv: zicfilp `lpad` impl and branch tracking
Implements setting lp expected when `jalr` is encountered and implements
`lpad` instruction of zicfilp. `lpad` instruction is taken out of
auipc x0, <imm_20>. This is an existing HINTNOP space. If `lpad` is
target of an indirect branch, cpu checks for 20 bit value in x7 upper
with 20 bit value embedded in `lpad`. If they don't match, cpu raises a
sw check exception with tval = 2.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Jim Shu <jim.shu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <
20241008225010.
1861630-8-debug@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>