
target/ppc: Make divw[u] handler method decodetree compatible.

The handler methods for the divw[u] instructions internally use
Rc(ctx->opcode) to extract the Rc field. This is a problem when moving these
instructions to decodetree, because the ctx->opcode field is not populated
there. Make the handlers decodetree compatible so that these insns can be
safely moved to the decodetree specs.
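
As a rough illustration of the difference, the sketch below contrasts a
handler that digs Rc out of the raw instruction word with one that receives
Rc as an already-decoded argument. SketchCtx and both function names are
hypothetical stand-ins, not QEMU's DisasContext or its actual handlers; the
only fact relied on is that Rc is the least significant bit of the
instruction word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the translator context; not QEMU's DisasContext. */
typedef struct {
    uint32_t opcode;   /* only filled in by the legacy decoder */
} SketchCtx;

/* Legacy style: the handler extracts Rc (the lowest bit of the
 * instruction word) from the context itself. */
static bool sketch_legacy_wants_cr0(const SketchCtx *ctx)
{
    return (ctx->opcode & 1u) != 0;
}

/* Decodetree style: the decoder has already extracted Rc and passes it in,
 * so the handler never touches ctx->opcode. */
static bool sketch_decodetree_wants_cr0(bool rc)
{
    return rc;
}
```

With the second shape, the handler stays correct even when ctx->opcode is
left unset.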

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Move mul{li, lw, lwo, hw, hwu} instructions to decodetree.

Move the following instructions to the decodetree specification:
mulli : D-form
mul{lw, lwo, hw, hwu}[.] : XO-form

The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.
Also cleaned up the code for mullw[o][.] as per review comments while
keeping the logic of the generated tcg ops semantically the same.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Move floating-point arithmetic instructions to decodetree.

This patch moves the below instructions to decodetree specification :

f{add, sub, mul, div, re, rsqrte, madd, msub, nmadd, nmsub}[s][.] : A-form
ft{div, sqrt} : X-form

With this patch, all the floating-point arithmetic instructions have been
moved to decodetree.
The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Merge various fpu helpers

This patch merges the definitions of the following set of fpu helper methods,
which are similar, using macros :

1. f{add, sub, mul, div}(s)
2. fre(s)
3. frsqrte(s)
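
A minimal sketch of the idea, assuming a macro that stamps out both the
double-precision helper and its single-precision "s" twin from one
definition. SKETCH_FPU_BINOP and the sketch_* names are hypothetical, and
rounding the double result through float is a simplification, not how the
real helpers handle rounding or exception flags.

```c
/* Hypothetical macro: one definition produces sketch_<name>() and
 * sketch_<name>s(). The "s" variant rounds the result to single
 * precision and widens it again (a simplification). */
#define SKETCH_FPU_BINOP(name, op)                          \
    static double sketch_##name(double a, double b)         \
    {                                                       \
        return a op b;                                      \
    }                                                       \
    static double sketch_##name##s(double a, double b)      \
    {                                                       \
        return (double)(float)(a op b);                     \
    }

SKETCH_FPU_BINOP(fadd, +)
SKETCH_FPU_BINOP(fsub, -)
SKETCH_FPU_BINOP(fmul, *)
SKETCH_FPU_BINOP(fdiv, /)
```

Each macro use replaces two near-identical function bodies, which is the
kind of duplication the merge removes.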

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Add ISA v3.1 variants of sync instruction

POWER10 adds a new field to sync for store-store syncs, and some
new variants of the existing syncs that include persistent memory.

Implement the store-store syncs and plwsync/phwsync.

Reviewed-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Fix embedded memory barriers

Memory barriers are supposed to do something on BookE systems. These
were probably just missed during MTTCG enablement, or maybe no targets
support SMP. Either way, add proper BookE implementations.

Reviewed-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Move sync instructions to decodetree

This tries to faithfully reproduce the odd BookE logic. Note the
e206 check in gen_msync_4xx() is always false, so not carried over.

It does change the handling of non-zero reserved bits outside the
defined fields from being illegal to being ignored, which the
architecture specifies to help with backward compatibility of new
fields. The existing behaviour causes illegal instruction exceptions
when using new POWER10 sync variants that add new fields. After this
change the instructions are accepted and are implemented as supersets
of the new behaviour, as intended.

Reviewed-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

tcg/cputlb: remove other-cpu capability from TLB flushing

Some TLB flush operations can flush other CPUs. The problem with this
is they used non-synced variants of flushes (i.e., that return
before the destination has completed the flush). Since all TLB flush
users need the _synced variants, and that last user (ppc) of the
non-synced flush was buggy, this is a footgun waiting to go off. There
do not seem to be any callers that flush other CPUs, so remove the
capability.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

tcg/cputlb: Remove non-synced variants of global TLB flushes

These are no longer used.

  tlb_flush_all_cpus: removed by previous commit.
  tlb_flush_page_all_cpus: removed by previous commit.

  tlb_flush_page_bits_by_mmuidx_all_cpus: never used.
  tlb_flush_page_by_mmuidx_all_cpus: never used.
  tlb_flush_page_bits_by_mmuidx_all_cpus: never used, thus:
    tlb_flush_range_by_mmuidx_all_cpus: never used.
    tlb_flush_by_mmuidx_all_cpus: never used.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Fix broadcast tlbie synchronisation

With mttcg, broadcast tlbie instructions do not wait until other vCPUs
have been kicked out of TCG execution before they complete (including
necessary subsequent tlbsync, etc., instructions). This is contrary to
the ISA, and it permits other vCPUs to use translations after the TLB
flush. For example:

   CPU0
   // *memP is initially 0, memV maps to memP with *pte
   *pte = 0;
   ptesync ; tlbie ; eieio ; tlbsync ; ptesync
   *memP = 1;

   CPU1
   assert(*memV == 0);

It is possible for the assertion to fail because CPU1 translates memV
using the TLB after CPU0 has stored 1 to the underlying memory. This
race was observed with a careful test case where CPU1 checks run in a
very large expensive TB so it can run for the entire CPU0 period between
clearing the pte and storing the memory, but host vCPU thread preemption
could cause the race to hit anywhere.

As explained in commit 4ddc104689b ("target/ppc: Fix tlbie"), it is not
enough to just use tlb_flush_all_cpus_synced(), because that does not
execute until the calling CPU has finished its TB. It is also required
that the TB is ended at the point where the TLB flush must subsequently
take effect.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/spapr: Add ibm,pi-features

The ibm,pi-features property has a bit to say whether or not
msgsndp should be used. Linux checks if it is being run under
KVM and avoids msgsndp anyway, but it would be preferable to
rely on this bit.

Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

spapr: avoid overhead of finding vhyp class in critical operations

PPC_VIRTUAL_HYPERVISOR_GET_CLASS is used in critical operations like
interrupts and TLB misses and is quite costly. Running the
kvm-unit-tests sieve program with radix MMU enabled thrashes the TCG
TLB and spends a lot of time in TLB and page table walking code. The
test takes 67 seconds to complete with a lot of time being spent in
code related to finding the vhyp class:

   12.01%  [.] g_str_hash
    8.94%  [.] g_hash_table_lookup
    8.06%  [.] object_class_dynamic_cast
    6.21%  [.] address_space_ldq
    4.94%  [.] __strcmp_avx2
    4.28%  [.] tlb_set_page_full
    4.08%  [.] address_space_translate_internal
    3.17%  [.] object_class_dynamic_cast_assert
    2.84%  [.] ppc_radix64_xlate

Keep a pointer to the class and avoid this lookup. This reduces the
execution time to 40 seconds.
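
The caching idea can be sketched as follows, assuming a slow name-based
lookup that stands in for the dynamic class cast. Every name here
(SketchClass, SketchCpu, sketch_find_class, and so on) is hypothetical and
none of it is QEMU's QOM API; the point is only that the lookup runs once at
attach time rather than on every hot-path call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical class record with one method. */
typedef struct SketchClass {
    const char *name;
    int (*handle)(int);
} SketchClass;

static int sketch_handle(int x)
{
    return x + 1;
}

static SketchClass sketch_registry[] = {
    { "other", NULL },
    { "vhyp", sketch_handle },
};

/* Counts how often the slow lookup runs. */
static unsigned sketch_lookup_count;

/* Slow path: string comparison over a registry, standing in for a
 * hash lookup plus dynamic cast. */
static const SketchClass *sketch_find_class(const char *name)
{
    size_t n = sizeof(sketch_registry) / sizeof(sketch_registry[0]);

    sketch_lookup_count++;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(sketch_registry[i].name, name) == 0) {
            return &sketch_registry[i];
        }
    }
    return NULL;
}

typedef struct SketchCpu {
    const char *vhyp_type;
    const SketchClass *vhyp_class;   /* resolved once, then reused */
} SketchCpu;

/* Resolve the class when the hypervisor object is attached. */
static void sketch_cpu_set_vhyp(SketchCpu *cpu, const char *type)
{
    cpu->vhyp_type = type;
    cpu->vhyp_class = sketch_find_class(type);
}

/* Hot path: uses the cached pointer, no lookup. */
static int sketch_hot_path(const SketchCpu *cpu, int x)
{
    return cpu->vhyp_class->handle(x);
}
```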

Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Merge tag 'pull-tcg-20240523' of https://gitlab.com/rth7680/qemu into staging

tcg: Introduce TCG_TARGET_HAS_tst_vec
accel/tcg: Init tb size and icount before plugin_gen_tb_end

# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmZPazYdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV/hkwgAl/Qdaha8HNW+TkbL
# 3aQU914xSTbQVYKKCihe1R6tJ4jRw9zSj4Bf43f2GCNaz5GZyO2ek3DYHoYF4z/A
# OzNW1Vg2qQ+DS65EhTrvBWOko70zvTeh4eLyASxgEbCpWmsh1d2oLGO0mdjJkrfe
# UdcEXPZ+q0iXAWRFChRClYS5eeVnwYfIeOIzdeUgUezA6fD2zyBT5BgJAxgUTm9w
# jDXJqzcVypDFTSnrBxBVeV2SAVknVM6coc2BoJ/JiVSgupJZuNX7PSbwNI7GTfl/
# LfmiAQyhF78KQiK6TqrliK5mr9R0MSyLORcKQQJrh9G+lxxeO4Sd5qw7V21mVhbc
# YpLJaw==
# =SJem
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 23 May 2024 09:13:42 AM PDT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]

* tag 'pull-tcg-20240523' of https://gitlab.com/rth7680/qemu:
  accel/tcg: Init tb size and icount before plugin_gen_tb_end
  tcg/arm: Support TCG_TARGET_HAS_tst_vec
  tcg/aarch64: Support TCG_TARGET_HAS_tst_vec
  tcg: Expand TCG_COND_TST* if not TCG_TARGET_HAS_tst_vec
  tcg: Introduce TCG_TARGET_HAS_tst_vec

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* hw/i386/pc_sysfw: Alias rather than copy isa-bios region
* target/i386: add control bits support for LAM
* target/i386: tweaks to new translator
* target/i386: add support for LAM in CPUID enumeration
* hw/i386/pc: Support smp.modules for x86 PC machine
* target-i386: hyper-v: Correct kvm_hv_handle_exit return value

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmZOMlAUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNTSwf8DOPgipepNcsxUQoV9nOBfNXqEWa6
# DilQGwuu/3eMSPITUCGKVrtLR5azwCwvNfYYErVBPVIhjImnk3XHwfKpH1csadgq
# 7Np8WGjAyKEIP/yC/K1VwsanFHv3hmC6jfcO3ZnsnlmbHsRINbvU9uMlFuiQkKJG
# lP/dSUcTVhwLT6eFr9DVDUnq4Nh7j3saY85pZUoDclobpeRLaEAYrawha1/0uQpc
# g7MZYsxT3sg9PIHlM+flpRvJNPz/ZDBdj4raN1xo4q0ET0KRLni6oEOVs5GpTY1R
# t4O8a/IYkxeI15K9U7i0HwYI2wVwKZbHgp9XPMYVZFJdKBGT8bnF56pV9A==
# =lp7q
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 22 May 2024 10:58:40 AM PDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (23 commits)
  target-i386: hyper-v: Correct kvm_hv_handle_exit return value
  i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[0x8000001D].EAX[bits 25:14]
  i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[4]
  i386: Add cache topology info in CPUCacheInfo
  hw/i386/pc: Support smp.modules for x86 PC machine
  tests: Add test case of APIC ID for module level parsing
  i386/cpu: Introduce module-id to X86CPU
  i386: Support module_id in X86CPUTopoIDs
  i386: Expose module level in CPUID[0x1F]
  i386: Support modules_per_die in X86CPUTopoInfo
  i386: Introduce module level cpu topology to CPUX86State
  i386/cpu: Decouple CPUID[0x1F] subleaf with specific topology level
  i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB]
  i386/cpu: Introduce bitmap to cache available CPU topology levels
  i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()
  i386/cpu: Use APIC ID info get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]
  i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]
  i386/cpu: Fix i/d-cache topology to core level for Intel CPU
  target/i386: add control bits support for LAM
  target/i386: add support for LAM in CPUID enumeration
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'pull-loongarch-20240523' of https://gitlab.com/gaosong/qemu into staging

pull-loongarch-20240523

# -----BEGIN PGP SIGNATURE-----
#
# iLMEAAEKAB0WIQS4/x2g0v3LLaCcbCxAov/yOSY+3wUCZk6fPgAKCRBAov/yOSY+
# 35rwA/98G/tODhR2PAl7qZr6+6z8vazkiT4iNNHgxnw/T2TKsh2YONe+2gtKhTa1
# HKYANMykWTxOtBZeCYY9Z5QNj8DuC3xKc1zY1pC1AwRcflsMlGz0WoAC78Gbl9TC
# PBCwyu01hsFoYpIstH/dOGbNsR2OFRLnnGUVFUKtPuS3O+59hg==
# =OzUv
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 22 May 2024 06:43:26 PM PDT
# gpg:                using RSA key B8FF1DA0D2FDCB2DA09C6C2C40A2FFF239263EDF
# gpg: Good signature from "Song Gao <m17746591750@163.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B8FF 1DA0 D2FD CB2D A09C  6C2C 40A2 FFF2 3926 3EDF

* tag 'pull-loongarch-20240523' of https://gitlab.com/gaosong/qemu:
  hw/loongarch/virt: Fix FDT memory node address width
  target/loongarch: Add loongarch vector property unconditionally
  hw/loongarch: Remove minimum and default memory size
  hw/loongarch: Refine system dram memory region
  hw/loongarch: Refine fwcfg memory map
  hw/loongarch: Refine fadt memory table for numa memory
  hw/loongarch: Refine acpi srat table for numa memory
  hw/loongarch: Add VM mode in IOCSR feature register in kvm mode
  target/loongarch/kvm: fpu save the vreg registers high 192bit
  target/loongarch/kvm: Fix VM recovery from disk failures

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Init tb size and icount before plugin_gen_tb_end

When passing disassembly data to plugin callbacks,
translator_st_len relies on db->tb->size having been set.

Fixes: 4c833c60e047 ("disas: Use translator_st to get disassembly data")
Reported-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>

tcg/arm: Support TCG_TARGET_HAS_tst_vec

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/aarch64: Support TCG_TARGET_HAS_tst_vec

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Expand TCG_COND_TST* if not TCG_TARGET_HAS_tst_vec

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Introduce TCG_TARGET_HAS_tst_vec

Prelude to supporting TCG_COND_TST* in vector comparisons.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

hw/loongarch/virt: Fix FDT memory node address width

Higher bits for memory nodes were omitted at qemu_fdt_setprop_cells.
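
For illustration, a 64-bit address or size occupies two 32-bit cells in a
device tree property when two cells are used per value, most significant
cell first. The helper below is a hypothetical sketch of that split, not the
code changed by this commit; writing only the low cell loses anything at or
above 4G.

```c
#include <stdint.h>

/* Split a 64-bit value into two 32-bit cells, high cell first. */
static void sketch_u64_to_cells(uint64_t value, uint32_t cells[2])
{
    cells[0] = (uint32_t)(value >> 32);   /* the bits that must not be dropped */
    cells[1] = (uint32_t)value;
}
```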

Cc: qemu-stable@nongnu.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240520-loongarch-fdt-memnode-v1-1-5ea9be93911e@flygoat.com>
Signed-off-by: Song Gao <gaosong@loongson.cn>

target/loongarch: Add loongarch vector property unconditionally

Currently, whether the LSX/LASX vector property is added is decided by its
default value. Instead, the vector property should be added unconditionally,
independent of its default value. If vector support is disabled by default,
it can still be enabled from the command line.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240521080549.434197-2-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>

hw/loongarch: Remove minimum and default memory size

Some qtest test cases such as numa use the default memory size of the
generic machine class, which is 128M by default.

Use the generic default memory size here, and also remove the minimum
memory size, which was originally 1G.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240515093927.3453674-6-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>

hw/loongarch: Refine system dram memory region

For the system dram memory region, it is not necessary to use numa node
information. There are only a low memory region and a high memory region.

Remove the numa node information for the ddr memory region here; this
reduces the number of memory regions on the LoongArch virt machine.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240515093927.3453674-5-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>

hw/loongarch: Refine fwcfg memory map

The memory map table for fwcfg is used by the UEFI BIOS. The UEFI BIOS uses
the first entry from the fwcfg memory map as the first memory HOB; the second
memory HOB is used once the first memory HOB is used up.

The memory map table for fwcfg does not care about numa nodes. In general,
however, the first memory HOB is part of numa node0, so the runtime memory
of UEFI, which is allocated from the first memory HOB, is located at numa
node0.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240515093927.3453674-4-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>

hw/loongarch: Refine fadt memory table for numa memory

On the LoongArch virt machine platform, there is a limitation on memory
map information: the minimum memory size is 256M, and the minimum memory
size for numa node0 is also 256M. With the qemu numa qtest, it is possible
that the memory size of numa node0 is 128M.

The minimum memory size limitations for both total memory and numa
node0 are removed for fadt numa memory table creation.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240515093927.3453674-3-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>

hw/loongarch: Refine acpi srat table for numa memory

On the LoongArch virt machine platform, there is a limitation on memory
map information: the minimum memory size is 256M, and the minimum memory
size for numa node0 is also 256M. With the qemu numa qtest, it is possible
that the memory size of numa node0 is 128M.

The minimum memory size limitations for both total memory and numa
node0 are removed for acpi srat table creation.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240515093927.3453674-2-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>

hw/loongarch: Add VM mode in IOCSR feature register in kvm mode

If the VM runs in kvm mode, VM mode is added to the IOCSR feature register,
so the guest can detect the kvm hypervisor type and enable possible pv
functions.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240514025109.3238398-1-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>

target/loongarch/kvm: fpu save the vreg registers high 192bit

On the kvm side, get_fpu/set_fpu save the high 192 bits of the vreg
registers, but QEMU does not.

Cc: qemu-stable@nongnu.org
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-Id: <20240514110752.989572-1-gaosong@loongson.cn>

target/loongarch/kvm: Fix VM recovery from disk failures

vmstate does not save kvm_state_counter,
which can cause VM recovery from disk to fail.

Cc: qemu-stable@nongnu.org
Signed-off-by: Song Gao <gaosong@loongson.cn>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20240508024732.3127792-1-gaosong@loongson.cn>

Merge tag 'migration-20240522-pull-request' of https://gitlab.com/farosas/qemu into staging

Migration pull request

- Li Zhijian's COLO minor fixes
- Marc-André's virtio-gpu fix
- Fiona's virtio-net USO fix
- A couple of migration-test fixes from Thomas

# -----BEGIN PGP SIGNATURE-----
#
# iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmZObggQHGZhcm9zYXNA
# c3VzZS5kZQAKCRDHmNx0G+wxnWE8D/49RGE+g29qyk9aKx3lU8mSq+ZzmX5GncBt
# 5+Mx5qoHDsBCQTE+dQpEVIoeMJ2HIbgbOML4qsnp6Hw/4/TWkfwC/R6+ZmHBevRk
# fVLkVh2JMHVg8Tq+0FO1X1QnMU03uJ7EAuWdDa8HqlJ5dQY/K3gDaku8oQBXk96X
# 13pChSbMob76tdb+wiwbdEakabigH7XfrPdI6lzI8MCGTIcPKc/UKTFYuoj/OsNx
# raqy+uBtvKtfHxiaYnIgHIPNAF/1f4tP3iAOcPoZWIMXWxFkE8+ANDJAbWo6xIcL
# DGg/wEzZO/OnXLjOhjvLBUHK/fx4wQ5bsqA09BVxoRyBGblkXr+bcwBLYjgiEqzT
# aniPiAx5W/Db+T7HqZPIWesFYj3cmcwvYUTrx/RPMdC0epG+ZczDMtescHdZbxvt
# Pjs3nFeCLhyYcVhlTI72eXRCxdd/26+r6/OmrBC2+GaZrybM61TvNo+3XvO0Pfhi
# UmwF2EN27XmSMelLvH/MnflUVgBHKDs3CCQzDlxreHq2jMVR0SL7LU5wMJJ58Iok
# M3u74izQM25bwYxiASH+4iRn0puH1mOwgOx28W0uiQfZY/678/lCnwa1Tul15BRE
# fIQZJhyIGzhSpwLqEXmdXdlLQs1isqIgpd/mzKgZ285nLr7kz+4gxCUqiXgVbrl7
# P45Dym1u4g==
# =DDrh
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 22 May 2024 03:13:28 PM PDT
# gpg:                using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D
# gpg:                issuer "farosas@suse.de"
# gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown]
# gpg:                 aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3  64CF C798 DC74 1BEC 319D

* tag 'migration-20240522-pull-request' of https://gitlab.com/farosas/qemu:
  tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py
  tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too
  hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1
  virtio-gpu: fix v2 migration
  migration: fix a typo
  migration: add "exists" info to load-state-field trace
  migration/colo: Tidy up bql_unlock() around bdrv_activate_all()
  migration/colo: make colo_incoming_co() return void
  migration/colo: Minor fix for colo error message

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py

If analyze-migration.py cannot be run or crashes, the error is currently
ignored since the code only checks for nonzero values in case the child
exited properly. For example, if you run the test with a non-existing
Python interpreter, it still succeeds:

$ PYTHON=wrongpython QTEST_QEMU_BINARY=./qemu-system-x86_64 tests/qtest/migration-test
...
# Running /x86_64/migration/analyze-script
# Using machine type: pc-q35-9.1
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-417639.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-417639.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pc-q35-9.1, -name source,debug-threads=on -m 150M -serial file:/tmp/migration-test-XPLUN2/src_serial -drive if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device ide-hd,drive=d0,secs=1,cyls=1,heads=1 -uuid 11111111-1111-1111-1111-111111111111 -accel qtest
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-417639.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-417639.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine pc-q35-9.1, -name target,debug-threads=on -m 150M -serial file:/tmp/migration-test-XPLUN2/dest_serial -incoming tcp:127.0.0.1:0 -drive if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device ide-hd,drive=d0,secs=1,cyls=1,heads=1 -accel qtest
**
ERROR:../../devel/qemu/tests/qtest/migration-test.c:1603:test_analyze_script: code should not be reached
migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: qtest_wait_qemu: Assertion `pid == s->qemu_pid' failed.
migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: qtest_wait_qemu: Assertion `pid == s->qemu_pid' failed.
ok 2 /x86_64/migration/analyze-script
...

Let's also fail the test if the child did not exit properly.
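
A minimal sketch of the two checks, using the POSIX wait-status macros from
<sys/wait.h>. The function names are hypothetical and this is not the code
in migration-test.c; it only shows why a check that looks at the exit code
solely when the child exited normally lets a crash or a failed exec slip
through.

```c
#include <stdbool.h>
#include <sys/wait.h>

/* Lenient check, as described above: only a normal exit with a
 * nonzero code counts as a failure. A child killed by a signal
 * passes by accident. */
static bool sketch_lenient_check_passes(int wstatus)
{
    return !(WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0);
}

/* Stricter check: success means the child exited normally AND
 * returned 0. Anything else is a failure. */
static bool sketch_child_succeeded(int wstatus)
{
    return WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0;
}
```

The status passed in is the value filled in by waitpid() for the child
process.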

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too

On s390x, we recently had a regression that broke migration / savevm
(see commit bebe9603fc ("hw/intc/s390_flic: Fix crash that occurs when
saving the machine state")). The problem was merged without being noticed
since we currently do not run any migration / savevm related tests on
x86 hosts.
While we cannot run all migration tests for the s390x target on x86 hosts
yet (due to some unresolved issues with TCG), we can at least run some of
the non-live tests to avoid such problems in the future.
Thus enable the "analyze-script" and the "bad_dest" tests before checking
for KVM on s390x or ppc64 (this also fixes the problem that the
"analyze-script" test was not run on s390x at all anymore since it got
disabled again by accident in a previous refactoring of the code).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
version 8.1 can fail with:

> kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
> kvm: Failed to load virtio-net:virtio
> kvm: error while loading state for instance 0x0 of device '0000:00:12.0/virtio-net'
> kvm: load of migration failed: Operation not permitted

The series

53da8b5a99 virtio-net: Add support for USO features
9da1684954 virtio-net: Add USO flags to vhost support.
f03e0cf63b tap: Add check for USO features
2ab0ec3121 tap: Add USO support to tap device.

only landed in QEMU 8.2, so the compatibility flags should be part of
machine version 8.1.

Moving the flags unfortunately breaks forward migration with machine
version 8.1 from a binary without this patch to a binary with this
patch.

Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

virtio-gpu: fix v2 migration

Commit dfcf74fa ("virtio-gpu: fix scanout migration post-load") broke
forward/backward version migration. Versioning of nested VMSD structures
is not straightforward, as the wire format doesn't have nested
structures versions. Introduce x-scanout-vmstate-version and a field
test to save/load appropriately according to the machine version.

Fixes: dfcf74fa ("virtio-gpu: fix scanout migration post-load")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
[fixed long lines]
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: fix a typo

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: add "exists" info to load-state-field trace

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration/colo: Tidy up bql_unlock() around bdrv_activate_all()

Make the code tighter.

Suggested-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
[fixed mangled author email address]
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration/colo: make colo_incoming_co() return void

Currently, it always returns 0, so there is no need to check the return
value at all.
In addition, enter colo coroutine only if migration_incoming_colo_enabled()
is true.
Once the destination side enters the COLO* state, the COLO process will
take over the remaining processes until COLO exits.

Cc: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
[fixed mangled author email address]
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration/colo: Minor fix for colo error message

- Explicitly show the missing module name: replication
- Fix capability name to x-colo

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Suggested-by: Michael Tokarev <mjt@tls.msk.ru>
[fixed mangled author email address]
Signed-off-by: Fabiano Rosas <farosas@suse.de>

target-i386: hyper-v: Correct kvm_hv_handle_exit return value

This bug fix addresses the incorrect return value of kvm_hv_handle_exit for
KVM_EXIT_HYPERV_SYNIC, which should be EXCP_INTERRUPT.

Handling of KVM_EXIT_HYPERV_SYNIC in QEMU needs to be synchronous.
This means that async_synic_update should run in the current QEMU vCPU
thread before returning to KVM, returning EXCP_INTERRUPT to guarantee this.
Returning 0 can cause async_synic_update to run asynchronously.

One problem (kvm-unit-tests's hyperv_synic test fails with timeout error)
caused by this bug:

When a guest VM writes to the HV_X64_MSR_SCONTROL MSR to enable Hyper-V SynIC,
a VM exit is triggered and processed by the kvm_hv_handle_exit function of the
QEMU vCPU. This function then calls the async_synic_update function to set
synic->sctl_enabled to true. A true value of synic->sctl_enabled is required
before creating SINT routes using the hyperv_sint_route_new() function.

If kvm_hv_handle_exit returns 0 for KVM_EXIT_HYPERV_SYNIC, the current QEMU
vCPU thread may return to KVM and enter the guest VM before running
async_synic_update. In such case, the hyperv_synic test’s subsequent call to
synic_ctl(HV_TEST_DEV_SINT_ROUTE_CREATE, ...) immediately after writing to
HV_X64_MSR_SCONTROL can cause QEMU’s hyperv_sint_route_new() function to return
prematurely (because synic->sctl_enabled is false).

If the SINT route is not created successfully, the SINT interrupt will not be
fired, resulting in a timeout error in the hyperv_synic test.

Fixes: 267e071bd6d6 (“hyperv: make overlay pages for SynIC”)
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Message-ID: <20240521200114.11588-1-dongsheng.x.zhang@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[0x8000001D].EAX[bits 25:14]

CPUID[0x8000001D].EAX[bits 25:14] NumSharingCache: number of logical
processors sharing cache.

The number of logical processors sharing this cache is
NumSharingCache + 1.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level to be encoded
into CPUID[0x8000001D].EAX[bits 25:14].
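
The "+ 1" relationship above can be sketched as a pair of helpers that pack
and unpack the 12-bit field at EAX bits 25:14. The function names are
hypothetical and this is not the encoding code in QEMU's cpu.c; it only
illustrates the bit layout stated in the text.

```c
#include <stdint.h>

/* EAX bits 25:14 form a 12-bit field. */
#define SKETCH_SHARING_SHIFT 14
#define SKETCH_SHARING_MASK  (0xFFFu << SKETCH_SHARING_SHIFT)

/* Store (number of sharing logical processors - 1) in bits 25:14,
 * leaving all other bits of eax untouched. Assumes the count is in
 * the range 1..4096. */
static uint32_t sketch_encode_num_sharing(uint32_t eax, uint32_t sharing)
{
    uint32_t field = ((sharing - 1) << SKETCH_SHARING_SHIFT) & SKETCH_SHARING_MASK;

    return (eax & ~SKETCH_SHARING_MASK) | field;
}

/* Recover the number of sharing logical processors: field value + 1. */
static uint32_t sketch_decode_num_sharing(uint32_t eax)
{
    return ((eax & SKETCH_SHARING_MASK) >> SKETCH_SHARING_SHIFT) + 1;
}
```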

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-22-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[4]

CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
Intel CPUs.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level to be encoded
into CPUID[4].EAX[bits 25:14].

And since, with the helper max_processor_ids_for_cache(), the field
CPUID[4].EAX[bits 25:14] (originally the variable "num_apic_ids") is parsed
based on cpu topology levels, which are verified when parsing -smp, there
is no need to check this value with "assert(num_apic_ids > 0)" again, so
remove this assert().

Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
helper to make the code cleaner.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-21-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Add cache topology info in CPUCacheInfo

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This default general setting has caused a misunderstanding, that is, the
cache topology is completely equated with a specific cpu topology, such
as the connection between L2 cache and core level, and the connection
between L3 cache and die level.

In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache.

Thus, we should explicitly define the corresponding cache topology for
different cache models to increase scalability.

Except legacy_l2_cache_cpuid2 (its default topo level is
CPU_TOPO_LEVEL_UNKNOW), explicitly set the corresponding topology level
for all other cache models. In order to be compatible with the existing
cache topology, set the CPU_TOPO_LEVEL_CORE level for the i/d cache, set
the CPU_TOPO_LEVEL_CORE level for L2 cache, and set the
CPU_TOPO_LEVEL_DIE level for L3 cache.

The field for CPUID[4].EAX[bits 25:14] or CPUID[0x8000001D].EAX[bits
25:14] will be set based on CPUCacheInfo.share_level.
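
The mapping from a cache's share level to the encoded field can be
sketched as follows. This is a minimal illustration, not QEMU's code: the
names ShareLevel, bitwidth_for_count() and num_sharing_field() are
hypothetical, and it assumes only the SMT, core and die levels sit below
the package in the APIC ID.

```python
from enum import Enum


class ShareLevel(Enum):
    """Hypothetical stand-in for the topology level stored per cache model."""
    CORE = "core"
    DIE = "die"
    PACKAGE = "package"


def bitwidth_for_count(count):
    """Number of APIC ID bits needed to number `count` items (0 for one item)."""
    return (count - 1).bit_length()


def num_sharing_field(share_level, threads_per_core, cores_per_die, dies_per_pkg):
    """Value for EAX[bits 25:14]: addressable sharing IDs minus one."""
    smt_width = bitwidth_for_count(threads_per_core)
    core_width = bitwidth_for_count(cores_per_die)
    die_width = bitwidth_for_count(dies_per_pkg)
    # Bit offset in the APIC ID at which the ID of the sharing domain starts.
    offsets = {
        ShareLevel.CORE: smt_width,
        ShareLevel.DIE: smt_width + core_width,
        ShareLevel.PACKAGE: smt_width + core_width + die_width,
    }
    return (1 << offsets[share_level]) - 1
```

With 2 threads per core, 4 cores per die and 2 dies per package, a cache
shared at core level yields 1 and a cache shared at die level yields 7.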

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20240424154929.1487382-20-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/i386/pc: Support smp.modules for x86 PC machine

As module-level topology support is added to X86CPU, now we can enable
the support for the modules parameter on PC machines. With this support,
we can define a 5-level x86 CPU topology with "-smp":

-smp cpus=*,maxcpus=*,sockets=*,dies=*,modules=*,cores=*,threads=*.

So, add the 5-level topology example to the description of "-smp".

Additionally, add the missing drawers and books options to the previous
example.
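
As an illustration of how the five levels multiply out, the sketch below
builds a "-smp" option string. The helper name smp_option() is
hypothetical, and it assumes all CPUs are present at boot (cpus equals
maxcpus) and that maxcpus must equal the product of the topology levels.

```python
def smp_option(sockets, dies, modules, cores, threads):
    """Build a 5-level -smp option; each argument is a per-parent count."""
    # Assumption: maxcpus is the product of all levels of the hierarchy.
    total = sockets * dies * modules * cores * threads
    return (
        f"-smp cpus={total},maxcpus={total},sockets={sockets},"
        f"dies={dies},modules={modules},cores={cores},threads={threads}"
    )
```

For example, smp_option(1, 2, 2, 4, 2) describes 32 logical CPUs.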

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-19-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tests: Add test case of APIC ID for module level parsing

Now that i386 supports the module level, add a test for module-level
parsing.

Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20240424154929.1487382-18-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Introduce module-id to X86CPU

Introduce module-id to be consistent with the module-id field in
CpuInstanceProperties.

Following the legacy smp check rules, also add the module_id validity
check to x86_cpu_pre_plug().

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-17-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Support module_id in X86CPUTopoIDs

Add module_id member in X86CPUTopoIDs.

module_id can be parsed from APIC ID, so also update APIC ID parsing
rule to support module level. With this support, the conversions with
module level between X86CPUTopoIDs, X86CPUTopoInfo and APIC ID are
completed.

module_id can also be generated from the cpu topology. Before i386
supports "modules" in smp, the default "modules per die" (modules *
clusters) is only 1, so the module_id generated in this way is 0 and
does not conflict with the module_id generated from the APIC ID.
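
Parsing the IDs out of an APIC ID with the module level included can be
sketched like this. It is a simplified model rather than QEMU's
implementation: split_apic_id() and bitwidth_for_count() are hypothetical
names, and the field order (SMT, core, module, die, package, from the low
bits upward) is an assumption of the sketch.

```python
def bitwidth_for_count(count):
    """Number of APIC ID bits needed to number `count` items (0 for one item)."""
    return (count - 1).bit_length()


def split_apic_id(apic_id, threads_per_core, cores_per_module,
                  modules_per_die, dies_per_pkg):
    """Split an APIC ID into per-level IDs, lowest level first."""
    levels = [
        ("smt_id", threads_per_core),
        ("core_id", cores_per_module),
        ("module_id", modules_per_die),
        ("die_id", dies_per_pkg),
    ]
    ids = {}
    shift = 0
    for name, count in levels:
        width = bitwidth_for_count(count)
        # A zero-width field always decodes to 0.
        ids[name] = (apic_id >> shift) & ((1 << width) - 1)
        shift += width
    # Everything above the die field is the package ID.
    ids["pkg_id"] = apic_id >> shift
    return ids
```

When modules_per_die is 1 the module field has zero width, so module_id
always decodes to 0, which matches the value generated from the topology.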

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-16-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Expose module level in CPUID[0x1F]

The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
handle platforms with Module level enumerated via CPUID.1F.

Expose the module level in CPUID[0x1F] if the machine has more than one
module.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-15-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Support modules_per_die in X86CPUTopoInfo

Support module level in i386 cpu topology structure "X86CPUTopoInfo".

Since x86 does not yet support the "modules" parameter in "-smp",
X86CPUTopoInfo.modules_per_die is currently always 1.

Therefore, the module level width in APIC ID, which can be calculated by
"apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0 for
now, so we can directly add APIC ID related helpers to support module
level parsing.
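
Why a single module per die leaves the APIC ID layout unchanged can be
shown with a small sketch. The function bodies below are an illustrative
reimplementation, not QEMU's source; module_offset() and die_offset() are
hypothetical helper names.

```python
def bitwidth_for_count(count):
    """Bits needed to number `count` items, i.e. ceil(log2(count))."""
    assert count >= 1
    return (count - 1).bit_length()


def module_offset(threads_per_core, cores_per_module):
    """Bit position where the module field starts in the APIC ID."""
    return bitwidth_for_count(threads_per_core) + bitwidth_for_count(cores_per_module)


def die_offset(threads_per_core, cores_per_module, modules_per_die):
    """Bit position where the die field starts in the APIC ID."""
    return (module_offset(threads_per_core, cores_per_module)
            + bitwidth_for_count(modules_per_die))
```

With modules_per_die equal to 1 the module field is 0 bits wide, so
die_offset() equals module_offset() and existing APIC IDs are not affected.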

In addition, update topology structure in test-x86-topo.c.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-14-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Introduce module level cpu topology to CPUX86State

Intel CPUs implement module level on hybrid client products (e.g.,
ADL-N, MTL, etc) and E-core server products.

A module contains a set of cores that share certain resources (in
current products, the resource usually includes L2 cache, as well as
module scoped features and MSRs).

Module level support is the prerequisite for L2 cache topology on
module level. With module level, we can implement the Guest's CPU
topology and future cache topology to be consistent with the Host's on
Intel hybrid client/E-core server platforms.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-13-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Decouple CPUID[0x1F] subleaf with specific topology level

At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.

In fact, the specific topology level exposed in 0x1F depends on the
platform's support for extension levels (module, tile and die).

To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf
with specific topology level.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240424154929.1487382-12-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB]

CPUID[0xB] defines SMT, Core and Invalid types, and this leaf is shared
by Intel and AMD CPUs.

But for extended topology levels, Intel CPUs (in CPUID[0x1F]) and AMD CPUs
(in CPUID[0x80000026]) have different definitions with different
enumeration values.

Though CPUID[0x80000026] hasn't been implemented in QEMU, to avoid
possible misunderstanding, split topology types of CPUID[0x1F] from the
definitions of CPUID[0xB] and introduce CPUID[0x1F]-specific topology
types.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-11-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Introduce bitmap to cache available CPU topology levels

Currently, QEMU checks the specific number of topology domains to detect
whether there are extended topology levels (e.g., checking nr_dies).

With this bitmap, the extended CPU topology (the levels other than SMT,
core and package) can be detected more easily without touching the
topology details.

This is also in preparation for the follow-up to decouple CPUID[0x1F]
subleaf with specific topology level.
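
The idea of the bitmap can be sketched as below. The names TopoLevel,
BASIC_LEVELS, available_levels() and has_extended_levels() are
hypothetical, and the sketch assumes die is the only extended level.

```python
from enum import IntEnum


class TopoLevel(IntEnum):
    """Hypothetical level numbering used as bit positions in the bitmap."""
    SMT = 0
    CORE = 1
    DIE = 2
    PACKAGE = 3


# Levels that are always present and are not "extended".
BASIC_LEVELS = (1 << TopoLevel.SMT) | (1 << TopoLevel.CORE) | (1 << TopoLevel.PACKAGE)


def available_levels(dies_per_pkg):
    """Build the bitmap once, from the topology details."""
    bitmap = BASIC_LEVELS
    if dies_per_pkg > 1:
        bitmap |= 1 << TopoLevel.DIE
    return bitmap


def has_extended_levels(bitmap):
    """Detect extended levels without looking at per-level counts."""
    return bool(bitmap & ~BASIC_LEVELS)
```

Callers then test the bitmap instead of checking counts such as nr_dies
directly.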

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240424154929.1487382-10-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()

In cpu_x86_cpuid(), there are many variables representing the cpu
topology, e.g., topo_info, cs->nr_cores and cs->nr_threads.

Since the names of cs->nr_cores and cs->nr_threads do not accurately
represent their meaning, the use of cs->nr_cores or cs->nr_threads is
prone to confusion and mistakes.

And the structure X86CPUTopoInfo names its members clearly, thus the
variable "topo_info" should be preferred.

In addition, in cpu_x86_cpuid(), to uniformly use the topology variable,
replace env->dies with topo_info.dies_per_pkg as well.

Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-9-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Use APIC ID info to get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]

The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding
the number of sharing threads directly.

From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
means [1]:

The number of logical processors sharing this cache is the value of
this field incremented by 1. To determine which logical processors are
sharing a cache, determine a Share Id for each processor as follows:

ShareId = LocalApicId >> log2(NumSharingCache+1)

Logical processors with the same ShareId then share a cache. If
NumSharingCache+1 is not a power of two, round it up to the next power
of two.

From the description above, the calculation of this field should be the
same as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets of
APIC ID to calculate this field.

[1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
Information
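
The ShareId rule quoted above can be sketched as follows. The function
name share_id() is illustrative; the rounding to the next power of two is
done by taking the bit length of (NumSharingCache + 1) - 1.

```python
def share_id(local_apic_id, num_sharing_cache):
    """ShareId = LocalApicId >> log2(NumSharingCache + 1), rounding up."""
    sharers = num_sharing_cache + 1
    # ceil(log2(sharers)): rounds a non-power-of-two count up to the next one.
    shift = (sharers - 1).bit_length()
    return local_apic_id >> shift
```

With NumSharingCache = 2 (three sharers, rounded up to four), APIC IDs 0
to 3 get ShareId 0 and APIC ID 4 gets ShareId 1.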

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240424154929.1487382-8-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]

According to the fixes of cache_info_passthrough ([1], [2]) and the SDM,
CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
nearest power-of-2 integer.

The nearest power-of-2 integer can be calculated by pow2ceil() or by
using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).

But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
are associated with APIC ID. For example, in the Linux kernel, the field
"num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. As
another example, on Alder Lake P, CPUID.04H:EAX[bits 31:26] does not
match the actual number of cores and is calculated by:
"(1 << (pkg_offset - core_offset)) - 1".

Therefore the topology information of APIC ID should be preferred to
calculate nearest power-of-2 integer for CPUID.04H:EAX[bits 25:14] and
CPUID.04H:EAX[bits 31:26]:
1. d/i cache is shared in a core, 1 << core_offset should be used
   instead of "cs->nr_threads" in encode_cache_cpuid4() for
   CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
2. L2 cache is supposed to be shared in a core as for now, thereby
   1 << core_offset should also be used instead of "cs->nr_threads" in
   encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
   calculated with the bit width between the package and SMT levels in
   the APIC ID ((1 << (pkg_offset - core_offset)) - 1).

In addition, use APIC ID bits calculations to replace "pow2ceil()" for
cache_info_passthrough case.

[1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
[2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
[3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")
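
A minimal sketch of the two field calculations for a cache shared at core
level, assuming core_offset and pkg_offset are the APIC ID bit offsets of
the core and package fields. The function name cpuid4_eax_topology_fields()
is hypothetical, and only the two topology fields of EAX are produced.

```python
def cpuid4_eax_topology_fields(core_offset, pkg_offset):
    """Return EAX with only bits 25:14 and bits 31:26 filled in."""
    # Bits 25:14: addressable IDs for logical processors sharing the cache, minus 1.
    threads_sharing = (1 << core_offset) - 1
    # Bits 31:26: addressable IDs for cores in the package, minus 1.
    # The shift must be parenthesized, since "-" binds tighter than "<<".
    cores_in_pkg = (1 << (pkg_offset - core_offset)) - 1
    return ((threads_sharing & 0xFFF) << 14) | ((cores_in_pkg & 0x3F) << 26)
```

With core_offset = 1 and pkg_offset = 4, bits 25:14 hold 1 and bits 31:26
hold 7.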

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-7-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i386/cpu: Fix i/d-cache topology to core level for Intel CPU

For i-cache and d-cache, current QEMU hardcodes the maximum IDs for CPUs
sharing cache (CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits
25:14]) to 0, and this means i-cache and d-cache are shared in the SMT
level.

This is correct if there's a single thread per core, but is wrong for the
hyper-threading case (one core contains multiple threads) since the
i-cache and d-cache are shared at the core level rather than the SMT level.

For AMD CPU, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
Information for cpuid 0x8000001D") has already introduced i/d cache
topology as core level by default.

Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, we should set i-cache and d-cache to be shared
at the core level by default.

This fix changes the default i/d cache topology from per-thread to
per-core. Potentially, this change in L1 cache topology may affect the
performance of the VM if the user does not specifically specify the
topology or bind the vCPU. However, the way to achieve optimal
performance should be to create a reasonable topology and set the
appropriate vCPU affinity without relying on QEMU's default topology
structure.

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20240424154929.1487382-6-zhao1.liu@intel.com>
[Add compat property. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: add control bits support for LAM

LAM uses CR3[61] and CR3[62] to configure/enable LAM on user pointers.
LAM uses CR4[28] to configure/enable LAM on supervisor pointers.

For CR3 LAM bits, no additional handling needed:
- TCG
  LAM is not supported for TCG of target-i386.  helper_write_crN() and
  helper_vmrun() check max physical address bits before calling
  cpu_x86_update_cr3(), no change needed, i.e. CR3 LAM bits are not allowed
  to be set in TCG.
- gdbstub
  x86_cpu_gdb_write_register() will call cpu_x86_update_cr3() to update cr3.
  Allow gdb to set the LAM bit(s) in CR3; if the vcpu doesn't support LAM,
  KVM_SET_SREGS will fail, as it does for other reserved bits.

For the CR4 LAM bit, whether it is reserved depends on whether the vcpu
supports the LAM feature.
- TCG
  LAM is not supported for TCG of target-i386.  helper_write_crN() and
  helper_vmrun() check CR4 reserved bit before calling cpu_x86_update_cr4(),
  i.e. CR4 LAM bit is not allowed to be set in TCG.
- gdbstub
  x86_cpu_gdb_write_register() will call cpu_x86_update_cr4() to update cr4.
  Mask out LAM bit on CR4 if vcpu doesn't support LAM.
- x86_cpu_reset_hold() doesn't need special handling.
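
The bit positions named above, and the masking done on the gdbstub path
for CR4, can be sketched as follows. The constant and function names are
illustrative only and are not taken from the QEMU source.

```python
# Bit positions as stated in the commit message.
CR3_LAM_BITS = (1 << 61) | (1 << 62)  # LAM on user pointers
CR4_LAM_BIT = 1 << 28                 # LAM on supervisor pointers


def sanitize_cr4(value, vcpu_has_lam):
    """Drop the CR4 LAM bit when the vcpu does not support LAM."""
    if not vcpu_has_lam:
        value &= ~CR4_LAM_BIT
    return value
```

A CR4 value written through gdb keeps its LAM bit only when the vcpu
advertises LAM; all other bits pass through unchanged.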

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240112060042.19925-3-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: add support for LAM in CPUID enumeration

Linear Address Masking (LAM) is a new Intel CPU feature, which allows
software to use the untranslated address bits for metadata.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[26]

Add CPUID definition for LAM.

Note that the LAM feature is not supported for TCG of target-i386, so the
LAM CPUID bit will not be added to TCG_7_1_EAX_FEATURES.

More info can be found in Intel ISE Chapter "LINEAR ADDRESS MASKING(LAM)"
https://cdrdv2.intel.com/v1/dl/getContent/671368

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240112060042.19925-2-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/i386/pc_sysfw: Alias rather than copy isa-bios region

In the -bios case the "isa-bios" memory region is an alias to the BIOS mapped
to the top of the 4G memory boundary. Do the same in the -pflash case, but only
for new machine versions for migration compatibility. This establishes common
behavior and makes pflash commands work in the "isa-bios" region which some
real-world legacy bioses rely on.

Note that in the sev_enabled() case, the "isa-bios" memory region in the -pflash
case will now also point to encrypted memory, just like it already does in the
-bios case.

Running `info mtree` before and after this commit with
`qemu-system-x86_64 -S -drive \
if=pflash,format=raw,readonly=on,file=/usr/share/qemu/bios-256k.bin` and then
running `diff -u before.mtree after.mtree` shows the following changes in the
memory tree:

   --- before.mtree
   +++ after.mtree
   @@ -71,7 +71,7 @@
        0000000000000000-ffffffffffffffff (prio -1, i/o): pci
        00000000000a0000-00000000000bffff (prio 1, i/o): vga-lowmem
        00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
   -      00000000000e0000-00000000000fffff (prio 1, rom): isa-bios
   +      00000000000e0000-00000000000fffff (prio 1, romd): alias isa-bios @system.flash0 0000000000020000-000000000003ffff
        00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
        00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff
        00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff
   @@ -108,7 +108,7 @@
        0000000000000000-ffffffffffffffff (prio -1, i/o): pci
        00000000000a0000-00000000000bffff (prio 1, i/o): vga-lowmem
        00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
   -      00000000000e0000-00000000000fffff (prio 1, rom): isa-bios
   +      00000000000e0000-00000000000fffff (prio 1, romd): alias isa-bios @system.flash0 0000000000020000-000000000003ffff
        00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
        00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff
        00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff
   @@ -131,11 +131,14 @@
   memory-region: pc.ram
   0000000000000000-0000000007ffffff (prio 0, ram): pc.ram

   +memory-region: system.flash0
   +  00000000fffc0000-00000000ffffffff (prio 0, romd): system.flash0
   +
   memory-region: pci
   0000000000000000-ffffffffffffffff (prio -1, i/o): pci
        00000000000a0000-00000000000bffff (prio 1, i/o): vga-lowmem
        00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
   -    00000000000e0000-00000000000fffff (prio 1, rom): isa-bios
   +    00000000000e0000-00000000000fffff (prio 1, romd): alias isa-bios @system.flash0 0000000000020000-000000000003ffff

   memory-region: smram
        00000000000a0000-00000000000bffff (prio 0, ram): alias smram-low @pc.ram 00000000000a0000-00000000000bffff

Note that in both cases the "system" memory region contains the entry

  00000000fffc0000-00000000ffffffff (prio 0, romd): system.flash0

but the "system.flash0" memory region only appears standalone when "isa-bios" is
an alias.
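
The addresses in the listing above follow from simple arithmetic, sketched
here. The 128 KiB cap on the "isa-bios" window is an assumption inferred
from the listing, and isa_bios_alias() is a hypothetical helper.

```python
ISA_BIOS_MAX = 128 * 1024   # assumed maximum size of the isa-bios window
ONE_MIB = 0x100000          # the window ends just below 1 MiB
FOUR_GIB = 0x1_0000_0000    # the flash ends just below 4 GiB


def isa_bios_alias(flash_size):
    """Compute where isa-bios sits and which part of the flash it aliases."""
    isa_size = min(flash_size, ISA_BIOS_MAX)
    return {
        "guest_start": ONE_MIB - isa_size,
        "guest_end": ONE_MIB - 1,
        # The alias covers the last isa_size bytes of the flash.
        "flash_offset": flash_size - isa_size,
        "flash_base": FOUR_GIB - flash_size,
    }
```

For the 256 KiB image used above this gives a window at 0xe0000-0xfffff
aliasing flash offset 0x20000, with the flash itself based at 0xfffc0000.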

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-ID: <20240508175507.22270-7-shentey@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: clean up AAM/AAD

The 32-bit AAM/AAD opcodes are using helpers that read and write flags and
env->regs[R_EAX]. Clean them up so that the table correctly includes AX
as a 16-bit input and output.

No real reason to do it to be honest, but they are nice one-output helpers
and it removes the masking of env->regs[R_EAX] that generic load/writeback
code already does.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240522123912.608497-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: generate simpler code for ROL/ROR with immediate count

gen_rot_carry and gen_rot_overflow are meant to be called with count == NULL
if the count cannot be zero. However this is not done in gen_ROL and gen_ROR,
and writing everywhere "can_be_zero ? count : NULL" is burdensome and less
readable. Just pass can_be_zero as a separate argument.

gen_RCL and gen_RCR use a conditional branch to skip the computation
if count is zero, so they can pass false unconditionally to gen_rot_overflow.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240522123914.608516-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'pull-vfio-20240522' of https://github.com/legoater/qemu into staging

vfio queue:

* Improvement of error reporting during migration
* Removed Vendor Specific Capability check on newer machine
* Addition of a VFIO migration QAPI event
* Changed prototype of routines using an error parameter to return bool
* Several cleanups regarding autofree variables

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmZNwDEACgkQUaNDx8/7
# 7KHaYQ/+MUFOiWEiAwJdP8I1DkY6mJV3ZDixKMHLmr8xH6fAkR2htEw6UUcYijcn
# Z0wVvcB7A1wetgIAB2EPc2o6JtRD1uEW2pPq3SVpdWO2rWYa4QLvldOiJ8A+Kvss
# 0ZugWirgZsM7+ka9TCuysmqWdQD+P6z2RURMSwiPi6QPHwv1Tt69gLSxFeV5WWai
# +mS6wUbaU3LSt6yRhORRvFkCss4je3D3YR73ivholGHANxi/7C5T22KwOHrW6Qzf
# uk3W/zq1yL1YLXSu6WoKPw0mMCvNtGyKK2oAlhG3Ln1tPYnctNrlfXlApqxEOGl3
# adGtwd6fyg6UTRR+vOXEy1QPCGcHtKWc5SuV5E677JftARJMwzbXrJw9Y9xS2RCQ
# oRYS5814k9RdubTxu+/l8NLICMdox7dNy//QLyrIdD7nJKYhFODkV1giWh4NWkt6
# m0T3PGLlUJ/V2ngWQu9Aw150m3lCPEKt+Nv/mGOEFDRu9dv55Vb7oJwr1dBB/n+e
# 1lNNpDmV0YipoKYMzrlBwNwxhXGJOtNPwHtw/vZuiy70CXUwo0t4XLMpWbWasxZc
# 0yz4O9RLRJEhPtPqv54aLsE2kNY10I8vwHBlhyNgIEsA7eCDduA+65aPBaqIF7z6
# GjvYdixF+vAZFexn0mDi1gtM3Yh60Hiiq1j7kKyyti/q0WUQzIc=
# =awMc
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 22 May 2024 02:51:45 AM PDT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20240522' of https://github.com/legoater/qemu: (47 commits)
  vfio/igd: Use g_autofree in vfio_probe_igd_bar4_quirk()
  vfio: Use g_autofree in all call site of vfio_get_region_info()
  vfio/pci-quirks: Make vfio_add_*_cap() return bool
  vfio/pci-quirks: Make vfio_pci_igd_opregion_init() return bool
  vfio/pci: Use g_autofree for vfio_region_info pointer
  vfio/pci: Make capability related functions return bool
  vfio/pci: Make vfio_populate_vga() return bool
  vfio/pci: Make vfio_intx_enable() return bool
  vfio/pci: Make vfio_populate_device() return a bool
  vfio/pci: Make vfio_pci_relocate_msix() and vfio_msix_early_setup() return a bool
  vfio/pci: Make vfio_intx_enable_kvm() return a bool
  vfio/ccw: Make vfio_ccw_get_region() return a bool
  vfio/platform: Make vfio_populate_device() and vfio_base_device_init() return bool
  vfio/helpers: Make vfio_device_get_name() return bool
  vfio/helpers: Make vfio_set_irq_signaling() return bool
  vfio/helpers: Use g_autofree in vfio_set_irq_signaling()
  vfio/display: Make vfio_display_*() return bool
  vfio/display: Fix error path in call site of ramfb_setup()
  backends/iommufd: Make iommufd_backend_*() return bool
  vfio/cpr: Make vfio_cpr_register_container() return bool
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

vfio/igd: Use g_autofree in vfio_probe_igd_bar4_quirk()

The pointers opregion, host and lpc are allocated and freed in
vfio_probe_igd_bar4_quirk(). Use g_autofree to automatically
free them.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio: Use g_autofree in all call site of vfio_get_region_info()

There are some exceptions when pointer to vfio_region_info is reused.
In that case, the pointed memory is freed manually.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci-quirks: Make vfio_add_*_cap() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

This includes the functions below:
vfio_add_virt_caps()
vfio_add_nv_gpudirect_cap()
vfio_add_vmd_shadow_cap()

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci-quirks: Make vfio_pci_igd_opregion_init() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Use g_autofree for vfio_region_info pointer

Pointer opregion is freed after vfio_pci_igd_opregion_init().
Use 'g_autofree' to avoid the g_free() calls.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Make capability related functions return bool

The functions operating on capability don't have a consistent return style.

The functions below are in the bool-valued function style:
vfio_msi_setup()
vfio_msix_setup()
vfio_add_std_cap()
vfio_add_capabilities()

The two below are integer-valued functions:
vfio_add_vendor_specific_cap()
vfio_setup_pcie_cap()

But the returned integer is only used to check success/failure.
Change them all to return bool so that all capability related
functions follow the coding standard in qapi/error.h to return
bool.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Make vfio_populate_vga() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Make vfio_intx_enable() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Make vfio_populate_device() return a bool

Since vfio_populate_device() takes an 'Error **' argument,
best practices suggest to return a bool. See the qapi/error.h
Rules section.

While at it, pass errp directly to vfio_populate_device() to
avoid calling error_propagate().

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Make vfio_pci_relocate_msix() and vfio_msix_early_setup() return a bool

Since vfio_pci_relocate_msix() and vfio_msix_early_setup() take
an 'Error **' argument, best practices suggest to return a bool.
See the qapi/error.h Rules section.

While at it, pass errp directly to vfio_msix_early_setup() to avoid
calling error_propagate().

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Make vfio_intx_enable_kvm() return a bool

Since vfio_intx_enable_kvm() takes an 'Error **' argument,
best practices suggest to return a bool. See the qapi/error.h
Rules section.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/ccw: Make vfio_ccw_get_region() return a bool

Since vfio_ccw_get_region() takes an 'Error **' argument,
best practices suggest to return a bool. See the qapi/error.h
Rules section.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/platform: Make vfio_populate_device() and vfio_base_device_init() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/helpers: Make vfio_device_get_name() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/helpers: Make vfio_set_irq_signaling() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/helpers: Use g_autofree in vfio_set_irq_signaling()

Local pointer irq_set is freed before return from
vfio_set_irq_signaling().

Use 'g_autofree' to avoid the g_free() calls.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/display: Make vfio_display_*() return bool

This is to follow the coding standard in qapi/error.h to return bool
for bool-valued functions.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/display: Fix error path in call site of ramfb_setup()

vfio_display_dmabuf_init() and vfio_display_region_init() call
ramfb_setup() without checking its return value.

So we may run into a situation where vfio_display_probe() succeeds
but errp is set. This is risky and may lead to an assertion failure in
error_setv().

Cc: Gerd Hoffmann <kraxel@redhat.com>
Fixes: b290659fc3d ("hw/vfio/display: add ramfb support")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

Merge tag 'hw-misc-20240517' of https://github.com/philmd/qemu into staging

Misc HW patches queue

- Fix build when GBM buffer management library is detected (Cédric)
- Fix PFlash block write (Gerd)
- Allow 'parameter=1' for SMP topology on any machine (Daniel)
- Allow guest-debug tests to run with recent GDB (Gustavo)

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmZHcOsACgkQ4+MsLN6t
# wN4CqxAA15Ow9ubxipORpM+XJgJ5isyPjD1s/6bR6lj7joBS6CYQbMaaskXuDQK8
# FpeoWw2DI2Fh/61NcUMAk7XBFF59DLrtngDhfLZJYdwBh0S8RFs1wp6sKyaBA9K6
# wDy39plxt/abKGzj3EcJUGDvhBLPJNnqy5OF9fZtWGrQg+A1i9uLMu/ac6srfX+K
# zau/CxQaHYRYLYFmRcQCOhFVAtp2TQHw14CiiLYMCxF3GvUCN0xmtg8lzj9/y4ke
# Yt0VN6jC3opfmQuDtPJNNkp8beaHbwMARFmXepDVB2cHp8DY5Gm4Ij2WiR0K985G
# fqDknHEXDPI+RislV9+EN3p2c05m7ihPKLiDLYCulD4TIRDz+eUf71Onus9uecj9
# zCDdPYjU1ly9pyt7EVG2Bla9D/F51ZvbrzJQrHbvqhxWuZGOPSzHdpSsHZBIOXk6
# OhxTtUPeWDYW5K+wdNpxYPy5dqIR3jSEbDwLh2Wts2iPKxCGC8ly6CbZJPgA5lQE
# hwYbiSKNcxAMV3V9qBfKLRSGadnnfPwG/zrGOHBni9ejz+m7foA13mJ4H6VFBn7Q
# GGe9f00MCKcWTTlzRty1oIzAKcpupCOanX0MpVNcTYUqVtODhlQpDdH63ZVuiyRU
# kux9xz71I+mwkjQiTHTki1qcAbLNj9+jgwbcc74Zz1BngIauqtc=
# =Octv
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 May 2024 04:59:55 PM CEST
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]

* tag 'hw-misc-20240517' of https://github.com/philmd/qemu:
  tests: Gently exit from GDB when tests complete
  tests: add testing of parameter=1 for SMP topology
  hw/core: allow parameter=1 for SMP topology on any machine
  hw/pflash: fix block write start
  ui/console: Only declare variable fence_fd when CONFIG_GBM is defined

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'pull-request-2024-05-17' of https://gitlab.com/thuth/qemu into staging

* Fix s390x crash when doing migration / savevm
* Decrease size of CI containers by removing unnecessary packages

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmZHTHgRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbXZrxAAsh6zHycGpaQdfcoy3bDmd8gTbuLiME/h
# JyJxZ/+GQc+8v8WPwB+HuF7IijtopYCfyO6Vu2y/5wj8i1gHbNulxlS5SjusJp6i
# Xxlvuw74xo8Z2oJ6D8Ayk2KHcld5M0m9T77CgP8WcGKmBQU42XWm89fKvviPtn+K
# DtLNEpvTlcdEj0uhxhHldHKQnqNryxSHM1MSsmVIKibkQHgG7GBYnw922lZ2x27A
# AqSzgzNXAbhmSn75oQfkGUk+vUmlXukfBAHi48BLnAs28sSUue3Su+zw9r8sxhKw
# jdvzIB1kyF01AYiKWmhB3voXNduswT9I/cNiQorgOBEJ4lKEzrhsTI92GpvNG3gR
# J0CRBUmnGC2k/4GRa+GhFEpFn9FyWeOjPj2oGv03LO4AgTWzi1zNcO++OIWsk0Ge
# rO2n2PEEz8RaI/49CTLGi3Eu0Rh0yZnrgZRjcji5ZZ3omQ/OrwXGyr3FMDNFNuXs
# vWr9p4K1vz2P/L+RC+TCM0U46gykQuBPseRsdVvbJxAoNP4HwmdE9jDy1Wl1mG1u
# Iac63/+srr/871UFzp7ft8ukKTVKy4elQJ78tDCsmRhkVNjLFWwf4SNY6RaneeYM
# IbsLcjWpZPl4I9KR6Of5p+aAHAUg6xKIIaIR01fMyQL44ELomfbpH2rKp4tObJHj
# WIEKnOWuclo=
# =vgc1
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 May 2024 02:24:24 PM CEST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]

* tag 'pull-request-2024-05-17' of https://gitlab.com/thuth/qemu:
  hw/intc/s390_flic: Fix crash that occurs when saving the machine state
  tests/docker/dockerfiles: Update container files with "lcitool-refresh"
  tests/lcitool/projects/qemu.yml: Sort entries alphabetically again
  tests/lcitool: Remove g++ from the containers (except for the MinGW one)
  tests/lcitool: Remove 'xfsprogs' from QEMU
  tests/lcitool/refresh: Treat the output of lcitool as text, not as bytes

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tests: Gently exit from GDB when tests complete

GDB commit a207f6b3a38 ('Rewrite "python" command exception handling')
changed how exit() called from Python scripts loaded by GDB behaves,
turning it into an exception instead of a generic error code that is
returned. This change caused several QEMU tests to crash with the
following exception:

Python Exception <class 'SystemExit'>: 0
Error occurred in Python: 0

This happens because in tests/guest-debug/test_gdbstub.py exit is
called after the tests have completed.

This commit fixes it by politely asking GDB to exit via gdb.execute,
passing the proper fail_count to be reported to 'make', instead of
abruptly calling exit() from the Python script.
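
A minimal sketch of the approach described above, not the actual test code.
The helper names are hypothetical, and the 'execute' parameter stands in for
gdb.execute, which is only available when the script runs inside GDB. It
relies on GDB's 'exit' (alias of 'quit') command accepting an expression that
becomes the process exit status.

```python
def build_exit_command(fail_count):
    """Return the GDB CLI command that quits with fail_count as exit status."""
    return f"exit {int(fail_count)}"


def finish(fail_count, execute):
    """Ask GDB to exit instead of raising SystemExit from the script.

    'execute' is assumed to be gdb.execute when running under GDB; it is
    passed in here so the sketch can also run outside GDB.
    """
    execute(build_exit_command(fail_count))


# Outside GDB, a plain list can record what would have been executed.
recorded = []
finish(0, recorded.append)
print(recorded)
```

Inside a GDB-loaded script the call would be finish(fail_count, gdb.execute),
so that 'make' sees the failure count as the exit status.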

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240515173132.2462201-4-gustavo.romero@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

tests: add testing of parameter=1 for SMP topology

Validate that it is possible to pass 'parameter=1' for any SMP topology
parameter, since unsupported parameters are implicitly considered to
always have a value of 1.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-ID: <20240513123358.612355-3-berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/core: allow parameter=1 for SMP topology on any machine

This effectively reverts

  commit 54c4ea8f3ae614054079395842128a856a73dbf9
  Author: Zhao Liu <zhao1.liu@intel.com>
  Date:   Sat Mar 9 00:01:37 2024 +0800

    hw/core/machine-smp: Deprecate unsupported "parameter=1" SMP configurations

but is not done as a 'git revert' since the part of the changes to the
file hw/core/machine-smp.c which add 'has_XXX' checks remain desirable.
Furthermore, we have to tweak the subsequently added unit test to
account for the differing warning message.

The rationale for the original deprecation was:

  "Currently, it was allowed for users to specify the unsupported
   topology parameter as "1". For example, x86 PC machine doesn't
   support drawer/book/cluster topology levels, but user could specify
   "-smp drawers=1,books=1,clusters=1".

   This is meaningless and confusing, so that the support for this kind
   of configurations is marked deprecated since 9.0."

There are varying POVs on the topic of 'unsupported' topology levels.

It is common to say that on a system without hyperthreading there
is always 1 thread. Likewise, when new CPUs introduced the concept of
multiple 'dies', it was reasonable to say that all historical CPUs
before that implicitly had 1 'die'. The same applies to the more recently
introduced 'modules' and 'clusters' parameters. From this POV, it is
valid to set 'parameter=1' on the -smp command line for any machine;
only a value > 1 is strictly an error condition.

It doesn't cause any functional difficulty for QEMU, because internally
the QEMU code is itself assuming that all "unsupported" parameters
implicitly have a value of '1'.
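
The rule argued for above can be modelled in a few lines. This is an
illustrative sketch only, not the code in hw/core/machine-smp.c: the function
name, the error text and the set of supported levels are all assumptions made
for the example.

```python
def check_smp_topology(requested, supported):
    """Return a list of error strings for an -smp style request.

    requested: dict mapping topology parameter name to the requested value
    supported: set of parameter names the (hypothetical) machine supports
    """
    errors = []
    for name, value in requested.items():
        if name in supported:
            continue
        # An unsupported level implicitly has the value 1, so asking for 1
        # changes nothing; only a value > 1 cannot be honoured.
        if value > 1:
            errors.append(f"{name} > 1 not supported by this machine")
    return errors


# Hypothetical machine that only knows sockets, cores and threads.
levels = {"sockets", "cores", "threads"}
print(check_smp_topology({"sockets": 2, "drawers": 1, "books": 1, "clusters": 1}, levels))
print(check_smp_topology({"sockets": 2, "clusters": 2}, levels))
```

The first request is accepted with no errors, while the second reports
'clusters' as unsupported.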

At the libvirt level, we've allowed applications to set 'parameter=1'
when configuring a guest, and pass that through to QEMU.

Deprecating this creates extra difficulty for libvirt because there's no info
exposed from QEMU about which machine types "support" which parameters.
Thus, libvirt can't know whether it is valid to pass 'parameter=1' for
a given machine type, or whether it will trigger deprecation messages.

Since there's no apparent functional benefit to deleting this deprecated
behaviour from QEMU, and it creates problems for consumers of QEMU,
remove this deprecation.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-ID: <20240513123358.612355-2-berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/pflash: fix block write start

Move the pflash_blk_write_start() call. We need the offset of the
first data write, not the offset for the setup (number-of-bytes)
write. Without this fix u-boot can do block writes to the first
flash block only.

While at it, drop a leftover FIXME.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2343
Fixes: 284a7ee2e290 ("hw/pflash: implement update buffer for block writes")
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240516121237.534875-1-kraxel@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

ui/console: Only declare variable fence_fd when CONFIG_GBM is defined

This is to avoid a build breakage:

../ui/gtk-egl.c: In function ‘gd_egl_draw’:
../ui/gtk-egl.c:73:9: error: unused variable ‘fence_fd’ [-Werror=unused-variable]
73 | int fence_fd;
| ^~~~~~~~

Fixes: fa6426805b12 ("ui/console: Use qemu_dmabuf_set_..() helpers instead")
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240515100520.574383-1-clg@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/intc/s390_flic: Fix crash that occurs when saving the machine state

adapter_info_so_needed() treats its "opaque" parameter as a S390FLICState,
but the function belongs to a VMStateDescription that is attached to a
TYPE_VIRTIO_CCW_BUS device. This is currently causing a crash when the
user tries to save or migrate the VM state. Fix it by using s390_get_flic()
to get the correct device here instead.

Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: 9d1b0f5bf5 ("s390_flic: add migration-enabled property")
Message-ID: <20240517061553.564529-1-thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/docker/dockerfiles: Update container files with "lcitool-refresh"

Run "make lcitool-refresh" after the previous changes to the
lcitool files. This removes the g++ and xfslibs-dev packages
from the dockerfiles (except for the fedora-win64-cross dockerfile
where we keep the C++ compiler).

Message-ID: <20240516084059.511463-6-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/lcitool/projects/qemu.yml: Sort entries alphabetically again

Let's try to keep the entries in alphabetical order here!

Message-ID: <20240516084059.511463-5-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/lcitool: Remove g++ from the containers (except for the MinGW one)

We don't need C++ for the normal QEMU builds anymore, so installing
g++ in each and every container seems to be a waste of time and disk
space. The only container that still needs it is the Fedora MinGW
container that builds the only remaining C++ code in ./qga/vss-win32/
and we can install it there with an extra project yml file instead.

Message-ID: <20240516084059.511463-4-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/lcitool: Remove 'xfsprogs' from QEMU

QEMU's commit a5730b8bd3 ("block/file-posix: Simplify the
XFS_IOC_DIOINFO handling") removed the need for the 'xfsprogs'
package.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[thuth: Adjusted the patch from the lcitools repo to QEMU's repo]
Message-ID: <20240516084059.511463-3-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/lcitool/refresh: Treat the output of lcitool as text, not as bytes

In case lcitool fails (e.g. with a python backtrace), this makes
the output of lcitool much more readable.
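
To show why this helps, here is a small sketch of the difference, assuming
the output is captured with Python's subprocess module. The child command is
a stand-in for lcitool, and the helper names are hypothetical.

```python
import subprocess
import sys

# Stand-in for an lcitool invocation: a child process printing two lines.
CHILD = [sys.executable, "-c", "print('line one'); print('line two')"]


def run_as_bytes():
    """Capture stdout as raw bytes (the default)."""
    return subprocess.run(CHILD, capture_output=True, check=True).stdout


def run_as_text():
    """Capture stdout decoded to str by passing text=True."""
    return subprocess.run(CHILD, capture_output=True, check=True,
                          text=True).stdout


# Formatting bytes into a message shows a b'...' literal with escaped
# newlines; formatting text keeps the real line breaks.
print(f"bytes: {run_as_bytes()}")
print(f"text:\n{run_as_text()}")
```

With a multi-line backtrace the bytes form collapses into one long escaped
line, which is the readability problem the commit addresses.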

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20240516084059.511463-2-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

backends/iommufd: Make iommufd_backend_*() return bool

This is to follow the coding standard to return bool if 'Error **'
is used to pass error.

The changed functions include:

iommufd_backend_connect
iommufd_backend_alloc_ioas

While at it, simplify the functions a bit by avoiding duplicate
reporting, e.g., log through either the error interface or a trace
event, not both.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/cpr: Make vfio_cpr_register_container() return bool

This is to follow the coding standard to return bool if 'Error **'
is used to pass error.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/iommufd: Make iommufd_cdev_*() return bool

This is to follow the coding standard to return bool if 'Error **'
is used to pass error.

The changed functions include:

iommufd_cdev_kvm_device_add
iommufd_cdev_connect_and_bind
iommufd_cdev_attach_ioas_hwpt
iommufd_cdev_detach_ioas_hwpt
iommufd_cdev_attach_container
iommufd_cdev_get_info_iova_range

After the change, all functions in hw/vfio/iommufd.c follow the
standard.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: Make vfio_get_device() return bool

This is to follow the coding standard to return bool if 'Error **'
is used to pass error.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/container: Make vfio_set_iommu() return bool

This is to follow the coding standard to return bool if 'Error **'
is used to pass error.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>