Chao Peng [Fri, 27 Oct 2023 18:21:45 +0000 (11:21 -0700)]
KVM: Use gfn instead of hva for mmu_notifier_retry
Currently in mmu_notifier invalidate path, hva range is recorded and then
checked against by mmu_invalidate_retry_hva() in the page fault handling
path. However, for the soon-to-be-introduced private memory, a page fault
may not have a hva associated, checking gfn(gpa) makes more sense.
For existing hva based shared memory, gfn is expected to also work. The
only downside is when aliasing multiple gfns to a single hva, the
current algorithm of checking multiple ranges could result in a much
larger range being rejected. Such aliasing should be uncommon, so the
impact is expected small.
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
[sean: convert vmx_set_apic_access_page_addr() to gfn-based API]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
Message-Id: <
20231027182217.
3615211-4-seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Fri, 27 Oct 2023 18:21:44 +0000 (11:21 -0700)]
KVM: Assert that mmu_invalidate_in_progress *never* goes negative
Move the assertion on the in-progress invalidation count from the primary
MMU's notifier path to KVM's common notification path, i.e. assert that
the count doesn't go negative even when the invalidation is coming from
KVM itself.
Opportunistically convert the assertion to a KVM_BUG_ON(), i.e. kill only
the affected VM, not the entire kernel. A corrupted count is fatal to the
VM, e.g. the non-zero (negative) count will cause mmu_invalidate_retry()
to block any and all attempts to install new mappings. But it's far from
guaranteed that an end() without a start() is fatal or even problematic to
anything other than the target VM, e.g. the underlying bug could simply be
a duplicate call to end(). And it's much more likely that a missed
invalidation, i.e. a potential use-after-free, would manifest as no
notification whatsoever, not an end() without a start().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <
20231027182217.
3615211-3-seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Fri, 27 Oct 2023 18:21:43 +0000 (11:21 -0700)]
KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn ranges
Rework and rename "struct kvm_hva_range" into "kvm_mmu_notifier_range" so
that the structure can be used to handle notifications that operate on gfn
context, i.e. that aren't tied to a host virtual address. Rename the
handler typedef too (arguably it should always have been gfn_handler_t).
Practically speaking, this is a nop for 64-bit kernels as the only
meaningful change is to store start+end as u64s instead of unsigned longs.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <
20231027182217.
3615211-2-seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 8 Nov 2023 09:40:35 +0000 (04:40 -0500)]
selftests: kvm/s390x: use vm_create_barebones()
This function does the same but makes it clearer why one would use
the "____"-prefixed version of vm_create().
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 31 Oct 2023 20:37:07 +0000 (16:37 -0400)]
Merge tag 'kvmarm-6.7' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.7
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its guest
- Optimization for vSGI injection, opportunistically compressing MPIDR
to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
Paolo Bonzini [Tue, 31 Oct 2023 14:22:43 +0000 (10:22 -0400)]
Merge tag 'kvm-x86-svm-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.7:
- Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
running an SEV-ES guest.
- Clean up handling "failures" when KVM detects it can't emulate the "skip"
action for an instruction that has already been partially emulated. Drop a
hack in the SVM code that was fudging around the emulator code not giving
SVM enough information to do the right thing.
Paolo Bonzini [Tue, 31 Oct 2023 14:22:23 +0000 (10:22 -0400)]
Merge tag 'kvm-x86-pmu-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM PMU change for 6.7:
- Handle NMI/SMI requests after PMU/PMI requests so that a PMI=>NMI doesn't
require redoing the entire run loop due to the NMI not being detected until
the final kvm_vcpu_exit_request() check before entering the guest.
Paolo Bonzini [Tue, 31 Oct 2023 14:21:42 +0000 (10:21 -0400)]
Merge tag 'kvm-x86-xen-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM x86 Xen changes for 6.7:
- Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
- Use the fast path directly from the timer callback when delivering Xen timer
events. Avoid the problematic races with using the fast path by ensuring
the hrtimer isn't running when (re)starting the timer or saving the timer
information (for userspace).
- Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
Paolo Bonzini [Tue, 31 Oct 2023 14:17:43 +0000 (10:17 -0400)]
Merge tag 'kvm-x86-mmu-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 6.7:
- Clean up code that deals with honoring guest MTRRs when the VM has
non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
- Zap EPT entries when non-coherent DMA assignment stops/start to prevent
using stale entries with the wrong memtype.
- Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y, as
there's zero reason to ignore guest PAT if the effective MTRR memtype is WB.
This will also allow for future optimizations of handling guest MTRR updates
for VMs with non-coherent DMA and the quirk enabled.
- Harden the fast page fault path to guard against encountering an invalid
root when walking SPTEs.
Paolo Bonzini [Tue, 31 Oct 2023 14:15:15 +0000 (10:15 -0400)]
Merge tag 'kvm-x86-misc-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.7:
- Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without
forcing more common use cases to eat the extra memory overhead.
- Add IBPB and SBPB virtualization support.
- Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
creating the original vCPU would cause KVM to try to synchronize the vCPU's
TSC and thus clobber the correct TSC being set by userspace.
- Compute guest wall clock using a single TSC read to avoid generating an
inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
- "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
- Don't apply side effects to Hyper-V's synthetic timer on writes from
userspace to fix an issue where the auto-enable behavior can trigger
spurious interrupts, i.e. do auto-enabling only for guest writes.
- Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
without PML enabled.
- Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
- Use octal notation for file permissions through KVM x86.
- Fix a handful of typo fixes and warts.
Paolo Bonzini [Tue, 31 Oct 2023 14:12:45 +0000 (10:12 -0400)]
Merge tag 'kvm-x86-docs-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM x86 Documentation updates for 6.7:
- Fix various typos, notably a confusing reference to the non-existent
"struct kvm_vcpu_event" (the actual structure is kvm_vcpu_events, plural).
- Update x86's kvm_mmu_page documentation to bring it closer to the code
(this raced with the removal of async zapping and so the documentation is
already stale; my bad).
- Document the behavior of x86 PMU filters on fixed counters.
Paolo Bonzini [Tue, 31 Oct 2023 14:11:19 +0000 (10:11 -0400)]
Merge tag 'kvm-x86-apic-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM x86 APIC changes for 6.7:
- Purge VMX's posted interrupt descriptor *before* loading APIC state when
handling KVM_SET_LAPIC. Purging the PID after loading APIC state results in
lost APIC timer IRQs as the APIC timer can be armed as part of loading APIC
state, i.e. can immediately pend an IRQ if the expiry is in the past.
- Clear the ICR.BUSY bit when handling trap-like x2APIC writes. This avoids a
WARN, due to KVM expecting the BUSY bit to be cleared when sending IPIs.
Paolo Bonzini [Tue, 31 Oct 2023 14:10:15 +0000 (10:10 -0400)]
Merge tag 'kvm-s390-next-6.7-1' of https://git./linux/kernel/git/kvms390/linux into HEAD
- nested page table management performance counters
Paolo Bonzini [Tue, 31 Oct 2023 14:09:39 +0000 (10:09 -0400)]
Merge tag 'kvm-riscv-6.7-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.7
- Smstateen and Zicond support for Guest/VM
- Virtualized senvcfg CSR for Guest/VM
- Added Smstateen registers to the get-reg-list selftests
- Added Zicond to the get-reg-list selftests
- Virtualized SBI debug console (DBCN) for Guest/VM
- Added SBI debug console (DBCN) to the get-reg-list selftests
Paolo Bonzini [Tue, 31 Oct 2023 13:55:40 +0000 (09:55 -0400)]
Merge tag 'loongarch-kvm-6.7' of git://git./linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.7
Add LoongArch's KVM support. Loongson 3A5000/3A6000 supports hardware
assisted virtualization. With cpu virtualization, there are separate
hw-supported user mode and kernel mode in guest mode. With memory
virtualization, there are two-level hw mmu table for guest mode and host
mode. Also there is separate hw cpu timer with consant frequency in
guest mode, so that vm can migrate between hosts with different freq.
Currently, we are able to boot LoongArch Linux Guests.
Few key aspects of KVM LoongArch added by this series are:
1. Enable kvm hardware function when kvm module is loaded.
2. Implement VM and vcpu related ioctl interface such as vcpu create,
vcpu run etc. GET_ONE_REG/SET_ONE_REG ioctl commands are use to
get general registers one by one.
3. Hardware access about MMU, timer and csr are emulated in kernel.
4. Hardwares such as mmio and iocsr device are emulated in user space
such as IPI, irqchips, pci devices etc.
Oliver Upton [Mon, 30 Oct 2023 20:24:07 +0000 (20:24 +0000)]
Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/next
* kvm-arm64/pmu_pmcr_n:
: User-defined PMC limit, courtesy Raghavendra Rao Ananta
:
: Certain VMMs may want to reserve some PMCs for host use while running a
: KVM guest. This was a bit difficult before, as KVM advertised all
: supported counters to the guest. Userspace can now limit the number of
: advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU
: emulation enforce the specified limit for handling guest accesses.
KVM: selftests: aarch64: vPMU test for validating user accesses
KVM: selftests: aarch64: vPMU register test for unimplemented counters
KVM: selftests: aarch64: vPMU register test for implemented counters
KVM: selftests: aarch64: Introduce vpmu_counter_access test
tools: Import arm_pmuv3.h
KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0
KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler
KVM: arm64: PMU: Introduce helpers to set the guest's PMU
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:21:19 +0000 (20:21 +0000)]
Merge branch kvm-arm64/mops into kvmarm/next
* kvm-arm64/mops:
: KVM support for MOPS, courtesy of Kristina Martsenko
:
: MOPS adds new instructions for accelerating memcpy(), memset(), and
: memmove() operations in hardware. This series brings virtualization
: support for KVM guests, and allows VMs to run on asymmetrict systems
: that may have different MOPS implementations.
KVM: arm64: Expose MOPS instructions to guests
KVM: arm64: Add handler for MOPS exceptions
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:21:09 +0000 (20:21 +0000)]
Merge branch kvm-arm64/writable-id-regs into kvmarm/next
* kvm-arm64/writable-id-regs:
: Writable ID registers, courtesy of Jing Zhang
:
: This series significantly expands the architectural feature set that
: userspace can manipulate via the ID registers. A new ioctl is defined
: that makes the mutable fields in the ID registers discoverable to
: userspace.
KVM: selftests: Avoid using forced target for generating arm64 headers
tools headers arm64: Fix references to top srcdir in Makefile
KVM: arm64: selftests: Test for setting ID register from usersapce
tools headers arm64: Update sysreg.h with kernel sources
KVM: selftests: Generate sysreg-defs.h and add to include path
perf build: Generate arm64's sysreg-defs.h and add to include path
tools: arm64: Add a Makefile for generating sysreg-defs.h
KVM: arm64: Document vCPU feature selection UAPIs
KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1
KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1
KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1
KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1
KVM: arm64: Bump up the default KVM sanitised debug version to v8p8
KVM: arm64: Reject attempts to set invalid debug arch version
KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version
KVM: arm64: Use guest ID register values for the sake of emulation
KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS
KVM: arm64: Allow userspace to get the writable masks for feature ID registers
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Fri, 27 Oct 2023 00:54:39 +0000 (00:54 +0000)]
KVM: selftests: Avoid using forced target for generating arm64 headers
The 'prepare' target that generates the arm64 sysreg headers had no
prerequisites, so it wound up forcing a rebuild of all KVM selftests
each invocation. Add a rule for the generated headers and just have
dependents use that for a prerequisite.
Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Fixes: 9697d84cc3b6 ("KVM: selftests: Generate sysreg-defs.h and add to include path")
Tested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Link: https://lore.kernel.org/r/20231027005439.3142015-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Fri, 27 Oct 2023 00:54:38 +0000 (00:54 +0000)]
tools headers arm64: Fix references to top srcdir in Makefile
Aishwarya reports that KVM selftests for arm64 fail with the following
error:
| make[4]: Entering directory '/tmp/kci/linux/tools/testing/selftests/kvm'
| Makefile:270: warning: overriding recipe for target
| '/tmp/kci/linux/build/kselftest/kvm/get-reg-list'
| Makefile:265: warning: ignoring old recipe for target
| '/tmp/kci/linux/build/kselftest/kvm/get-reg-list'
| make -C ../../../../tools/arch/arm64/tools/
| make[5]: Entering directory '/tmp/kci/linux/tools/arch/arm64/tools'
| Makefile:10: ../tools/scripts/Makefile.include: No such file or directory
| make[5]: *** No rule to make target '../tools/scripts/Makefile.include'.
| Stop.
It would appear that this only affects builds from the top-level
Makefile (e.g. make kselftest-all), as $(srctree) is set to ".". Work
around the issue by shadowing the kselftest naming scheme for the source
tree variable.
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Fixes: 0359c946b131 ("tools headers arm64: Update sysreg.h with kernel sources")
Link: https://lore.kernel.org/r/20231027005439.3142015-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:19:13 +0000 (20:19 +0000)]
Merge branch kvm-arm64/sgi-injection into kvmarm/next
* kvm-arm64/sgi-injection:
: vSGI injection improvements + fixes, courtesy Marc Zyngier
:
: Avoid linearly searching for vSGI targets using a compressed MPIDR to
: index a cache. While at it, fix some egregious bugs in KVM's mishandling
: of vcpuid (user-controlled value) and vcpu_idx.
KVM: arm64: Clarify the ordering requirements for vcpu/RD creation
KVM: arm64: vgic-v3: Optimize affinity-based SGI injection
KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available
KVM: arm64: Build MPIDR to vcpu index cache at runtime
KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
KVM: arm64: Use vcpu_idx for invalidation tracking
KVM: arm64: vgic: Use vcpu_idx for the debug information
KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id
KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id
KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:18:56 +0000 (20:18 +0000)]
Merge branch kvm-arm64/stage2-vhe-load into kvmarm/next
* kvm-arm64/stage2-vhe-load:
: Setup stage-2 MMU from vcpu_load() for VHE
:
: Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
: entry/exit in VHE mode as the host is running at EL2. Despite this KVM
: reloads the stage-2 on every guest entry, which is needless.
:
: This series moves the setup of the stage-2 MMU context to vcpu_load()
: when running in VHE mode. This is likely to be a win across the board,
: but also allows us to remove an ISB on the guest entry path for systems
: with one of the speculative AT errata.
KVM: arm64: Move VTCR_EL2 into struct s2_mmu
KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()
KVM: arm64: Rename helpers for VHE vCPU load/put
KVM: arm64: Reload stage-2 for VMID change on VHE
KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host()
KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:18:46 +0000 (20:18 +0000)]
Merge branch kvm-arm64/nv-trap-fixes into kvmarm/next
* kvm-arm64/nv-trap-fixes:
: NV trap forwarding fixes, courtesy Miguel Luis and Marc Zyngier
:
: - Explicitly define the effects of HCR_EL2.NV on EL2 sysregs in the
: NV trap encoding
:
: - Make EL2 registers that access AArch32 guest state UNDEF or RAZ/WI
: where appropriate for NV guests
KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
KVM: arm64: Refine _EL2 system register list that require trap reinjection
arm64: Add missing _EL2 encodings
arm64: Add missing _EL12 encodings
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:18:37 +0000 (20:18 +0000)]
Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/next
* kvm-arm64/smccc-filter-cleanups:
: Cleanup the management of KVM's SMCCC maple tree
:
: Avoid the cost of maintaining the SMCCC filter maple tree if userspace
: hasn't writen a rule to the filter. While at it, rip out the now
: unnecessary VM flag to indicate whether or not the SMCCC filter was
: configured.
KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured
KVM: arm64: Only insert reserved ranges when SMCCC filter is used
KVM: arm64: Add a predicate for testing if SMCCC filter is configured
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:18:23 +0000 (20:18 +0000)]
Merge branch kvm-arm64/pmevtyper-filter into kvmarm/next
* kvm-arm64/pmevtyper-filter:
: Fixes to KVM's handling of the PMUv3 exception level filtering bits
:
: - NSH (count at EL2) and M (count at EL3) should be stateful when the
: respective EL is advertised in the ID registers but have no effect on
: event counting.
:
: - NSU and NSK modify the event filtering of EL0 and EL1, respectively.
: Though the kernel may not use these bits, other KVM guests might.
: Implement these bits exactly as written in the pseudocode if EL3 is
: advertised.
KVM: arm64: Add PMU event filter bits required if EL3 is implemented
KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertised
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:18:14 +0000 (20:18 +0000)]
Merge branch kvm-arm64/feature-flag-refactor into kvmarm/next
* kvm-arm64/feature-flag-refactor:
: vCPU feature flag cleanup
:
: Clean up KVM's handling of vCPU feature flags to get rid of the
: vCPU-scoped bitmaps and remove failure paths from kvm_reset_vcpu().
KVM: arm64: Get rid of vCPU-scoped feature bitmap
KVM: arm64: Remove unused return value from kvm_reset_vcpu()
KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler
KVM: arm64: Prevent NV feature flag on systems w/o nested virt
KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl
KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler
KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler
KVM: arm64: Add generic check for system-supported vCPU features
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Mon, 30 Oct 2023 20:18:00 +0000 (20:18 +0000)]
Merge branch kvm-arm64/misc into kvmarm/next
* kvm-arm64/misc:
: Miscellaneous updates
:
: - Put an upper bound on the number of I-cache invalidations by
: cacheline to avoid soft lockups
:
: - Get rid of bogus refererence count transfer for THP mappings
:
: - Do a local TLB invalidation on permission fault race
:
: - Fixes for page_fault_test KVM selftest
:
: - Add a tracepoint for detecting MMIO instructions unsupported by KVM
KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
KVM: arm64: selftest: Perform ISB before reading PAR_EL1
KVM: arm64: selftest: Add the missing .guest_prepare()
KVM: arm64: Always invalidate TLB for stage-2 permission faults
KVM: arm64: Do not transfer page refcount for THP adjustment
KVM: arm64: Avoid soft lockups due to I-cache maintenance
arm64: tlbflush: Rename MAX_TLBI_OPS
KVM: arm64: Don't use kerneldoc comment for arm64_check_features()
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Thu, 26 Oct 2023 20:53:06 +0000 (20:53 +0000)]
KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
It is a pretty well known fact that KVM does not support MMIO emulation
without valid instruction syndrome information (ESR_EL2.ISV == 0). The
current kvm_pr_unimpl() is pretty useless, as it contains zero context
to relate the event to a vCPU.
Replace it with a precise tracepoint that dumps the relevant context
so the user can make sense of what the guest is doing.
Acked-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231026205306.3045075-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Zenghui Yu [Sat, 7 Oct 2023 12:40:43 +0000 (20:40 +0800)]
KVM: arm64: selftest: Perform ISB before reading PAR_EL1
It looks like a mistake to issue ISB *after* reading PAR_EL1, we should
instead perform it between the AT instruction and the reads of PAR_EL1.
As according to DDI0487J.a IJTYVP,
"When an address translation instruction is executed, explicit
synchronization is required to guarantee the result is visible to
subsequent direct reads of PAR_EL1."
Otherwise all guest_at testcases fail on my box with
==== Test Assertion Failure ====
aarch64/page_fault_test.c:142: par & 1 == 0
pid=
1355864 tid=
1355864 errno=4 - Interrupted system call
1 0x0000000000402853: vcpu_run_loop at page_fault_test.c:681
2 0x0000000000402cdb: run_test at page_fault_test.c:730
3 0x0000000000403897: for_each_guest_mode at guest_modes.c:100
4 0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1105
5 (inlined by) main at page_fault_test.c:1131
6 0x0000ffffb153c03b: ?? ??:0
7 0x0000ffffb153c113: ?? ??:0
8 0x0000000000401aaf: _start at ??:?
0x1 != 0x0 (par & 1 != 0)
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-2-yuzenghui@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Zenghui Yu [Sat, 7 Oct 2023 12:40:42 +0000 (20:40 +0800)]
KVM: arm64: selftest: Add the missing .guest_prepare()
Running page_fault_test on a Cortex A72 fails with
Test: ro_memslot_no_syndrome_guest_cas
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
Testing memory backing src type: anonymous
==== Test Assertion Failure ====
aarch64/page_fault_test.c:117: guest_check_lse()
pid=
1944087 tid=
1944087 errno=4 - Interrupted system call
1 0x00000000004028b3: vcpu_run_loop at page_fault_test.c:682
2 0x0000000000402d93: run_test at page_fault_test.c:731
3 0x0000000000403957: for_each_guest_mode at guest_modes.c:100
4 0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1108
5 (inlined by) main at page_fault_test.c:1134
6 0x0000ffff868e503b: ?? ??:0
7 0x0000ffff868e5113: ?? ??:0
8 0x0000000000401aaf: _start at ??:?
guest_check_lse()
because we don't have a guest_prepare stage to check the presence of
FEAT_LSE and skip the related guest_cas testing, and we end-up failing in
GUEST_ASSERT(guest_check_lse()).
Add the missing .guest_prepare() where it's indeed required.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-1-yuzenghui@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Oliver Upton [Fri, 22 Sep 2023 22:32:29 +0000 (22:32 +0000)]
KVM: arm64: Always invalidate TLB for stage-2 permission faults
It is possible for multiple vCPUs to fault on the same IPA and attempt
to resolve the fault. One of the page table walks will actually update
the PTE and the rest will return -EAGAIN per our race detection scheme.
KVM elides the TLB invalidation on the racing threads as the return
value is nonzero.
Before commit
a12ab1378a88 ("KVM: arm64: Use local TLBI on permission
relaxation") KVM always used broadcast TLB invalidations when handling
permission faults, which had the convenient property of making the
stage-2 updates visible to all CPUs in the system. However now we do a
local invalidation, and TLBI elision leads to the vCPU thread faulting
again on the stale entry. Remember that the architecture permits the TLB
to cache translations that precipitate a permission fault.
Invalidate the TLB entry responsible for the permission fault if the
stage-2 descriptor has been relaxed, regardless of which thread actually
did the job.
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230922223229.1608155-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Linus Torvalds [Mon, 30 Oct 2023 02:31:08 +0000 (16:31 -1000)]
Linux 6.6
Linus Torvalds [Sat, 28 Oct 2023 18:15:07 +0000 (08:15 -1000)]
Merge tag 'x86-urgent-2023-10-28' of git://git./linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix a possible CPU hotplug deadlock bug caused by the new TSC
synchronization code
- Fix a legacy PIC discovery bug that results in device troubles on
affected systems, such as non-working keybards, etc
- Add a new Intel CPU model number to <asm/intel-family.h>
* tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Defer marking TSC unstable to a worker
x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
x86/cpu: Add model number for Intel Arrow Lake mobile processor
Linus Torvalds [Sat, 28 Oct 2023 18:12:34 +0000 (08:12 -1000)]
Merge tag 'irq-urgent-2023-10-28' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Restore unintentionally lost quirk settings in the GIC irqchip driver,
which broke certain devices"
* tag 'irq-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Don't override quirk settings with default values
Linus Torvalds [Sat, 28 Oct 2023 18:10:47 +0000 (08:10 -1000)]
Merge tag 'perf-urgent-2023-10-28' of git://git./linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix a potential NULL dereference bug"
* tag 'perf-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix potential NULL deref
Linus Torvalds [Sat, 28 Oct 2023 18:04:56 +0000 (08:04 -1000)]
Merge tag 'probes-fixes-v6.6-rc7' of git://git./linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- tracing/kprobes: Fix kernel-doc warnings for the variable length
arguments
- tracing/kprobes: Fix to count the symbols in modules even if the
module name is not specified so that user can probe the symbols in
the modules without module name
* tag 'probes-fixes-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/kprobes: Fix symbol counting logic by looking at modules as well
tracing/kprobes: Fix the description of variable length arguments
Linus Torvalds [Sat, 28 Oct 2023 18:01:31 +0000 (08:01 -1000)]
Merge tag 'dma-mapping-6.6-2023-10-28' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- reduce the initialy dynamic swiotlb size to remove an annoying but
harmless warning from the page allocator (Petr Tesarik)
* tag 'dma-mapping-6.6-2023-10-28' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: do not try to allocate a TLB bigger than MAX_ORDER pages
Linus Torvalds [Sat, 28 Oct 2023 17:51:27 +0000 (07:51 -1000)]
Merge tag 'char-misc-6.6-final' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some very small driver fixes for 6.6-final that have shown up
in the past two weeks. Included in here are:
- tiny fastrpc bugfixes for reported errors
- nvmem register fixes
- iio driver fixes for some reported problems
- fpga test fix
- MAINTAINERS file update for fpga
All of these have been in linux-next this week with no reported
problems"
* tag 'char-misc-6.6-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
fpga: Fix memory leak for fpga_region_test_class_find()
fpga: m10bmc-sec: Change contact for secure update driver
fpga: disable KUnit test suites when module support is enabled
iio: afe: rescale: Accept only offset channels
nvmem: imx: correct nregs for i.MX6ULL
nvmem: imx: correct nregs for i.MX6UL
nvmem: imx: correct nregs for i.MX6SLL
misc: fastrpc: Unmap only if buffer is unmapped from DSP
misc: fastrpc: Clean buffers on remote invocation failures
misc: fastrpc: Free DMA handles for RPC calls with no arguments
misc: fastrpc: Reset metadata buffer to avoid incorrect free
iio: exynos-adc: request second interupt only when touchscreen mode is used
iio: adc: xilinx-xadc: Correct temperature offset/scale for UltraScale
iio: adc: xilinx-xadc: Don't clobber preset voltage/temperature thresholds
dt-bindings: iio: add missing reset-gpios constrain
Linus Torvalds [Sat, 28 Oct 2023 17:48:37 +0000 (07:48 -1000)]
Merge tag 'i2c-for-6.6-rc8' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Bugfixes for Axxia when it is a target and for PEC handling of
stm32f7.
Plus, fix an OF node leak pattern in the mux subsystem"
* tag 'i2c-for-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: stm32f7: Fix PEC handling in case of SMBUS transfers
i2c: muxes: i2c-mux-gpmux: Use of_get_i2c_adapter_by_node()
i2c: muxes: i2c-demux-pinctrl: Use of_get_i2c_adapter_by_node()
i2c: muxes: i2c-mux-pinctrl: Use of_get_i2c_adapter_by_node()
i2c: aspeed: Fix i2c bus hang in slave read
Linus Torvalds [Sat, 28 Oct 2023 02:52:51 +0000 (16:52 -1000)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Three fixes, one for the clk framework and two for clk drivers:
- Avoid an oops in possible_parent_show() by checking for no parent
properly when a DT index based lookup is used
- Handle errors returned from divider_ro_round_rate() in
clk_stm32_composite_determine_rate()
- Fix clk_ops::determine_rate() implementation of socfpga's
gateclk_ops that was ruining uart output because the divider
was forgotten about"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: stm32: Fix a signedness issue in clk_stm32_composite_determine_rate()
clk: Sanitize possible_parent_show to Handle Return Value of of_clk_get_parent_name
clk: socfpga: gate: Account for the divider in determine_rate
Linus Torvalds [Sat, 28 Oct 2023 02:44:58 +0000 (16:44 -1000)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs
Pull misc filesystem fixes from Al Viro:
"Assorted fixes all over the place: literally nothing in common, could
have been three separate pull requests.
All are simple regression fixes, but not for anything from this cycle"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ceph_wait_on_conflict_unlink(): grab reference before dropping ->d_lock
io_uring: kiocb_done() should *not* trust ->ki_pos if ->{read,write}_iter() failed
sparc32: fix a braino in fault handling in csum_and_copy_..._user()
Andrii Nakryiko [Fri, 27 Oct 2023 23:31:26 +0000 (16:31 -0700)]
tracing/kprobes: Fix symbol counting logic by looking at modules as well
Recent changes to count number of matching symbols when creating
a kprobe event failed to take into account kernel modules. As such, it
breaks kprobes on kernel module symbols, by assuming there is no match.
Fix this my calling module_kallsyms_on_each_symbol() in addition to
kallsyms_on_each_match_symbol() to perform a proper counting.
Link: https://lore.kernel.org/all/20231027233126.2073148-1-andrii@kernel.org/
Cc: Francis Laniel <flaniel@linux.microsoft.com>
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Fixes: b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Al Viro [Fri, 15 Sep 2023 01:55:29 +0000 (21:55 -0400)]
ceph_wait_on_conflict_unlink(): grab reference before dropping ->d_lock
Use of dget() after we'd dropped ->d_lock is too late - dentry might
be gone by that point.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 28 Aug 2023 22:47:31 +0000 (18:47 -0400)]
io_uring: kiocb_done() should *not* trust ->ki_pos if ->{read,write}_iter() failed
->ki_pos value is unreliable in such cases. For an obvious example,
consider O_DSYNC write - we feed the data to page cache and start IO,
then we make sure it's completed. Update of ->ki_pos is dealt with
by the first part; failure in the second ends up with negative value
returned _and_ ->ki_pos left advanced as if sync had been successful.
In the same situation write(2) does not advance the file position
at all.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 28 Oct 2023 00:10:32 +0000 (14:10 -1000)]
Merge tag 'io_uring-6.6-2023-10-27' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Fix for an issue reported where reading fdinfo could find a NULL
thread as we didn't properly synchronize, and then a disable for the
IOCB_DIO_CALLER_COMP optimization as a recent reported highlighted how
that could lead to deadlocks if the task issued async O_DIRECT writes
and then proceeded to do sync fallocate() calls"
* tag 'io_uring-6.6-2023-10-27' of git://git.kernel.dk/linux:
io_uring/rw: disable IOCB_DIO_CALLER_COMP
io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid
Al Viro [Sun, 22 Oct 2023 23:34:28 +0000 (19:34 -0400)]
sparc32: fix a braino in fault handling in csum_and_copy_..._user()
Fault handler used to make non-trivial calls, so it needed
to set a stack frame up. Used to be
save ... - grab a stack frame, old %o... become %i...
....
ret - go back to address originally in %o7, currently %i7
restore - switch to previous stack frame, in delay slot
Non-trivial calls had been gone since
ab5e8b331244 and that code should
have become
retl - go back to address in %o7
clr %o0 - have return value set to 0
What it had become instead was
ret - go back to address in %i7 - return address of *caller*
clr %o0 - have return value set to 0
which is not good, to put it mildly - we forcibly return 0 from
csum_and_copy_{from,to}_iter() (which is what the call of that
thing had been inlined into) and do that without dropping the
stack frame of said csum_and_copy_..._iter(). Confuses the
hell out of the caller of csum_and_copy_..._iter(), obviously...
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Fixes: ab5e8b331244 "sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 28 Oct 2023 00:01:59 +0000 (14:01 -1000)]
Merge tag 'block-6.6-2023-10-27' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe:
"Just a single fix for a potential divide-by-zero, introduced in this
cycle"
* tag 'block-6.6-2023-10-27' of git://git.kernel.dk/linux:
blk-throttle: check for overflow in calculate_bytes_allowed
Linus Torvalds [Fri, 27 Oct 2023 23:38:59 +0000 (13:38 -1000)]
Merge tag 'ata-6.6-final' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single patch to fix a regression introduced by the recent
suspend/resume fixes.
The regression is that ATA disks are not stopped on system shutdown,
which is not recommended and increases the disks SMART counters for
unclean power off events.
This patch fixes this by refining the recent rework of the scsi device
manage_xxx flags"
* tag 'ata-6.6-final' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
scsi: sd: Introduce manage_shutdown device flag
Linus Torvalds [Fri, 27 Oct 2023 23:32:48 +0000 (13:32 -1000)]
Merge tag 'platform-drivers-x86-v6.6-6' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fix from Hans de Goede:
"A single patch to extend the AMD PMC driver DMI quirk list
for laptops which need special handling to avoid NVME s2idle
suspend/resume errors"
* tag 'platform-drivers-x86-v6.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: Add s2idle quirk for more Lenovo laptops
Mingwei Zhang [Mon, 2 Oct 2023 04:08:39 +0000 (04:08 +0000)]
KVM: x86: Service NMI requests after PMI requests in VM-Enter path
Service NMI and SMI requests after PMI requests in vcpu_enter_guest() so
that KVM does not need to cancel and redo the VM-Enter if the guest
configures its PMIs to be delivered as NMIs (likely) or SMIs (unlikely).
Because APIC emulation "injects" NMIs via KVM_REQ_NMI, handling PMI
requests after NMI requests (the likely case) means KVM won't detect the
pending NMI request until the final check for outstanding requests.
Detecting requests at the final stage is costly as KVM has already loaded
guest state, potentially queued events for injection, disabled IRQs,
dropped SRCU, etc., most of which needs to be unwound.
Note that changing the order of request processing doesn't change the end
result, as KVM's final check for outstanding requests prevents entering
the guest until all requests are serviced. I.e. KVM will ultimately
coalesce events (or not) regardless of the ordering.
Using SPEC2017 benchmark programs running along with Intel vtune in a VM
demonstrates that the following code change reduces 800~1500 canceled
VM-Enters per second.
Some glory details:
Probe the invocation to vmx_cancel_injection():
$ perf probe -a vmx_cancel_injection
$ perf stat -a -e probe:vmx_cancel_injection -I 10000 # per 10 seconds
Partial results when SPEC2017 with Intel vtune are running in the VM:
On kernel without the change:
10.
010018010 14254 probe:vmx_cancel_injection
20.
037646388 15207 probe:vmx_cancel_injection
30.
078739816 15261 probe:vmx_cancel_injection
40.
114033258 15085 probe:vmx_cancel_injection
50.
149297460 15112 probe:vmx_cancel_injection
60.
185103088 15104 probe:vmx_cancel_injection
On kernel with the change:
10.
003595390 40 probe:vmx_cancel_injection
20.
017855682 31 probe:vmx_cancel_injection
30.
028355883 34 probe:vmx_cancel_injection
40.
038686298 31 probe:vmx_cancel_injection
50.
048795162 20 probe:vmx_cancel_injection
60.
069057747 19 probe:vmx_cancel_injection
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20231002040839.2630027-1-mizhang@google.com
[sean: hoist PMU/PMI above SMI too, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Thomas Gleixner [Wed, 25 Oct 2023 21:31:35 +0000 (23:31 +0200)]
x86/tsc: Defer marking TSC unstable to a worker
Tetsuo reported the following lockdep splat when the TSC synchronization
fails during CPU hotplug:
tsc: Marking TSC unstable due to check_tsc_sync_source failed
WARNING: inconsistent lock state
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0
{IN-HARDIRQ-W} state was registered at:
_raw_spin_lock_irqsave+0x3f/0x60
clocksource_mark_unstable+0x1b/0x90
mark_tsc_unstable+0x41/0x50
check_tsc_sync_source+0x14f/0x180
sysvec_call_function_single+0x69/0x90
Possible unsafe locking scenario:
lock(watchdog_lock);
<Interrupt>
lock(watchdog_lock);
stack backtrace:
_raw_spin_lock+0x30/0x40
clocksource_watchdog+0x23/0x5a0
run_timer_softirq+0x2a/0x50
sysvec_apic_timer_interrupt+0x6e/0x90
The reason is the recent conversion of the TSC synchronization function
during CPU hotplug on the control CPU to a SMP function call. In case
that the synchronization with the upcoming CPU fails, the TSC has to be
marked unstable via clocksource_mark_unstable().
clocksource_mark_unstable() acquires 'watchdog_lock', but that lock is
taken with interrupts enabled in the watchdog timer callback to minimize
interrupt disabled time. That's obviously a possible deadlock scenario,
Before that change the synchronization function was invoked in thread
context so this could not happen.
As it is not crucical whether the unstable marking happens slightly
delayed, defer the call to a worker thread which avoids the lock context
problem.
Fixes: 9d349d47f0e3 ("x86/smpboot: Make TSC synchronization function call based")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87zg064ceg.ffs@tglx
Thomas Gleixner [Wed, 25 Oct 2023 21:04:15 +0000 (23:04 +0200)]
x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
David and a few others reported that on certain newer systems some legacy
interrupts fail to work correctly.
Debugging revealed that the BIOS of these systems leaves the legacy PIC in
uninitialized state which makes the PIC detection fail and the kernel
switches to a dummy implementation.
Unfortunately this fallback causes quite some code to fail as it depends on
checks for the number of legacy PIC interrupts or the availability of the
real PIC.
In theory there is no reason to use the PIC on any modern system when
IO/APIC is available, but the dependencies on the related checks cannot be
resolved trivially and on short notice. This needs lots of analysis and
rework.
The PIC detection has been added to avoid quirky checks and force selection
of the dummy implementation all over the place, especially in VM guest
scenarios. So it's not an option to revert the relevant commit as that
would break a lot of other scenarios.
One solution would be to try to initialize the PIC on detection fail and
retry the detection, but that puts the burden on everything which does not
have a PIC.
Fortunately the ACPI/MADT table header has a flag field, which advertises
in bit 0 that the system is PCAT compatible, which means it has a legacy
8259 PIC.
Evaluate that bit and if set avoid the detection routine and keep the real
PIC installed, which then gets initialized (for nothing) and makes the rest
of the code with all the dependencies work again.
Fixes: e179f6914152 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately")
Reported-by: David Lazar <dlazar@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Lazar <dlazar@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003
Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx
Tony Luck [Wed, 25 Oct 2023 20:25:13 +0000 (13:25 -0700)]
x86/cpu: Add model number for Intel Arrow Lake mobile processor
For "reasons" Intel has code-named this CPU with a "_H" suffix.
[ dhansen: As usual, apply this and send it upstream quickly to
make it easier for anyone who is doing work that
consumes this. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231025202513.12358-1-tony.luck%40intel.com
Linus Torvalds [Fri, 27 Oct 2023 15:43:05 +0000 (05:43 -1000)]
Merge tag 'iommu-fix-v6.6-rc7' of git://git./linux/kernel/git/joro/iommu
Pull iommu fix from Joerg Roedel:
- Fix boot regression for Sapphire Rapids with Intel VT-d driver
* tag 'iommu-fix-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Avoid unnecessary cache invalidations
Linus Torvalds [Fri, 27 Oct 2023 15:40:42 +0000 (05:40 -1000)]
Merge tag 'powerpc-6.6-6' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix boot crash with FLATMEM since set_ptes() introduction
- Avoid calling arch_enter/leave_lazy_mmu() in set_ptes()
Thanks to Aneesh Kumar K.V and Erhard Furtner.
* tag 'powerpc-6.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes
powerpc/mm: Fix boot crash with FLATMEM
David Lazar [Wed, 25 Oct 2023 19:30:16 +0000 (21:30 +0200)]
platform/x86: Add s2idle quirk for more Lenovo laptops
When suspending to idle and resuming on some Lenovo laptops using the
Mendocino APU, multiple NVME IOMMU page faults occur, showing up in
dmesg as repeated errors:
nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b
address=0xb6674000 flags=0x0000]
The system is unstable afterwards.
Applying the s2idle quirk introduced by commit
455cd867b85b ("platform/x86:
thinkpad_acpi: Add a s2idle resume quirk for a number of laptops")
allows these systems to work with the IOMMU enabled and s2idle
resume to work.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218024
Suggested-by: Mario Limonciello <mario.limonciello@amd.com>
Suggested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Signed-off-by: David Lazar <dlazar@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/ZTlsyOaFucF2pWrL@localhost
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Yujie Liu [Fri, 27 Oct 2023 04:13:14 +0000 (12:13 +0800)]
tracing/kprobes: Fix the description of variable length arguments
Fix the following kernel-doc warnings:
kernel/trace/trace_kprobe.c:1029: warning: Excess function parameter 'args' description in '__kprobe_event_gen_cmd_start'
kernel/trace/trace_kprobe.c:1097: warning: Excess function parameter 'args' description in '__kprobe_event_add_fields'
Refer to the usage of variable length arguments elsewhere in the kernel
code, "@..." is the proper way to express it in the description.
Link: https://lore.kernel.org/all/20231027041315.2613166-1-yujie.liu@intel.com/
Fixes: 2a588dd1d5d6 ("tracing: Add kprobe event command generation functions")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310190437.paI6LYJF-lkp@intel.com/
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Lu Baolu [Thu, 26 Oct 2023 08:49:42 +0000 (16:49 +0800)]
iommu: Avoid unnecessary cache invalidations
The iommu_create_device_direct_mappings() only needs to flush the caches
when the mappings are changed in the affected domain. This is not true
for non-DMA domains, or for devices attached to the domain that have no
reserved regions. To avoid unnecessary cache invalidations, add a check
before iommu_flush_iotlb_all().
Fixes: a48ce36e2786 ("iommu: Prevent RESV_DIRECT devices from blocking domains")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Henry Willard <henry.willard@oracle.com>
Link: https://lore.kernel.org/r/20231026084942.17387-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Linus Torvalds [Fri, 27 Oct 2023 06:42:02 +0000 (20:42 -1000)]
Merge tag 'drm-fixes-2023-10-27' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This is the final set of fixes for 6.6, just misc bits mainly in
amdgpu and i915, nothing too noteworthy.
amdgpu:
- ignore duplicated BOs in CS parser
- remove redundant call to amdgpu_ctx_priority_is_valid()
- Extend VI APSM quirks to more platforms
amdkfd:
- reserve fence slot while locking BO
dp_mst:
- Fix NULL deref in get_mst_branch_device_by_guid_helper()
logicvc:
- Kconfig: Select REGMAP and REGMAP_MMIO
ivpu:
- Fix missing VPUIP interrupts
i915:
- Determine context valid in OA reports
- Hold GT forcewake during steering operations
- Check if PMU is closed before stopping event"
* tag 'drm-fixes-2023-10-27' of git://anongit.freedesktop.org/drm/drm:
accel/ivpu/37xx: Fix missing VPUIP interrupts
drm/amd: Disable ASPM for VI w/ all Intel systems
drm/i915/pmu: Check if pmu is closed before stopping event
drm/i915/mcr: Hold GT forcewake during steering operations
drm/logicvc: Kconfig: select REGMAP and REGMAP_MMIO
drm/i915/perf: Determine context valid in OA reports
drm/amdkfd: reserve a fence slot while locking the BO
drm/amdgpu: Remove redundant call to priority_is_valid()
drm/dp_mst: Fix NULL deref in get_mst_branch_device_by_guid_helper()
drm/amdgpu: ignore duplicate BOs again
Dave Airlie [Fri, 27 Oct 2023 02:13:29 +0000 (12:13 +1000)]
Merge tag 'amd-drm-fixes-6.6-2023-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.6-2023-10-25:
amdgpu:
- Extend VI APSM quirks to more platforms
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231026035452.14921-1-alexander.deucher@amd.com
Dave Airlie [Fri, 27 Oct 2023 01:58:28 +0000 (11:58 +1000)]
Merge tag 'drm-intel-fixes-2023-10-26' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Determine context valid in OA reports (Umesh)
- Hold GT forcewake during steering operations (Matt Roper)
- Check if PMU is closed before stopping event (Umesh)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZTp8IQ0wxzxVjN7J@intel.com
Dave Airlie [Fri, 27 Oct 2023 01:50:51 +0000 (11:50 +1000)]
Merge tag 'drm-misc-fixes-2023-10-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Short summary of fixes pull:
amdgpu:
- ignore duplicated BOs in CS parser
- remove redundant call to amdgpu_ctx_priority_is_valid()
amdkfd:
- reserve fence slot while locking BO
dp_mst:
- Fix NULL deref in get_mst_branch_device_by_guid_helper()
logicvc:
- Kconfig: Select REGMAP and REGMAP_MMIO
ivpu:
- Fix missing VPUIP interrupts
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20231026110132.GA10591@linux-uq9g.fritz.box
Damien Le Moal [Wed, 25 Oct 2023 06:46:12 +0000 (15:46 +0900)]
scsi: sd: Introduce manage_shutdown device flag
Commit
aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") change setting the manage_system_start_stop
flag to false for libata managed disks to enable libata internal
management of disk suspend/resume. However, a side effect of this change
is that on system shutdown, disks are no longer being stopped (set to
standby mode with the heads unloaded). While this is not a critical
issue, this unclean shutdown is not recommended and shows up with
increased smart counters (e.g. the unexpected power loss counter
"Unexpect_Power_Loss_Ct").
Instead of defining a shutdown driver method for all ATA adapter
drivers (not all of them define that operation), this patch resolves
this issue by further refining the sd driver start/stop control of disks
using the new flag manage_shutdown. If this new flag is set to true by
a low level driver, the function sd_shutdown() will issue a
START STOP UNIT command with the start argument set to 0 when a disk
needs to be powered off (suspended) on system power off, that is, when
system_state is equal to SYSTEM_POWER_OFF.
Similarly to the other manage_xxx flags, the new manage_shutdown flag is
exposed through sysfs as a read-write device attribute.
To avoid any confusion between manage_shutdown and
manage_system_start_stop, the comments describing these flags in
include/scsi/scsi.h are also improved.
Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218038
Link: https://lore.kernel.org/all/cd397c88-bf53-4768-9ab8-9d107df9e613@gmail.com/
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Thu, 26 Oct 2023 18:17:26 +0000 (08:17 -1000)]
Merge tag 'soc-fixes-6.7-3' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"A couple of platforms have some last-minute fixes, in particular:
- riscv gets some fixes for noncoherent DMA on the renesas and thead
platforms and dts fix for SPI on the visionfive 2 board
- Qualcomm Snapdragon gets three dts fixes to address board specific
regressions on the pmic and gpio nodes
- Rockchip platforms get multiple dts fixes to address issues on the
recent rk3399 platform as well as the older rk3128 platform that
apparently regressed a while ago.
- TI OMAP gets some trivial code and dts fixes and a regression fix
for the omap1 ams-delta modem
- NXP i.MX firmware has one fix for a use-after-free but in its error
handling"
* tag 'soc-fixes-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits)
soc: renesas: ARCH_R9A07G043 depends on !RISCV_ISA_ZICBOM
riscv: only select DMA_DIRECT_REMAP from RISCV_ISA_ZICBOM and ERRATA_THEAD_PBMT
riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENT
riscv: dts: thead: set dma-noncoherent to soc bus
arm64: dts: rockchip: Fix i2s0 pin conflict on ROCK Pi 4 boards
arm64: dts: rockchip: Add i2s0-2ch-bus-bclk-off pins to RK3399
clk: ti: Fix missing omap5 mcbsp functional clock and aliases
clk: ti: Fix missing omap4 mcbsp functional clock and aliases
ARM: OMAP1: ams-delta: Fix MODEM initialization failure
soc: renesas: Make ARCH_R9A07G043 depend on required options
riscv: dts: starfive: visionfive 2: correct spi's ss pin
firmware/imx-dsp: Fix use_after_free in imx_dsp_setup_channels()
ARM: OMAP: timer32K: fix all kernel-doc warnings
ARM: omap2: fix a debug printk
ARM: dts: rockchip: Fix timer clocks for RK3128
ARM: dts: rockchip: Add missing quirk for RK3128's dma engine
ARM: dts: rockchip: Add missing arm timer interrupt for RK3128
ARM: dts: rockchip: Fix i2c0 register address for RK3128
arm64: dts: rockchip: set codec system-clock-fixed on px30-ringneck-haikou
arm64: dts: rockchip: use codec as clock master on px30-ringneck-haikou
...
Linus Torvalds [Thu, 26 Oct 2023 17:41:27 +0000 (07:41 -1000)]
Merge tag 'net-6.6-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi and netfilter.
Most regressions addressed here come from quite old versions, with the
exceptions of the iavf one and the WiFi fixes. No known outstanding
reports or investigation.
Fixes to fixes:
- eth: iavf: in iavf_down, disable queues when removing the driver
Previous releases - regressions:
- sched: act_ct: additional checks for outdated flows
- tcp: do not leave an empty skb in write queue
- tcp: fix wrong RTO timeout when received SACK reneging
- wifi: cfg80211: pass correct pointer to rdev_inform_bss()
- eth: i40e: sync next_to_clean and next_to_process for programming
status desc
- eth: iavf: initialize waitqueues before starting watchdog_task
Previous releases - always broken:
- eth: r8169: fix data-races
- eth: igb: fix potential memory leak in igb_add_ethtool_nfc_entry
- eth: r8152: avoid writing garbage to the adapter's registers
- eth: gtp: fix fragmentation needed check with gso"
* tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
iavf: in iavf_down, disable queues when removing the driver
vsock/virtio: initialize the_virtio_vsock before using VQs
net: ipv6: fix typo in comments
net: ipv4: fix typo in comments
net/sched: act_ct: additional checks for outdated flows
netfilter: flowtable: GC pushes back packets to classic path
i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR
gtp: fix fragmentation needed check with gso
gtp: uapi: fix GTPA_MAX
Fix NULL pointer dereference in cn_filter()
sfc: cleanup and reduce netlink error messages
net/handshake: fix file ref count in handshake_nl_accept_doit()
wifi: mac80211: don't drop all unprotected public action frames
wifi: cfg80211: fix assoc response warning on failed links
wifi: cfg80211: pass correct pointer to rdev_inform_bss()
isdn: mISDN: hfcsusb: Spelling fix in comment
tcp: fix wrong RTO timeout when received SACK reneging
r8152: Block future register access if register access fails
r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE
r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()
...
Arnd Bergmann [Thu, 26 Oct 2023 15:06:37 +0000 (17:06 +0200)]
Merge tag 'renesas-fixes-for-v6.6-tag3' of git://git./linux/kernel/git/geert/renesas-devel into arm/fixes
Renesas fixes for v6.6 (take three)
- Sort out a few Kconfig dependency issues for the rich set of RISC-V
non-coherent DMA support.
* tag 'renesas-fixes-for-v6.6-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
soc: renesas: ARCH_R9A07G043 depends on !RISCV_ISA_ZICBOM
riscv: only select DMA_DIRECT_REMAP from RISCV_ISA_ZICBOM and ERRATA_THEAD_PBMT
riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENT
Link: https://lore.kernel.org/r/cover.1698312384.git.geert+renesas@glider.be
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Christoph Hellwig [Wed, 18 Oct 2023 05:26:54 +0000 (07:26 +0200)]
soc: renesas: ARCH_R9A07G043 depends on !RISCV_ISA_ZICBOM
ARCH_R9A07G043 has its own non-standard global pool based DMA coherent
allocator, which conflicts with the remap based RISCV_ISA_ZICBOM version.
Add a proper dependency.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20231018052654.50074-4-hch@lst.de
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Christoph Hellwig [Wed, 18 Oct 2023 05:26:53 +0000 (07:26 +0200)]
riscv: only select DMA_DIRECT_REMAP from RISCV_ISA_ZICBOM and ERRATA_THEAD_PBMT
RISCV_DMA_NONCOHERENT is also used for whacky non-standard
non-coherent ops that use different hooks in dma-direct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231018052654.50074-3-hch@lst.de
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Christoph Hellwig [Wed, 18 Oct 2023 05:26:52 +0000 (07:26 +0200)]
riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENT
RISCV_NONSTANDARD_CACHE_OPS is also used for the pmem cache maintenance
helpers, which are built into the kernel unconditionally.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231018052654.50074-2-hch@lst.de
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Karol Wachowski [Tue, 24 Oct 2023 16:19:52 +0000 (18:19 +0200)]
accel/ivpu/37xx: Fix missing VPUIP interrupts
Move sequence of masking and unmasking global interrupts from buttress
interrupt handler to generic one that handles both VPUIP and BTRS
interrupts. Unmasking global interrupts will re-trigger MSI for any
pending interrupts.
Lack of this sequence will cause the driver to miss any
VPUIP interrupt that comes after reading VPU_37XX_HOST_SS_ICB_STATUS_0
and before clearing all active interrupt sources.
Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
Cc: stable@vger.kernel.org
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231024161952.759914-1-stanislaw.gruszka@linux.intel.com
Michal Schmidt [Wed, 25 Oct 2023 18:32:13 +0000 (11:32 -0700)]
iavf: in iavf_down, disable queues when removing the driver
In iavf_down, we're skipping the scheduling of certain operations if
the driver is being removed. However, the IAVF_FLAG_AQ_DISABLE_QUEUES
request must not be skipped in this case, because iavf_close waits
for the transition to the __IAVF_DOWN state, which happens in
iavf_virtchnl_completion after the queues are released.
Without this fix, "rmmod iavf" takes half a second per interface that's
up and prints the "Device resources not yet released" warning.
Fixes: c8de44b577eb ("iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is set")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231025183213.874283-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 25 Oct 2023 23:02:06 +0000 (16:02 -0700)]
Merge tag 'nf-23-10-25' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This patch contains two late Netfilter's flowtable fixes for net:
1) Flowtable GC pushes back packets to classic path in every GC run,
ie. every second. This is because NF_FLOW_HW_ESTABLISHED is only
used by sched/act_ct (never set) and IPS_SEEN_REPLY might be unset
by the time the flow is offloaded (this status bit is only reliable
in the sched/act_ct datapath).
2) sched/act_ct logic to push back packets to classic path to reevaluate
if UDP flow is unidirectional only applies if IPS_HW_OFFLOAD_BIT is
set on and no hardware offload request is pending to be handled.
From Vlad Buslov.
These two patches fixes two problems that were introduced in the
previous 6.5 development cycle.
* tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
net/sched: act_ct: additional checks for outdated flows
netfilter: flowtable: GC pushes back packets to classic path
====================
Link: https://lore.kernel.org/r/20231025100819.2664-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexandru Matei [Tue, 24 Oct 2023 19:17:42 +0000 (22:17 +0300)]
vsock/virtio: initialize the_virtio_vsock before using VQs
Once VQs are filled with empty buffers and we kick the host, it can send
connection requests. If the_virtio_vsock is not initialized before,
replies are silently dropped and do not reach the host.
virtio_transport_send_pkt() can queue packets once the_virtio_vsock is
set, but they won't be processed until vsock->tx_run is set to true. We
queue vsock->send_pkt_work when initialization finishes to send those
packets queued earlier.
Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20231024191742.14259-1-alexandru.matei@uipath.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Zyngier [Tue, 24 Oct 2023 14:34:31 +0000 (15:34 +0100)]
irqchip/gic-v3-its: Don't override quirk settings with default values
When splitting the allocation of the ITS node from its configuration,
some of the default settings were kept in the latter instead of
being moved to the former.
This has the side effect of negating some of the quirk detections that
have happened in between, amongst which the dreaded Synquacer hack
(that also affect Dominic's TI platform).
Move the initialisation of these fields early, so that they can again be
overriden by the Synquacer quirk.
Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Reported by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Link: https://lore.kernel.org/r/20231024084831.GA3788@JADEVM-DRA
Link: https://lore.kernel.org/r/20231024143431.2144579-1-maz@kernel.org
Linus Torvalds [Wed, 25 Oct 2023 17:51:56 +0000 (07:51 -1000)]
Merge tag 'acpi-6.6-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Unbreak the ACPI NFIT driver after a recent change that inadvertently
altered its behavior (Xiang Chen)"
* tag 'acpi-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: NFIT: Install Notify() handler before getting NFIT table
Petr Tesarik [Wed, 25 Oct 2023 08:44:25 +0000 (10:44 +0200)]
swiotlb: do not try to allocate a TLB bigger than MAX_ORDER pages
When allocating a new pool at runtime, reduce the number of slabs so
that the allocation order is at most MAX_ORDER. This avoids a kernel
warning in __alloc_pages().
The warning is relatively benign, because the pool size is subsequently
reduced when allocation fails, but it is silly to start with a request
that is known to fail, especially since this is the default behavior if
the kernel is built with CONFIG_SWIOTLB_DYNAMIC=y and booted without any
swiotlb= parameter.
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/4f173dd2-324a-0240-ff8d-abf5c191be18@candelatech.com/
Fixes: 1aaa736815eb ("swiotlb: allocate a new memory pool when existing pools are full")
Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Jens Axboe [Tue, 24 Oct 2023 20:39:06 +0000 (14:39 -0600)]
io_uring/rw: disable IOCB_DIO_CALLER_COMP
If an application does O_DIRECT writes with io_uring and the file system
supports IOCB_DIO_CALLER_COMP, then completions of the dio write side is
done from the task_work that will post the completion event for said
write as well.
Whenever a dio write is done against a file, the inode i_dio_count is
elevated. This enables other callers to use inode_dio_wait() to wait for
previous writes to complete. If we defer the full dio completion to
task_work, we are dependent on that task_work being run before the
inode i_dio_count can be decremented.
If the same task that issues io_uring dio writes with
IOCB_DIO_CALLER_COMP performs a synchronous system call that calls
inode_dio_wait(), then we can deadlock as we're blocked sleeping on
the event to become true, but not processing the completions that will
result in the inode i_dio_count being decremented.
Until we can guarantee that this is the case, then disable the deferred
caller completions.
Fixes: 099ada2c8726 ("io_uring/rw: add write support for IOCB_DIO_CALLER_COMP")
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mario Limonciello [Fri, 20 Oct 2023 15:26:29 +0000 (10:26 -0500)]
drm/amd: Disable ASPM for VI w/ all Intel systems
Originally we were quirking ASPM disabled specifically for VI when
used with Alder Lake, but it appears to have problems with Rocket
Lake as well.
Like we've done in the case of dpm for newer platforms, disable
ASPM for all Intel systems.
Cc: stable@vger.kernel.org # 5.15+
Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jens Axboe [Sat, 21 Oct 2023 18:30:29 +0000 (12:30 -0600)]
io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid
We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
dereference when the thread is cleared. Grab the SQPOLL data lock before
attempting to get the task cpu and pid for fdinfo, this ensures we have a
stable view of it.
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Umesh Nerlige Ramappa [Fri, 20 Oct 2023 15:24:41 +0000 (08:24 -0700)]
drm/i915/pmu: Check if pmu is closed before stopping event
When the driver unbinds, pmu is unregistered and i915->uabi_engines is
set to RB_ROOT. Due to this, when i915 PMU tries to stop the engine
events, it issues a warn_on because engine lookup fails.
All perf hooks are taking care of this using a pmu->closed flag that is
set when PMU unregisters. The stop event seems to have been left out.
Check for pmu->closed in pmu_event_stop as well.
Based on discussion here -
https://patchwork.freedesktop.org/patch/492079/?series=105790&rev=2
v2: s/is/if/ in commit title
v3: Add fixes tag and cc stable
Cc: <stable@vger.kernel.org> # v5.11+
Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020152441.3764850-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit
31f6a06f0c543b43a38fab10f39e5fc45ad62aa2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Thu, 19 Oct 2023 17:02:42 +0000 (10:02 -0700)]
drm/i915/mcr: Hold GT forcewake during steering operations
The steering control and semaphore registers are inside an "always on"
power domain with respect to RC6. However there are some issues if
higher-level platform sleep states are entering/exiting at the same time
these registers are accessed. Grabbing GT forcewake and holding it over
the entire lock/steer/unlock cycle ensures that those sleep states have
been fully exited before we access these registers.
This is expected to become a formally documented/numbered workaround
soon.
Note that this patch alone isn't expected to have an immediately
noticeable impact on MCR (mis)behavior; an upcoming pcode firmware
update will also be necessary to provide the other half of this
workaround.
v2:
- Move the forcewake inside the Xe_LPG-specific IP version check. This
should only be necessary on platforms that have a steering semaphore.
Fixes: 3100240bf846 ("drm/i915/mtl: Add hardware-level lock for steering")
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231019170241.2102037-2-matthew.d.roper@intel.com
(cherry picked from commit
8fa1c7cd1fe9cdfc426a603e1f1eecd3f463c487)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Sui Jingfeng [Thu, 8 Jun 2023 02:42:07 +0000 (10:42 +0800)]
drm/logicvc: Kconfig: select REGMAP and REGMAP_MMIO
drm/logicvc driver is depend on REGMAP and REGMAP_MMIO, should select this
two kconfig option, otherwise the driver failed to compile on platform
without REGMAP_MMIO selected:
ERROR: modpost: "__devm_regmap_init_mmio_clk" [drivers/gpu/drm/logicvc/logicvc-drm.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:136: Module.symvers] Error 1
make: *** [Makefile:1978: modpost] Error 2
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Acked-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Fixes: efeeaefe9be5 ("drm: Add support for the LogiCVC display controller")
Link: https://patchwork.freedesktop.org/patch/msgid/20230608024207.581401-1-suijingfeng@loongson.cn
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Deming Wang [Wed, 25 Oct 2023 06:16:56 +0000 (02:16 -0400)]
net: ipv6: fix typo in comments
The word "advertize" should be replaced by "advertise".
Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deming Wang [Wed, 25 Oct 2023 06:14:34 +0000 (02:14 -0400)]
net: ipv4: fix typo in comments
The word "advertize" should be replaced by "advertise".
Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Tue, 24 Oct 2023 19:58:57 +0000 (21:58 +0200)]
net/sched: act_ct: additional checks for outdated flows
Current nf_flow_is_outdated() implementation considers any flow table flow
which state diverged from its underlying CT connection status for teardown
which can be problematic in the following cases:
- Flow has never been offloaded to hardware in the first place either
because flow table has hardware offload disabled (flag
NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on 'add'
workqueue to be offloaded for the first time. The former is incorrect, the
later generates excessive deletions and additions of flows.
- Flow is already pending to be updated on the workqueue. Tearing down such
flows will also generate excessive removals from the flow table, especially
on highly loaded system where the latency to re-offload a flow via 'add'
workqueue can be quite high.
When considering a flow for teardown as outdated verify that it is both
offloaded to hardware and doesn't have any pending updates.
Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 24 Oct 2023 19:09:47 +0000 (21:09 +0200)]
netfilter: flowtable: GC pushes back packets to classic path
Since
41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded
unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY
back to classic path in every run, ie. every second. This is because of
a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct.
In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on
and IPS_SEEN_REPLY is unreliable since users decide when to offload the
flow before, such bit might be set on at a later stage.
Fix it by adding a custom .gc handler that sched/act_ct can use to
deal with its NF_FLOW_HW_ESTABLISHED bit.
Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reported-by: Vladimir Smelhaus <vl.sm@email.cz>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Aneesh Kumar K.V [Tue, 24 Oct 2023 14:36:04 +0000 (20:06 +0530)]
powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes
With commit
9fee28baa601 ("powerpc: implement the new page table range
API") we added set_ptes to powerpc architecture. The implementation
included calling arch_enter/leave_lazy_mmu() calls.
The patch removes the usage of arch_enter/leave_lazy_mmu() because
set_pte is not supposed to be used when updating a pte entry. Powerpc
architecture uses this rule to skip the expensive tlb invalidate which
is not needed when you are setting up the pte for the first time. See
commit
56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating
_PAGE_NUMA bit") for more details
The patch also makes sure we are not using the interface to update a
valid/present pte entry by adding VM_WARN_ON check all the ptes we
are setting up. Furthermore, we add a comment to set_pte_filter to
clarify it can only update folio-related flags and cannot filter
pfn specific details in pte filtering.
Removal of arch_enter/leave_lazy_mmu() also will avoid nesting of
these functions that are not supported. For ex:
remap_pte_range()
-> arch_enter_lazy_mmu()
-> set_ptes()
-> arch_enter_lazy_mmu()
-> arch_leave_lazy_mmu()
-> arch_leave_lazy_mmu()
Fixes: 9fee28baa601 ("powerpc: implement the new page table range API")
Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231024143604.16749-1-aneesh.kumar@linux.ibm.com
Marc Zyngier [Mon, 23 Oct 2023 09:54:44 +0000 (10:54 +0100)]
KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
When trapping accesses from a NV guest that tries to access
SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI,
as if AArch32 wasn't implemented.
This involves a bit of repainting to make the visibility
handler more generic.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Marc Zyngier [Mon, 23 Oct 2023 09:54:43 +0000 (10:54 +0100)]
KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
DBGVCR32_EL2, DACR32_EL2, IFSR32_EL2 and FPEXC32_EL2 are required to
UNDEF when AArch32 isn't implemented, which is definitely the case when
running NV.
Given that this is the only case where these registers can trap,
unconditionally inject an UNDEF exception.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20231023095444.1587322-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Miguel Luis [Mon, 23 Oct 2023 09:54:42 +0000 (10:54 +0100)]
KVM: arm64: Refine _EL2 system register list that require trap reinjection
Implement a fine grained approach in the _EL2 sysreg range instead of
the current wide cast trap. This ensures that we don't mistakenly
inject the wrong exception into the guest.
[maz: commit message massaging, dropped secure and AArch32 registers
from the list]
Fixes: d0fc0a2519a6 ("KVM: arm64: nv: Add trap forwarding for HCR_EL2")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Miguel Luis [Mon, 23 Oct 2023 09:54:41 +0000 (10:54 +0100)]
arm64: Add missing _EL2 encodings
Some _EL2 encodings are missing. Add them.
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[maz: dropped secure encodings]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Miguel Luis [Mon, 23 Oct 2023 09:54:40 +0000 (10:54 +0100)]
arm64: Add missing _EL12 encodings
Some _EL12 encodings are missing. Add them.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Ivan Vecera [Mon, 23 Oct 2023 21:27:14 +0000 (14:27 -0700)]
i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR
The I40E_TXR_FLAGS_WB_ON_ITR is i40e_ring flag and not i40e_pf one.
Fixes: 8e0764b4d6be42 ("i40e/i40evf: Add support for writeback on ITR feature for X722")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231023212714.178032-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Raghavendra Rao Ananta [Fri, 20 Oct 2023 21:40:52 +0000 (21:40 +0000)]
KVM: selftests: aarch64: vPMU test for validating user accesses
Add a vPMU test scenario to validate the userspace accesses for
the registers PM{C,I}NTEN{SET,CLR} and PMOVS{SET,CLR} to ensure
that KVM honors the architectural definitions of these registers
for a given PMCR.N.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-13-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reiji Watanabe [Fri, 20 Oct 2023 21:40:51 +0000 (21:40 +0000)]
KVM: selftests: aarch64: vPMU register test for unimplemented counters
Add a new test case to the vpmu_counter_access test to check
if PMU registers or their bits for unimplemented counters are not
accessible or are RAZ, as expected.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-12-rananta@google.com
[Oliver: fix issues relating to exception return address]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reiji Watanabe [Fri, 20 Oct 2023 21:40:50 +0000 (21:40 +0000)]
KVM: selftests: aarch64: vPMU register test for implemented counters
Add a new test case to the vpmu_counter_access test to check if PMU
registers or their bits for implemented counters on the vCPU are
readable/writable as expected, and can be programmed to count events.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-11-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reiji Watanabe [Fri, 20 Oct 2023 21:40:49 +0000 (21:40 +0000)]
KVM: selftests: aarch64: Introduce vpmu_counter_access test
Introduce vpmu_counter_access test for arm64 platforms.
The test configures PMUv3 for a vCPU, sets PMCR_EL0.N for the vCPU,
and check if the guest can consistently see the same number of the
PMU event counters (PMCR_EL0.N) that userspace sets.
This test case is done with each of the PMCR_EL0.N values from
0 to 31 (With the PMCR_EL0.N values greater than the host value,
the test expects KVM_SET_ONE_REG for the PMCR_EL0 to fail).
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-10-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Raghavendra Rao Ananta [Fri, 20 Oct 2023 21:40:48 +0000 (21:40 +0000)]
tools: Import arm_pmuv3.h
Import kernel's include/linux/perf/arm_pmuv3.h, with the
definition of PMEVN_SWITCH() additionally including an assert()
for the 'default' case. The following patches will use macros
defined in this header.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-9-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reiji Watanabe [Fri, 20 Oct 2023 21:40:47 +0000 (21:40 +0000)]
KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
KVM does not yet support userspace modifying PMCR_EL0.N (With
the previous patch, KVM ignores what is written by userspace).
Add support userspace limiting PMCR_EL0.N.
Disallow userspace to set PMCR_EL0.N to a value that is greater
than the host value as KVM doesn't support more event counters
than what the host HW implements. Also, make this register
immutable after the VM has started running. To maintain the
existing expectations, instead of returning an error, KVM
returns a success for these two cases.
Finally, ignore writes to read-only bits that are cleared on
vCPU reset, and RES{0,1} bits (including writable bits that
KVM doesn't support yet), as those bits shouldn't be modified
(at least with the current KVM).
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20231020214053.2144305-8-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Raghavendra Rao Ananta [Fri, 20 Oct 2023 21:40:46 +0000 (21:40 +0000)]
KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
For unimplemented counters, the registers PM{C,I}NTEN{SET,CLR}
and PMOVS{SET,CLR} are expected to have the corresponding bits RAZ.
Hence to ensure correct KVM's PMU emulation, mask out the RES0 bits.
Defer this work to the point that userspace can no longer change the
number of advertised PMCs.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-7-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>