git.maquefel.me Git - linux.git/log

Merge tag 'tegra-for-6.7-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers

soc/tegra: Changes for v6.7-rc1

This contains a few minor cleanups for PMC and CBB.

* tag 'tegra-for-6.7-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
soc/tegra: pmc: Drop the ->opp_to_performance_state() callback
soc/tegra: cbb: tegra194-cbb: Convert to platform remove callback returning void

Link: https://lore.kernel.org/r/20231013153723.1729109-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'ffa-updates-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers

Arm FF-A updates for v6.7

The main addition is the initial support for the notifications and
memory transaction descriptor changes added in FF-A v1.1 specification.

The notification mechanism enables a requester/sender endpoint to notify
a service provider/receiver endpoint about an event with non-blocking
semantics. A notification is akin to the doorbell between two endpoints
in a communication protocol that is based upon the doorbell/mailbox
mechanism.

The framework is responsible for the delivery of the notification from
the ender to the receiver without blocking the sender. The receiver
endpoint relies on the OS scheduler for allocation of CPU cycles to
handle a notification.

OS is referred as the receiver’s scheduler in the context of notifications.
The framework is responsible for informing the receiver’s scheduler that
the receiver must be run since it has a pending notification.

The series also includes support for the new format of memory transaction
descriptors introduced in v1.1 specification.

Apart from the main additions, it includes minor fixes to re-enable FF-A
drivers usage of 32bit mode of messaging and kernel warning due to the
missing assignment of IDR allocation ID to the FFA device. It also adds
emitting 'modalias' to the base attribute of FF-A devices.

* tag 'ffa-updates-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_ffa: Upgrade the driver version to v1.1
  firmware: arm_ffa: Update memory descriptor to support v1.1 format
  firmware: arm_ffa: Switch to using ffa_mem_desc_offset() accessor
  KVM: arm64: FFA: Remove access of endpoint memory access descriptor array
  firmware: arm_ffa: Simplify the computation of transmit and fragment length
  firmware: arm_ffa: Add notification handling mechanism
  firmware: arm_ffa: Add interface to send a notification to a given partition
  firmware: arm_ffa: Add interfaces to request notification callbacks
  firmware: arm_ffa: Add schedule receiver callback mechanism
  firmware: arm_ffa: Initial support for scheduler receiver interrupt
  firmware: arm_ffa: Implement the NOTIFICATION_INFO_GET interface
  firmware: arm_ffa: Implement the FFA_NOTIFICATION_GET interface
  firmware: arm_ffa: Implement the FFA_NOTIFICATION_SET interface
  firmware: arm_ffa: Implement the FFA_RUN interface
  firmware: arm_ffa: Implement the notification bind and unbind interface
  firmware: arm_ffa: Implement notification bitmap create and destroy interfaces
  firmware: arm_ffa: Update the FF-A command list with v1.1 additions
  firmware: arm_ffa: Emit modalias for FF-A devices
  firmware: arm_ffa: Allow the FF-A drivers to use 32bit mode of messaging
  firmware: arm_ffa: Assign the missing IDR allocation ID to the FFA device

Link: https://lore.kernel.org/r/20231010124354.1620064-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'scmi-updates-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers

Arm SCMI updates for v6.7

Main additions this time include:

1. SCMI v3.2 clock configuration support:
   This helps to retrieve the enabled state of a clock as well as allow
   to set OEM specific clock configurations.

2. Support for generic performance scaling(DVFS):
   The current SCMI DVFS support is limited to the CPUs in the kernel.
   This extension enables it to used for all kind of devices and not
   only for the CPUs. It updates the SCMI cpufreq to utilize the power
   domain bindings. It also adds a more generic SCMI performance domain
   based on the genpd framework that as be used for all the non-CPU
   devices.

3. Extend the generic performance scaling(DVFS) support for firmware
   driver OPPs:
   Consumer drivers for devices that are attached to the SCMI performance
   domain can't make use of the current OPP library to scale performance
   as the OPPs are firmware driven and often obtained from the firmware
   rather than the device tree. These changes extend the generic OPP
   and genpd PM domain frameworks to identify and utilise these firmware
   driven OPPs.

4. SCMI v3.2 clock parent support:
   This enables the support for discovering and changing parent clocks
   and extending the SCMI clk driver to use the same.

5. Qualcom SMC/HVC transport support:
   The Qualcomm virtual platforms require capability id in the hypervisor
   call to identify which doorbell to assert when supporting multiple
   SMC/HVC based SCMI transport channels. Extra parameter is added to
   support the same and the same is obtained at the fixed address in the
   shared memory which is initialised by the firmware.

6. Move the existing SCMI power domain driver under drivers/pmdomain

Apart from the above main changes, it also include couple of minor fixes
and cosmetic reworks.

* tag 'scmi-updates-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (37 commits)
  firmware: arm_scmi: Add qcom smc/hvc transport support
  dt-bindings: arm: Add new compatible for smc/hvc transport for SCMI
  firmware: arm_scmi: Convert u32 to unsigned long to align with arm_smccc_1_1_invoke()
  clk: scmi: Add support for clock {set,get}_parent
  firmware: arm_scmi: Add support for clock parents
  clk: scmi: Free scmi_clk allocated when the clocks with invalid info are skipped
  firmware: arm_scpi: Use device_get_match_data()
  firmware: arm_scmi: Add generic OPP support to the SCMI performance domain
  firmware: arm_scmi: Specify the performance level when adding an OPP
  firmware: arm_scmi: Simplify error path in scmi_dvfs_device_opps_add()
  OPP: Extend support for the opp-level beyond required-opps
  OPP: Switch to use dev_pm_domain_set_performance_state()
  OPP: Extend dev_pm_opp_data with a level
  OPP: Add dev_pm_opp_add_dynamic() to allow more flexibility
  PM: domains: Implement the ->set_performance_state() callback for genpd
  PM: domains: Introduce dev_pm_domain_set_performance_state()
  firmware: arm_scmi: Rename scmi_{msg_,}clock_config_{get,set}_{2,21}
  firmware: arm_scmi: Do not use !! on boolean when setting msg->flags
  firmware: arm_scmi: Move power-domain driver to the pmdomain dir
  pmdomain: arm: Add the SCMI performance domain
  ...

Link: https://lore.kernel.org/r/20231010124347.1620040-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'vexpress-update-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers

Arm Vexpress updates for v6.7

Just a single update to use __counted_by annotation in config bus driver
in preparation to the upcoming versions of the toolchains(GCC and Clang)
with __counted_by attribute.

* tag 'vexpress-update-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
bus: vexpress-config: Annotate struct vexpress_syscfg_func with __counted_by

Link: https://lore.kernel.org/r/20231010124339.1620012-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'v6.6-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into soc/drivers

MediaTek drivers updates for v6.7

- Added support for Smart Voltage Scaling (SVS) on the MT8188 SoC

* tag 'v6.6-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux:
  soc: mediatek: svs: Add support for voltage bins
  soc: mediatek: svs: Add support for MT8188 SoC
  dt-bindings: soc: mediatek: add mt8188 svs dt-bindings

Link: https://lore.kernel.org/r/d25ccd90-277a-fd05-8605-f7d1d129d4fa@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'renesas-drivers-for-v6.7-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/drivers

Renesas driver updates for v6.7

  - Identify the new RZ/G3S SoC,
  - Miscellaneous fixes and improvements.

* tag 'renesas-drivers-for-v6.7-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
  soc: renesas: Kconfig: Remove blank line before ARCH_R9A07G043 help text
  soc: renesas: renesas-soc: Remove blank lines
  soc: renesas: Identify RZ/G3S SoC

Link: https://lore.kernel.org/r/cover.1695985423.git.geert+renesas@glider.be
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'platform-remove-void-soc-for-6.7-rc' of https://git.pengutronix.de/git/ukl/linux into soc/drivers

Convert drivers/soc to struct platform_driver::remove_new()

This PR contains the patches I sent in the series available at
https://lore.kernel.org/all/20230925095532.1984344-1-u.kleine-koenig@pengutronix.de
that were not yet picked up in next as of next-20231013.

It converts all drivers below drivers/soc to let their remove callback
return void. See commit 5c5a7680e67b ("platform: Provide a remove
callback that returns no value") for the rationale.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

soc: samsung: exynos-chipid: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230925095532.1984344-32-u.kleine-koenig@pengutronix.de
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20231016072911.27148-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

soc/pxa: ssp: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: https://lore.kernel.org/r/20230925095532.1984344-18-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/mediatek: mtk-mmsys: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230925095532.1984344-16-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/mediatek: mtk-devapc: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230925095532.1984344-15-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/loongson: loongson2_guts: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: https://lore.kernel.org/r/20230925095532.1984344-14-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/litex: litex_soc_ctrl: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Acked-by: Gabriel Somlo <gsomlo@gmail.com>
Link: https://lore.kernel.org/r/20230925095532.1984344-13-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/ixp4xx: ixp4xx-qmgr: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: https://lore.kernel.org/r/20230925095532.1984344-12-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/ixp4xx: ixp4xx-npe: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: https://lore.kernel.org/r/20230925095532.1984344-11-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/hisilicon: kunpeng_hccs: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: https://lore.kernel.org/r/20230925095532.1984344-10-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/fujitsu: a64fx-diag: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: https://lore.kernel.org/r/20230925095532.1984344-9-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/fsl: cpm: tsa: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://lore.kernel.org/r/20230925095532.1984344-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/fsl: cpm: qmc: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://lore.kernel.org/r/20230925095532.1984344-7-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/fsl: dpaa2-console: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: https://lore.kernel.org/r/20230925095532.1984344-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

soc/tegra: pmc: Drop the ->opp_to_performance_state() callback

Since commit 7c41cdcd3bbe ("OPP: Simplify the over-designed pstate <->
level dance"), there is no longer any need for genpd providers to assign
the ->opp_to_performance_state(), hence let's drop it.

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: linux-tegra@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>

soc/tegra: cbb: tegra194-cbb: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>

firmware: arm_scmi: Add qcom smc/hvc transport support

This change adds the support for SCMI message exchange on Qualcomm
virtual platforms.

The hypervisor associates an object-id also known as capability-id
with each smc/hvc doorbell object. The capability-id is used to
identify the doorbell from the VM's capability namespace, similar
to a file-descriptor.

The hypervisor, in addition to the function-id, expects the capability-id
to be passed in x1 register when SMC/HVC call is invoked.

The capability-id is allocated by the hypervisor on bootup and is stored in
the shmem region by the firmware before starting Linux.

Signed-off-by: Nikunj Kela <quic_nkela@quicinc.com>
Reviewed-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20231009191437.27926-3-quic_nkela@quicinc.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

dt-bindings: arm: Add new compatible for smc/hvc transport for SCMI

Introduce compatible "qcom,scmi-smc" for SCMI smc/hvc transport channel for
Qualcomm virtual platforms.

This compatible mandates populating an additional parameter 'capability-id'
from the last 8 bytes of the shmem channel.

Signed-off-by: Nikunj Kela <quic_nkela@quicinc.com>
Reviewed-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20231009191437.27926-2-quic_nkela@quicinc.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_scmi: Convert u32 to unsigned long to align with arm_smccc_1_1_invoke()

All the parameters to arm_smccc_1_1_invoke() are unsigned long which
aligns well on both 32-bit and 64-bit Arm based platforms. Let us store
all the members in the structure scmi_smc used as the parameters to the
arm_smccc_1_1_invoke() call as unsigned long.

Cc: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20231009152049.1428872-1-sudeep.holla@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Upgrade the driver version to v1.1

With quite a few v1.1 features supported, we can bump the driver version
to v1.1 now.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-17-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Update memory descriptor to support v1.1 format

Update memory transaction descriptor structure to accommodate couple of
new entries in v1.1 which were previously marked reserved and MBZ(must
be zero).

It also removes the flexible array member ep_mem_access in the memory
transaction descriptor structure as it need not be at fixed offset.
Also update ffa_mem_desc_offset() accessor to handle both old and new
formats of memory transaction descriptors.

The updated ffa_mem_region structure aligns with new format in v1.1 and
hence the driver/user must take care not to use members beyond and
including ep_mem_offset when using the old format.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-16-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Switch to using ffa_mem_desc_offset() accessor

In preparation to add support to the new memory transaction descriptor,
the ep_mem_access member needs to be removed and hence even the macro
COMPOSITE_OFFSET(). Let us switch to using the new ffa_mem_desc_offset()
accessor in ffa_setup_and_transmit().

This will enable adding the support for new format transparently without
any changes here again.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-15-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

KVM: arm64: FFA: Remove access of endpoint memory access descriptor array

FF-A v1.1 removes the fixed location of endpoint memory access descriptor
array within the memory transaction descriptor structure. In preparation
to remove the ep_mem_access member from the ffa_mem_region structure,
provide the accessor to fetch the offset and use the same in FF-A proxy
implementation.

The accessor take the FF-A version as the argument from which the memory
access descriptor format can be determined. v1.0 uses the old format while
v1.1 onwards use the new format specified in the v1.1 specification.

Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-14-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

clk: scmi: Add support for clock {set,get}_parent

SCMI v3.2 adds set/get parent clock commands, so update the SCMI clock
driver to support them.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20231004-scmi-clock-v3-v5-2-1b8a1435673e@nxp.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_scmi: Add support for clock parents

SCMI v3.2 spec introduces CLOCK_POSSIBLE_PARENTS_GET, CLOCK_PARENT_SET
and CLOCK_PARENT_GET. Add support for these to enable clock parents
and use them in the clock driver.

Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20231004-scmi-clock-v3-v5-1-1b8a1435673e@nxp.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

clk: scmi: Free scmi_clk allocated when the clocks with invalid info are skipped

Add the missing devm_kfree() when we skip the clocks with invalid or
missing information from the firmware.

Cc: Cristian Marussi <cristian.marussi@arm.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: linux-clk@vger.kernel.org
Fixes: 6d6a1d82eaef ("clk: add support for clocks provided by SCMI")
Link: https://lore.kernel.org/r/20231004193600.66232-1-sudeep.holla@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_scpi: Use device_get_match_data()

Use preferred device_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.

Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20231006224650.445424-1-robh@kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Simplify the computation of transmit and fragment length

The computation of endpoint memory access descriptor's composite memory
region descriptor offset is using COMPOSITE_CONSTITUENTS_OFFSET which is
unnecessary complicated. Composite memory region descriptor always follow
the endpoint memory access descriptor array and hence it is computed
accordingly. COMPOSITE_CONSTITUENTS_OFFSET is useless and wrong for any
input other than endpoint memory access descriptor count.

Let us drop the usage of COMPOSITE_CONSTITUENTS_OFFSET to simplify the
computation of total transmit and fragment length in the memory
transactions.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-13-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Add notification handling mechanism

With all the necessary plumbing in place, let us add handling the
notifications as part of schedule receiver interrupt handler. In order
to do so, we need to just register scheduling callback on behalf of the
driver partition.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-12-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Add interface to send a notification to a given partition

The framework provides an interface to the sender endpoint to specify
the notification to signal to the receiver endpoint. A sender signals
a notification by requesting its partition manager to set the
corresponding bit in the notifications bitmap of the receiver.

Expose the ability to send a notification to another partition.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-11-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Add interfaces to request notification callbacks

Add interface to the FFA driver to allow for client drivers to request
and relinquish a notification as well as provide a callback for the
notification.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-10-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Add schedule receiver callback mechanism

Enable client drivers to register a callback function that will be
called when one or more notifications are pending for a target
partition as part of schedule receiver interrupt handling.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-9-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Initial support for scheduler receiver interrupt

The Framework uses the schedule receiver interrupt to inform the
receiver’s scheduler that the receiver must be run to handle a pending
notification. A receiver’s scheduler can obtain the description of the
schedule receiver interrupt by invoking the FFA_FEATURES interface.

The delivery of the physical schedule receiver interrupt from the secure
state to the non-secure state depends upon the state of the interrupt
controller as configured by the hypervisor.

The schedule seceiver interrupt is assumed to be a SGI. The Arm GIC
specification defines 16 SGIs. It recommends that they are equally
divided between the non-secure and secure states. OS like Linux kernel
in the non-secure state typically do not have SGIs to spare. The usage
of SGIs in the secure state is however limited. It is more likely that
software in the Secure world does not use all the SGIs allocated to it.

It is recommended that the secure world software donates an unused SGI
to the normal world for use as the schedule receiver interrupt. This
implies that secure world software must configure the SGI in the GIC
as a non-secure interrupt before presenting it to the normal world.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-8-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Implement the NOTIFICATION_INFO_GET interface

The receiver’s scheduler uses the FFA_NOTIFICATION_INFO_GET interface
to retrieve the list of endpoints that have pending notifications and
must be run. A notification could be signaled by a sender in the secure
world to a VM. The Hypervisor needs to determine which VM and vCPU
(in case a per-vCPU notification is signaled) has a pending notification
in this scenario. It must obtain this information through an invocation
of the FFA_NOTIFICATION_INFO_GET.

Add the implementation of the NOTIFICATION_INFO_GET interface
and prepare to use this to handle the schedule receiver interrupt.
Implementation of handling notifications will be added later.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-7-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Implement the FFA_NOTIFICATION_GET interface

The framework provides an interface to the receiver to determine the
identity of the notification. A receiver endpoint must use the
FFA_NOTIFICATION_GET interface to retrieve its pending notifications
and handle them.

Add the support for FFA_NOTIFICATION_GET to allow the caller(receiver)
to fetch its pending notifications from other partitions in the system.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-6-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Implement the FFA_NOTIFICATION_SET interface

The framework provides an interface to the sender to specify the
notification to signal to the receiver. A sender signals a notification
by requesting its partition manager to set the corresponding bit in the
notifications bitmap of the receiver invoking FFA_NOTIFICATION_SET.

Implement the FFA_NOTIFICATION_SET to enable the caller(sender) to send
the notifications for any other partitions in the system.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-5-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Implement the FFA_RUN interface

FFA_RUN is used by a scheduler to allocate CPU cycles to a target
endpoint execution context specified in the target information parameter.

If the endpoint execution context is in the waiting/blocked state, it
transitions to the running state.

Expose the ability to call FFA_RUN in order to give any partition in the
system cpu cycles to perform IMPDEF functionality.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-4-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Implement the notification bind and unbind interface

A receiver endpoint must bind a notification to any sender endpoint
before the latter can signal the notification to the former. The receiver
assigns one or more doorbells to a specific sender. Only the sender can
ring these doorbells.

A receiver uses the FFA_NOTIFICATION_BIND interface to bind one or more
notifications to the sender. A receiver un-binds a notification from a
sender endpoint to stop the notification from being signaled. It uses
the FFA_NOTIFICATION_UNBIND interface to do this.

Allow the FF-A driver to be able to bind and unbind a given notification
ID to a specific partition ID. This will be used to register and
unregister notification callbacks from the FF-A client drivers.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-3-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Implement notification bitmap create and destroy interfaces

On systems without a hypervisor the responsibility of requesting the
creation of the notification bitmaps in the SPM falls to the FF-A driver.

We use FFA features to determine if the ABI is supported, if it is not
we can assume there is a hypervisor present and will take care of ensure
the relevant notifications bitmaps are created on this partitions behalf.

An endpoint’s notification bitmaps needs to be setup before it configures
its notifications and before other endpoints and partition managers can
start signaling these notifications.

Add interface to create and destroy the notification bitmaps and use the
same to do the necessary setup during the initialisation and cleanup
during the module exit.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-2-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Update the FF-A command list with v1.1 additions

Arm Firmware Framework for A-profile(FFA) v1.1 introduces notifications
and indirect messaging based upon notifications support and extends some
of the memory interfaces.

Let us add all the newly supported FF-A function IDs in the spec.
Also update to the error values and associated handling.

Link: https://lore.kernel.org/r/20231005-ffa_v1-1_notif-v4-1-cddd3237809c@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Emit modalias for FF-A devices

In order to enable libkmod lookups for FF-A device objects to their
corresponding module, add 'modalias' to the base attribute of FF-A
devices.

Tested-by: Abdellatif El Khlifi <abdellatif.elkhlifi@arm.com>
Link: https://lore.kernel.org/r/20231005175640.379631-1-sudeep.holla@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Allow the FF-A drivers to use 32bit mode of messaging

An FF-A ABI could support both the SMC32 and SMC64 conventions.
A callee that runs in the AArch64 execution state and implements such
an ABI must implement both SMC32 and SMC64 conventions of the ABI.

So the FF-A drivers will need the option to choose the mode irrespective
of FF-A version and the partition execution mode flag in the partition
information.

Let us remove the check on the FF-A version for allowing the selection
of 32bit mode of messaging. The driver will continue to set the 32-bit
mode if the partition execution mode flag specified that the partition
supports only 32-bit execution.

Fixes: 106b11b1ccd5 ("firmware: arm_ffa: Set up 32bit execution mode flag using partiion property")
Link: https://lore.kernel.org/r/20231005142823.278121-1-sudeep.holla@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_ffa: Assign the missing IDR allocation ID to the FFA device

Commit 19b8766459c4 ("firmware: arm_ffa: Fix FFA device names for logical
partitions") added an ID to the FFA device using ida_alloc() and append
the same to "arm-ffa" to make up a unique device name. However it missed
to stash the id value in ffa_dev to help freeing the ID later when the
device is destroyed.

Due to the missing/unassigned ID in FFA device, we get the following
warning when the FF-A device is unregistered.

  |   ida_free called for id=0 which is not allocated.
  |   WARNING: CPU: 7 PID: 1 at lib/idr.c:525 ida_free+0x114/0x164
  |   CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc4 #209
  |   pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
  |   pc : ida_free+0x114/0x164
  |   lr : ida_free+0x114/0x164
  |   Call trace:
  |    ida_free+0x114/0x164
  |    ffa_release_device+0x24/0x3c
  |    device_release+0x34/0x8c
  |    kobject_put+0x94/0xf8
  |    put_device+0x18/0x24
  |    klist_devices_put+0x14/0x20
  |    klist_next+0xc8/0x114
  |    bus_for_each_dev+0xd8/0x144
  |    arm_ffa_bus_exit+0x30/0x54
  |    ffa_init+0x68/0x330
  |    do_one_initcall+0xdc/0x250
  |    do_initcall_level+0x8c/0xac
  |    do_initcalls+0x54/0x94
  |    do_basic_setup+0x1c/0x28
  |    kernel_init_freeable+0x104/0x170
  |    kernel_init+0x20/0x1a0
  |    ret_from_fork+0x10/0x20

Fix the same by actually assigning the ID in the FFA device this time
for real.

Fixes: 19b8766459c4 ("firmware: arm_ffa: Fix FFA device names for logical partitions")
Link: https://lore.kernel.org/r/20231003085932.3553985-1-sudeep.holla@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_scmi: Add generic OPP support to the SCMI performance domain

To allow a consumer driver to use the OPP library to scale the performance
for its device, let's dynamically add the OPP table when the device gets
attached to its SCMI performance domain.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20230925131715.138411-10-ulf.hansson@linaro.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_scmi: Specify the performance level when adding an OPP

To enable the performance level to be used for OPPs, let's convert into
using the dev_pm_opp_add_dynamic() API when creating them. This will be
particularly useful for the SCMI performance domain, as shown through
subsequent changes.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20230925131715.138411-9-ulf.hansson@linaro.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

firmware: arm_scmi: Simplify error path in scmi_dvfs_device_opps_add()

Let's simplify the code in scmi_dvfs_device_opps_add() by using
dev_pm_opp_remove_all_dynamic() in the error path.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20230925131715.138411-8-ulf.hansson@linaro.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

Merge branch 'opp/pm-domain-scmi' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into for-next/scmi/updates

This is the merge of immutable point in PM OPP tree shared with SCMI so
that the SCMI changes based on these OPP changes can be merged via the
SCMI tree.

* 'opp/pm-domain-scmi' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  OPP: Extend support for the opp-level beyond required-opps
  OPP: Switch to use dev_pm_domain_set_performance_state()
  OPP: Extend dev_pm_opp_data with a level
  OPP: Add dev_pm_opp_add_dynamic() to allow more flexibility
  PM: domains: Implement the ->set_performance_state() callback for genpd
  PM: domains: Introduce dev_pm_domain_set_performance_state()

OPP: Extend support for the opp-level beyond required-opps

At this point the level (performance state) for an OPP is currently limited
to be requested for a device that is attached to a PM domain. Moreover,
the device needs to have the so called required-opps assigned to it, which
are based upon OPP tables being described in DT.

To extend the support beyond required-opps and DT, let's enable the level
to be set for all OPPs. More precisely, if the requested OPP has a valid
level let's try to request it through the device's optional PM domain, via
calling dev_pm_domain_set_performance_state().

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[ Viresh: Handle NULL opp in _set_opp_level() ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

OPP: Switch to use dev_pm_domain_set_performance_state()

To support performance scaling for any kinds of PM domains, let's move away
from using the genpd specific API, dev_pm_genpd_set_performance_state(), to
the common dev_pm_domain_set_performance_state().

No intended functional impact at this point.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

OPP: Extend dev_pm_opp_data with a level

Let's extend the dev_pm_opp_data with a level variable, to allow users to
specify a corresponding level (performance state) for a dynamically added
OPP.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

OPP: Add dev_pm_opp_add_dynamic() to allow more flexibility

The dev_pm_opp_add() API is limited to add dynamic OPPs with a frequency
and a voltage level. To enable more flexibility, let's add a new API,
dev_pm_opp_add_dynamic() that's takes a struct dev_pm_opp_data* instead of
a list of in-parameters.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

PM: domains: Implement the ->set_performance_state() callback for genpd

To enable generic support for performance scaling for PM domains, let's
implement the ->set_performance_state() callback for genpd.

Beyond this change, users of the corresponding genpd specific API,
dev_pm_genpd_set_performance_state() are encouraged to switch to the common
dev_pm_domain_set_performance_state() API.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

PM: domains: Introduce dev_pm_domain_set_performance_state()

The generic PM domain is currently the only PM domain variant that supports
performance scaling. To allow performance scaling to be supported through a
common interface, let's add an optional callback ->set_performance_state(),
in the struct dev_pm_domain.

Moreover, let's add a function, dev_pm_domain_set_performance_state(), that
may be called by consumers to request a new performance state for a device
through its PM domain.

Note that, in most cases it's preferred that a consumer use the OPP library
to request a new performance state for its device. Although, this requires
some additional changes to be supported, which are being implemented from
subsequent changes.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

soc: mediatek: svs: Add support for voltage bins

Add support voltage bins turn point

Signed-off-by: Mark Tseng <chun-jen.tseng@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230921052637.30444-4-chun-jen.tseng@mediatek.com

soc: mediatek: svs: Add support for MT8188 SoC

MT8188 svs gpu uses 2-line high bank and low bank to optimize the
voltage of opp table for higher and lower frequency respectively.

Signed-off-by: Mark Tseng <chun-jen.tseng@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230921052637.30444-3-chun-jen.tseng@mediatek.com

dt-bindings: soc: mediatek: add mt8188 svs dt-bindings

Add mt8188 svs compatible in dt-bindings.

Signed-off-by: Mark Tseng <chun-jen.tseng@mediatek.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230921052637.30444-2-chun-jen.tseng@mediatek.com

Linux 6.6-rc4

Merge tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Fix the module compression with xz so the in-kernel decompressor
   works

- Document a kconfig idiom to express an optional dependency between
   modules

- Make modpost, when W=1 is given, detect broken drivers that reference
   .exit.* sections

- Remove unused code

* tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: remove stale code for 'source' symlink in packaging scripts
  modpost: Don't let "driver"s reference .exit.*
  vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
  modpost: add missing else to the "of" check
  Documentation: kbuild: explain handling optional dependencies
  kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules

Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
  pertain to issues which were introduced after 6.5"

* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  Crash: add lock to serialize crash hotplug handling
  selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
  mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
  mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
  mm, memcg: reconsider kmem.limit_in_bytes deprecation
  mm: zswap: fix potential memory corruption on duplicate store
  arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
  mm: hugetlb: add huge page size param to set_huge_pte_at()
  maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
  maple_tree: add mas_is_active() to detect in-tree walks
  nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
  mm: abstract moving to the next PFN
  mm: report success more often from filemap_map_folio_range()
  fs: binfmt_elf_efpic: fix personality for ELF-FDPIC

Merge tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull misc driver fix from Greg KH:
"Here is a single, much requested, fix for a set of misc drivers to
  resolve a much reported regression in the -rc series that has also
  propagated back to the stable releases. Sorry for the delay, lots of
  conference travel for a few weeks put me very far behind in patch
  wrangling.

  It has been reported by many to resolve the reported problem, and has
  been in linux-next with no reported issues"

* tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe

Merge tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial driver fixes from Greg KH:
"Here are two tty/serial driver fixes for 6.6-rc4 that resolve some
  reported regressions:

   - revert a n_gsm change that ended up causing problems

   - 8250_port fix for irq data

  both have been in linux-next for over a week with no reported
  problems"

* tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux"
  serial: 8250_port: Check IRQ data before use

Merge tag 'x86-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Misc fixes: a kerneldoc build warning fix, add SRSO mitigation for
  AMD-derived Hygon processors, and fix a SGX kernel crash in the page
  fault handler that can trigger when ksgxd races to reclaim the SECS
  special page, by making the SECS page unswappable"

* tag 'x86-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race
  x86/srso: Add SRSO mitigation for Hygon processors
  x86/kgdb: Fix a kerneldoc warning when build with W=1

Merge tag 'timers-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"Fix a spurious kernel warning during CPU hotplug events that may
  trigger when timer/hrtimer softirqs are pending, which are otherwise
  hotplug-safe and don't merit a warning"

* tag 'timers-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Tag (hr)timer softirq as hotplug safe

Merge tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
"Fix a RT tasks related lockup/live-lock during CPU offlining"

* tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Fix live lock between select_fallback_rq() and RT push

Merge tag 'perf-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fixes from Ingo Molnar:
"Misc fixes: work around an AMD microcode bug on certain models, and
  fix kexec kernel PMI handlers on AMD systems that get loaded on older
  kernels that have an unexpected register state"

* tag 'perf-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd: Do not WARN() on every IRQ
  perf/x86/amd/core: Fix overflow reset on hotplug

kbuild: remove stale code for 'source' symlink in packaging scripts

Since commit d8131c2965d5 ("kbuild: remove $(MODLIB)/source symlink"),
modules_install does not create the 'source' symlink.

Remove the stale code from builddeb and kernel.spec.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

modpost: Don't let "driver"s reference .exit.*

Drivers must not reference functions marked with __exit as these likely
are not available when the code is built-in.

There are few creative offenders uncovered for example in ARCH=amd64
allmodconfig builds. So only trigger the section mismatch warning for
W=1 builds.

The dual rule that drivers must not reference .init.* is implemented
since commit 0db252452378 ("modpost: don't allow *driver to reference
.init.*") which however missed that .exit.* should be handled in the
same way.

Thanks to Masahiro Yamada and Arnd Bergmann who gave valuable hints to
find this improvement.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros

Remove the left-over of commit e24f6628811e ("modpost: remove all
traces of cpuinit/cpuexit sections").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>

modpost: add missing else to the "of" check

Without this 'else' statement, an "usb" name goes into two handlers:
the first/previous 'if' statement _AND_ the for-loop over 'devtable',
but the latter is useless as it has no 'usb' device_id entry anyway.

Tested with allmodconfig before/after patch; no changes to *.mod.c:

    git checkout v6.6-rc3
    make -j$(nproc) allmodconfig
    make -j$(nproc) olddefconfig

    make -j$(nproc)
    find . -name '*.mod.c' | cpio -pd /tmp/before

    # apply patch

    make -j$(nproc)
    find . -name '*.mod.c' | cpio -pd /tmp/after

    diff -r /tmp/before/ /tmp/after/
    # no difference

Fixes: acbef7b76629 ("modpost: fix module autoloading for OF devices with generic compatible property")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Merge tag 'soc-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"These are the latest bug fixes that have come up in the soc tree. Most
  of these are fairly minor. Most notably, the majority of changes this
  time are not for dts files as usual.

   - Updates to the addresses of the broadcom and aspeed entries in the
     MAINTAINERS file.

   - Defconfig updates to address a regression on samsung and a build
     warning from an unknown Kconfig symbol

   - Build fixes for the StrongARM and Uniphier platforms

   - Code fixes for SCMI and FF-A firmware drivers, both of which had a
     simple bug that resulted in invalid data, and a lesser fix for the
     optee firmware driver

   - Multiple fixes for the recently added loongson/loongarch "guts" soc
     driver

   - Devicetree fixes for RISC-V on the startfive platform, addressing
     issues with NOR flash, usb and uart.

   - Multiple fixes for NXP i.MX8/i.MX9 dts files, fixing problems with
     clock, gpio, hdmi settings and the Makefile

   - Bug fixes for i.MX firmware code and the OCOTP soc driver

   - Multiple fixes for the TI sysc bus driver

   - Minor dts updates for TI omap dts files, to address boot time
     warnings and errors"

* tag 'soc-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
  MAINTAINERS: Fix Florian Fainelli's email address
  arm64: defconfig: enable syscon-poweroff driver
  ARM: locomo: fix locomolcd_power declaration
  soc: loongson: loongson2_guts: Remove unneeded semicolon
  soc: loongson: loongson2_guts: Convert to devm_platform_ioremap_resource()
  soc: loongson: loongson_pm2: Populate children syscon nodes
  dt-bindings: soc: loongson,ls2k-pmc: Allow syscon-reboot/syscon-poweroff as child
  soc: loongson: loongson_pm2: Drop useless of_device_id compatible
  dt-bindings: soc: loongson,ls2k-pmc: Use fallbacks for ls2k-pmc compatible
  soc: loongson: loongson_pm2: Add dependency for INPUT
  arm64: defconfig: remove CONFIG_COMMON_CLK_NPCM8XX=y
  ARM: uniphier: fix cache kernel-doc warnings
  MAINTAINERS: aspeed: Update Andrew's email address
  MAINTAINERS: aspeed: Update git tree URL
  firmware: arm_ffa: Don't set the memory region attributes for MEM_LEND
  arm64: dts: imx: Add imx8mm-prt8mm.dtb to build
  arm64: dts: imx8mm-evk: Fix hdmi@3d node
  soc: imx8m: Enable OCOTP clock for imx8mm before reading registers
  arm64: dts: imx8mp-beacon-kit: Fix audio_pll2 clock
  arm64: dts: imx8mp: Fix SDMA2/3 clocks
  ...

Merge tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Make sure 32-bit applications using user events have aligned access
   when running on a 64-bit kernel.

- Add cond_resched in the loop that handles converting enums in
   print_fmt string is trace events.

- Fix premature wake ups of polling processes in the tracing ring
   buffer. When a task polls waiting for a percentage of the ring buffer
   to be filled, the writer still will wake it up at every event. Add
   the polling's percentage to the "shortest_full" list to tell the
   writer when to wake it up.

- For eventfs dir lookups on dynamic events, an event system's only
   event could be removed, leaving its dentry with no children. This is
   totally legitimate. But in eventfs_release() it must not access the
   children array, as it is only allocated when the dentry has children.

* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Test for dentries array allocated in eventfs_release()
  tracing/user_events: Align set_bit() address for all archs
  tracing: relax trace_event_eval_update() execution with cond_resched()
  ring-buffer: Update "shortest_full" in polling

eventfs: Test for dentries array allocated in eventfs_release()

The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.

Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ef36b4f92868 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/user_events: Align set_bit() address for all archs

All architectures should use a long aligned address passed to set_bit().
User processes can pass either a 32-bit or 64-bit sized value to be
updated when tracing is enabled when on a 64-bit kernel. Both cases are
ensured to be naturally aligned, however, that is not enough. The
address must be long aligned without affecting checks on the value
within the user process which require different adjustments for the bit
for little and big endian CPUs.

Add a compat flag to user_event_enabler that indicates when a 32-bit
value is being used on a 64-bit kernel. Long align addresses and correct
the bit to be used by set_bit() to account for this alignment. Ensure
compat flags are copied during forks and used during deletion clears.

Link: https://lore.kernel.org/linux-trace-kernel/20230925230829.341-2-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/20230914131102.179100-1-cleger@rivosinc.com/
Cc: stable@vger.kernel.org
Fixes: 7235759084a4 ("tracing/user_events: Use remote writes for event enablement")
Reported-by: Clément Léger <cleger@rivosinc.com>
Suggested-by: Clément Léger <cleger@rivosinc.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: relax trace_event_eval_update() execution with cond_resched()

When kernel is compiled without preemption, the eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted up to its
complete execution. This can actually cause a problem since if another
CPU call stop_machine(), the call will have to wait for the
eval_map_work_func() function to finish executing in the workqueue
before being able to be scheduled. This problem was observe on a SMP
system at boot time, when the CPU calling the initcalls executed
clocksource_done_booting() which in the end calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.

Adding a call to cond_resched() in trace_event_eval_update() allows
other tasks to be executed and thus continue working asynchronously
like before without blocking any pending task at boot time.

Link: https://lore.kernel.org/linux-trace-kernel/20230929191637.416931-1-cleger@rivosinc.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ring-buffer: Update "shortest_full" in polling

It was discovered that the ring buffer polling was incorrectly stating
that read would not block, but that's because polling did not take into
account that reads will block if the "buffer-percent" was set. Instead,
the ring buffer polling would say reads would not block if there was any
data in the ring buffer. This was incorrect behavior from a user space
point of view. This was fixed by commit 42fb0a1e84ff by having the polling
code check if the ring buffer had more data than what the user specified
"buffer percent" had.

The problem now is that the polling code did not register itself to the
writer that it wanted to wait for a specific "full" value of the ring
buffer. The result was that the writer would wake the polling waiter
whenever there was a new event. The polling waiter would then wake up, see
that there's not enough data in the ring buffer to notify user space and
then go back to sleep. The next event would wake it up again.

Before the polling fix was added, the code would wake up around 100 times
for a hackbench 30 benchmark. After the "fix", due to the constant waking
of the writer, it would wake up over 11,0000 times! It would never leave
the kernel, so the user space behavior was still "correct", but this
definitely is not the desired effect.

To fix this, have the polling code add what it's waiting for to the
"shortest_full" variable, to tell the writer not to wake it up if the
buffer is not as full as it expects to be.

Note, after this fix, it appears that the waiter is now woken up around 2x
the times it was before (~200). This is a tremendous improvement from the
11,000 times, but I will need to spend some time to see why polling is
more aggressive in its wakeups than the read blocking code.

Link: https://lore.kernel.org/linux-trace-kernel/20230929180113.01c2cae3@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

- fix the narea calculation in swiotlb initialization (Ross Lagerwall)

- fix the check whether a device has used swiotlb (Petr Tesarik)

* tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix the check whether a device has used software IO TLB
swiotlb: use the calculated number of areas

Merge tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fixes from Darrick Wong:

- Handle a race between writing and shrinking block devices by
   returning EIO

- Fix a typo in a comment

* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: Spelling s/preceeding/preceding/g
  iomap: add a workaround for racy i_size updates on block devices

Merge tag 'i2c-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Usual business: a driver fix, a DT fix, a minor core fix"

* tag 'i2c-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: npcm7xx: Fix callback completion ordering
  i2c: mux: Avoid potential false error message in i2c_mux_add_adapter
  dt-bindings: i2c: mxs: Pass ref and 'unevaluatedProperties: false'

Merge tag 'acpi-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Fix a possible NULL pointer dereference in the error path of
acpi_video_bus_add() resulting from recent changes (Dinghao Liu)"

* tag 'acpi-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Fix NULL pointer dereference in acpi_video_bus_add()

Merge tag 'powerpc-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Fix arch_stack_walk_reliable(), used by live patching

- Fix powerpc selftests to work with run_kselftest.sh

Thanks to Joe Lawrence and Petr Mladek.

* tag 'powerpc-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix emit_tests to work with run_kselftest.sh
powerpc/stacktrace: Fix arch_stack_walk_reliable()

Merge tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

- Fix NFSv4 READ corner case

* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set

Merge tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
"Fix for password freeing potential oops (also for stable)"

* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
fs/smb/client: Reset password pointer to NULL

Crash: add lock to serialize crash hotplug handling

Eric reported that handling corresponding crash hotplug event can be
failed easily when many memory hotplug event are notified in a short
period.  They failed because failing to take __kexec_lock.

=======
[   78.714569] Fallback order for Node 0: 0
[   78.714575] Built 1 zonelists, mobility grouping on.  Total pages: 1817886
[   78.717133] Policy zone: Normal
[   78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   80.056643] PEFILE: Unsigned PE binary
=======

The memory hotplug events are notified very quickly and very many, while
the handling of crash hotplug is much slower relatively.  So the atomic
variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.

Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug
handling specifically.  This doesn't impact the usage of __kexec_lock.

Link: https://lkml.kernel.org/r/20230926120905.392903-1-bhe@redhat.com
Fixes: 247262756121 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error

According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).

The redundant -e option can cause error when users use awk tools other
than gawk (for example, mawk does not support the -e option).

Error Example:
awk: not an option: -e

Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified

When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, kernel
should attempt to migrate all existing pages, and return -EIO if there is
misplaced or unmovable page.  Then commit 6f4576e3687b ("mempolicy: apply
page table walker on queue_pages_range()") messed up the return value and
didn't break VMA scan early ianymore when MPOL_MF_STRICT alone.  The
return value problem was fixed by commit a7f40cfe3b7a ("mm: mempolicy:
make mbind() return -EIO when MPOL_MF_STRICT is specified"), but it broke
the VMA walk early if unmovable page is met, it may cause some pages are
not migrated as expected.

The code should conceptually do:

if (MPOL_MF_MOVE|MOVEALL)
     scan all vmas
     try to migrate the existing pages
     return success
else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
     scan all vmas
     try to migrate the existing pages
     return -EIO if unmovable or migration failed
else /* MPOL_MF_STRICT alone */
     break early if meets unmovable and don't call mbind_range() at all
else /* none of those flags */
     check the ranges in test_walk, EFAULT without mbind_range() if discontig.

Fixed the behavior.

Link: https://lkml.kernel.org/r/20230920223242.3425775-1-yang@os.amperecomputing.com
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()

When CONFIG_DAMON_VADDR_KUNIT_TEST=y and making CONFIG_DEBUG_KMEMLEAK=y
and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the below memory leak is detected.

Since commit 9f86d624292c ("mm/damon/vaddr-test: remove unnecessary
variables"), the damon_destroy_ctx() is removed, but still call
damon_new_target() and damon_new_region(), the damon_region which is
allocated by kmem_cache_alloc() in damon_new_region() and the damon_target
which is allocated by kmalloc in damon_new_target() are not freed.  And
the damon_region which is allocated in damon_new_region() in
damon_set_regions() is also not freed.

So use damon_destroy_target to free all the damon_regions and damon_target.

    unreferenced object 0xffff888107c9a940 (size 64):
      comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        60 c7 9c 07 81 88 ff ff f8 cb 9c 07 81 88 ff ff  `...............
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff8881079cc740 (size 56):
      comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888107c9ac40 (size 64):
      comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        a0 cc 9c 07 81 88 ff ff 78 a1 76 07 81 88 ff ff  ........x.v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff8881079ccc80 (size 56):
      comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888107c9af40 (size 64):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        20 a2 76 07 81 88 ff ff b8 a6 76 07 81 88 ff ff   .v.......v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776a200 (size 56):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776a740 (size 56):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.025s)
      hex dump (first 32 bytes):
        3d 00 00 00 00 00 00 00 3f 00 00 00 00 00 00 00  =.......?.......
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
        [<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888108038240 (size 64):
      comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 03 00 00 00 6b 6b 6b 6b  ............kkkk
        48 ad 76 07 81 88 ff ff 98 ae 76 07 81 88 ff ff  H.v.......v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776ad28 (size 56):
      comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
        [<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
        [<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20

Link: https://lkml.kernel.org/r/20230925072100.3725620-1-ruanjinjie@huawei.com
Fixes: 9f86d624292c ("mm/damon/vaddr-test: remove unnecessary variables")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm, memcg: reconsider kmem.limit_in_bytes deprecation

This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes") and
partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit.  Unfortunately it has turned out that
there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1].  The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result.  The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.

Now, reverting alone 86327e8eb94c is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file.  Therefore we have to go and revisit 58056f77502f as
well.  There are two ways to go ahead.  Either we give up on the
deprecation and fully revert 58056f77502f as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.

Note to backporters to stable trees.  a8c49af3be5f ("memcg: add per-memcg
total kernel memory stat") introduced in 4.18 has added memcg_account_kmem
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore.  Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).

[akpm@linux-foundation.org: fix build - remove unused local]
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: zswap: fix potential memory corruption on duplicate store

While stress-testing zswap a memory corruption was happening when writing
back pages. __frontswap_store used to check for duplicate entries before
attempting to store a page in zswap, this was because if the store fails
the old entry isn't removed from the tree. This change removes duplicate
entries in zswap_store before the actual attempt.

[cerasuolodomenico@gmail.com: add a warning and a comment, per Johannes]
Link: https://lkml.kernel.org/r/20230925130002.1929369-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230922172211.1704917-1-cerasuolodomenico@gmail.com
Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries

When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

Arguably, the root cause is really due to commit 18f3962953e4 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as descibed above.  While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs.  And since then new
callers have come along which rely on this working.  But given the
brokeness is only observable after commit 8a13897fb0da ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.

Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.

Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: hugetlb: add huge page size param to set_huge_pte_at()

Patch series "Fix set_huge_pte_at() panic on arm64", v2.

This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic.  The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory.  This test (and the uffd poison feature) was merged for
v6.5-rc7.

Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.

Description of Bug
==================

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():

    static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
    {
        VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));

        return page_folio(pfn_to_page(swp_offset_pfn(entry)));
    }

Fix
===

The simplest fix would have been to revert the dodgy cleanup commit
18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at().  As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.

So instead, I've added a huge page size parameter to set_huge_pte_at().
This means that the arm64 code has the size in all cases.  It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.

I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64).  I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.

This patch (of 2):

In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at().  Provide for this by
adding an `unsigned long sz` parameter to the function.  This follows the
same pattern as huge_pte_clear().

This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc).  The actual arm64 bug will be fixed in a separate
commit.

No behavioral changes intended.

Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> [vmalloc change]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states

When updating the maple tree iterator to avoid rewalks, an issue was
introduced when shifting beyond the limits.  This can be seen by trying to
go to the previous address of 0, which would set the maple node to
MAS_NONE and keep the range as the last entry.

Subsequent calls to mas_find() would then search upwards from mas->last
and skip the value at mas->index/mas->last.  This showed up as a bug in
mprotect which skips the actual VMA at the current range after attempting
to go to the previous VMA from 0.

Since MAS_NONE may already be set when searching for a value that isn't
contained within a node, changing the handling of MAS_NONE in mas_find()
would make the code more complicated and error prone.  Furthermore, there
was no way to tell which limit was hit, and thus which action to take
(next or the entry at the current range).

This solution is to add two states to track what happened with the
previous iterator action.  This allows for the expected behaviour of the
next command to return the correct item (either the item at the range
requested, or the next/previous).

Tests are also added and updated accordingly.

Link: https://lkml.kernel.org/r/20230921181236.509072-3-Liam.Howlett@oracle.com
Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Link: https://lore.kernel.org/linux-mm/20230921181236.509072-1-Liam.Howlett@oracle.com/
Fixes: 39193685d585 ("maple_tree: try harder to keep active node with mas_prev()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Closes: https://bugs.archlinux.org/task/79656
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

maple_tree: add mas_is_active() to detect in-tree walks

Patch series "maple_tree: Fix mas_prev() state regression".

Pedro Falcato retported an mprotect regression [1] which was bisected back
to the iterator changes for maple tree. Root cause analysis showed the
mas_prev() running off the end of the VMA space (previous from 0) followed
by mas_find(), would skip the first value.

This patchset introduces maple state underflow/overflow so the sequence of
calls on the maple state will return what the user expects.

Users who encounter this bug may see mprotect(), userfaultfd_register(),
and mlock() fail on VMAs mapped with address 0.

This patch (of 2):

Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.

Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()

In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails.  If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed.  However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug.  This patch moves the release
operation after unlocking and putting the page.

NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed.  However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.

[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e89 ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: abstract moving to the next PFN

In order to fix the L1TF vulnerability, x86 can invert the PTE bits for
PROT_NONE VMAs, which means we cannot move from one PTE to the next by
adding 1 to the PFN field of the PTE. This results in the BUG reported at
[1].

Abstract advancing the PTE to the next PFN through a pte_next_pfn()
function/macro.

Link: https://lkml.kernel.org/r/20230920040958.866520-1-willy@infradead.org
Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000d099fa0604f03351@google.com [1]
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>