linux.git
2 years agodrm/amdgpu: don't enable secure display on incompatible platforms
Jesse Zhang [Thu, 18 May 2023 01:46:22 +0000 (09:46 +0800)]
drm/amdgpu: don't enable secure display on incompatible platforms

[why]
[drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
[drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
amdgpu 0000:04:00.0: amdgpu: Secure display: Generic Failure.

[how]
don't enable secure display on incompatible platforms

Suggested-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Jesse zhang <jesse.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/radeon: Remove unnecessary (void*) conversions
Su Hui [Wed, 17 May 2023 02:52:19 +0000 (10:52 +0800)]
drm/radeon: Remove unnecessary (void*) conversions

No need cast (void*) to (struct radeon_device *)
or (struct radeon_ring *).

Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm:amd:amdgpu: Fix missing buffer object unlock in failure path
Sukrut Bellary [Wed, 3 May 2023 23:15:07 +0000 (16:15 -0700)]
drm:amd:amdgpu: Fix missing buffer object unlock in failure path

smatch warning -
1) drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:3615 gfx_v9_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.

2) drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:6901 gfx_v10_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.

Signed-off-by: Sukrut Bellary <sukrut.bellary@linux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: Fix artifacting on eDP panels when engaging freesync video mode
Aurabindo Pillai [Wed, 17 May 2023 18:39:46 +0000 (14:39 -0400)]
drm/amd/display: Fix artifacting on eDP panels when engaging freesync video mode

[Why]
When freesync video mode is enabled, switching resolution from native
mode to one of the freesync video compatible modes can trigger continous
artifacts on some eDP panels when running under KDE. The articating can be seen in the
attached bug report.

[How]
Fix this by restricting updates that require full commit by using the same checks
for stream and scaling changes in the the enable pass of dm_update_crtc_state()
along with the check for compatible timings for freesync vide mode.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162
Fixes: da5e14909776 ("drm/amd/display: Fix hang when skipping modeset")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Validate VM ioctl flags.
Bas Nieuwenhuizen [Sat, 13 May 2023 12:51:00 +0000 (14:51 +0200)]
drm/amdgpu: Validate VM ioctl flags.

None have been defined yet, so reject anybody setting any. Mesa sets
it to 0 anyway.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2 years agodrm/amd: Update driver-misc.html for Rembrandt-R
Mario Limonciello [Thu, 18 May 2023 14:40:09 +0000 (09:40 -0500)]
drm/amd: Update driver-misc.html for Rembrandt-R

AMD has added marketing information publicly for Rembrandt-R, so
update the APU table with matching versions.

Link: https://www.amd.com/en/product/13086
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: remove unnecessary (void*) conversions
Su Hui [Mon, 15 May 2023 01:34:28 +0000 (09:34 +0800)]
drm/amdgpu: remove unnecessary (void*) conversions

No need cast (void*) to (struct amdgpu_device *).

Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd: Update driver-misc.html for Dragon Range
Mario Limonciello [Thu, 18 May 2023 14:35:17 +0000 (09:35 -0500)]
drm/amd: Update driver-misc.html for Dragon Range

AMD has added marketing information publicly for Dragon Range, so
update the APU table with matching versions.

Link: https://www.amd.com/en/product/13016
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd: Update driver-misc.html for Phoenix
Mario Limonciello [Thu, 18 May 2023 14:30:26 +0000 (09:30 -0500)]
drm/amd: Update driver-misc.html for Phoenix

AMD has added marketing information publicly for Phoenix, so
update the APU table with matching versions.

Link: https://www.amd.com/en/product/13036
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: fix incorrect pcie_gen_mask in passthrough case
Tong Liu01 [Tue, 16 May 2023 06:50:04 +0000 (14:50 +0800)]
drm/amdgpu: fix incorrect pcie_gen_mask in passthrough case

[why]
Passthrough case is treated as root bus and pcie_gen_mask is set as
default value that does not support GEN 3 and GEN 4 for PCIe link
speed. So PCIe link speed will be downgraded at smu hw init in
passthrough condition

[how]
Move get pci info after detect virtualization and check if it is
passthrough case when set pcie_gen_mask

Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: drop unused count variable in create_eml_sink()
Hamza Mahfooz [Wed, 17 May 2023 18:09:39 +0000 (14:09 -0400)]
drm/amd/display: drop unused count variable in create_eml_sink()

Since, we are only interested in having
drm_edid_override_connector_update(), update the value of
connector->edid_blob_ptr. We don't care about the return value of
drm_edid_override_connector_update() here. So, drop count.

Fixes: 550e5d23f147 ("drm/amd/display: assign edid_blob_ptr with edid from debugfs")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: drop unused function set_abm_event()
Hamza Mahfooz [Wed, 17 May 2023 17:49:29 +0000 (13:49 -0400)]
drm/amd/display: drop unused function set_abm_event()

set_abm_event() is never actually used. So, drop it.

Fixes: b8fe56375f78 ("drm/amd/display: Refactor ABM feature")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Tom Rix <trix@redhat.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/display: drop redundant memset() in get_available_dsc_slices()
Hamza Mahfooz [Wed, 17 May 2023 17:31:45 +0000 (13:31 -0400)]
drm/amd/display: drop redundant memset() in get_available_dsc_slices()

get_available_dsc_slices() returns the number of indices set, and all of
the users of get_available_dsc_slices() don't cross the returned bound
when iterating over available_slices[]. So, the memset() in
get_available_dsc_slices() is redundant and can be dropped.

Fixes: 97bda0322b8a ("drm/amd/display: Add DSC support for Navi (v2)")
Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: fix S3 issue if MQD in VRAM
Jack Xiao [Wed, 17 May 2023 09:07:01 +0000 (17:07 +0800)]
drm/amdgpu: fix S3 issue if MQD in VRAM

1. Need flush HDP for MQD putting in vram
2. Zero out mes MQD

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix warnings in amdgpu _sdma, _ucode.c
Srinivasan Shanmugam [Thu, 18 May 2023 02:22:06 +0000 (07:52 +0530)]
drm/amdgpu: Fix warnings in amdgpu _sdma, _ucode.c

Fix below checkpatch warnings:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Comparisons should place the constant on the right side of the test
WARNING: Missing a blank line after declarations

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/amdgpu: Fix errors & warnings in amdgpu _uvd, _vce.c
Srinivasan Shanmugam [Wed, 17 May 2023 15:54:43 +0000 (21:24 +0530)]
drm/amd/amdgpu: Fix errors & warnings in amdgpu _uvd, _vce.c

Fix below checkpatch errors & warnings:

In amdgpu_uvd.c:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Prefer 'unsigned int *' to bare use of 'unsigned *'
WARNING: Missing a blank line after declarations
WARNING: %Lx is non-standard C, use %llx
ERROR: space required before the open parenthesis '('
ERROR: space required before the open brace '{'
WARNING: %LX is non-standard C, use %llX
WARNING: Block comments use * on subsequent lines
+/* multiple fence commands without any stream commands in between can
+   crash the vcpu so just try to emmit a dummy create/destroy msg to
WARNING: Block comments use a trailing */ on a separate line
+   avoid this */
WARNING: braces {} are not necessary for single statement blocks
+               for (j = 0; j < adev->uvd.num_enc_rings; ++j) {
+                       fences += amdgpu_fence_count_emitted(&adev->uvd.inst[i].ring_enc[j]);
+               }

In amdgpu_vce.c:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Missing a blank line after declarations
WARNING: %Lx is non-standard C, use %llx
WARNING: Possible repeated word: 'we'
ERROR: space required before the open parenthesis '('

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: perform mode2 reset for sdma fed error on gfx v11_0_3
YiPeng Chai [Tue, 16 May 2023 09:34:17 +0000 (17:34 +0800)]
drm/amdgpu: perform mode2 reset for sdma fed error on gfx v11_0_3

perform mode2 reset for sdma fed error on gfx v11_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/amdgpu: Fix errors & warnings in amdgpu_vcn.c
Srinivasan Shanmugam [Wed, 17 May 2023 15:01:02 +0000 (20:31 +0530)]
drm/amd/amdgpu: Fix errors & warnings in amdgpu_vcn.c

Fix below checkpatch insisted error & warnings:

ERROR: space required before the open brace '{'
WARNING: braces {} are not necessary for any arm of this statement
+       if ((type == VCN_ENCODE_RING) && (vcn_config & VCN_BLOCK_ENCODE_DISABLE_MASK)) {
[...]
+       } else if ((type == VCN_DECODE_RING) && (vcn_config & VCN_BLOCK_DECODE_DISABLE_MASK)) {
[...]
+       } else if ((type == VCN_UNIFIED_RING) && (vcn_config & VCN_BLOCK_QUEUE_DISABLE_MASK)) {
[...]
ERROR: code indent should use tabs where possible
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for single statement blocks
+               for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
+                       fence[j] += amdgpu_fence_count_emitted(&adev->vcn.inst[j].ring_enc[i]);
+
ERROR: space required before the open parenthesis '('
WARNING: Missing a blank line after declarations
WARNING: please, no spaces at the start of a line
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/amdgpu: Fix warnings in amdgpu_encoders.c
Srinivasan Shanmugam [Wed, 17 May 2023 13:08:00 +0000 (18:38 +0530)]
drm/amd/amdgpu: Fix warnings in amdgpu_encoders.c

Fix below checkpatch warnings:

WARNING: Missing a blank line after declarations
+                       struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector);
+                       amdgpu_encoder->active_device = amdgpu_encoder->devices & amdgpu_connector->devices;

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: fix stack size in svm_range_validate_and_map
Alex Deucher [Wed, 17 May 2023 19:04:35 +0000 (15:04 -0400)]
drm/amdkfd: fix stack size in svm_range_validate_and_map

Allocate large local variable on heap to avoid exceeding the
stack size:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function ‘svm_range_validate_and_map’:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:1690:1: warning: the frame size of 2360 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amd/amdgpu: Fix errors & warnings in amdgpu_ttm.c
Srinivasan Shanmugam [Wed, 17 May 2023 14:40:48 +0000 (20:10 +0530)]
drm/amd/amdgpu: Fix errors & warnings in amdgpu_ttm.c

Fix below checkpatch insisted error & warnings:

ERROR: Macros with complex values should be enclosed in parentheses
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for single statement blocks
WARNING: Block comments use a trailing */ on a separate line
WARNING: Missing a blank line after declarations

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu/vcn4: fix endian conversion
Alex Deucher [Tue, 16 May 2023 20:56:49 +0000 (16:56 -0400)]
drm/amdgpu/vcn4: fix endian conversion

sq.is_enabled is a byte so there is no need to endian swap it.

Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu/gmc9: fix 64 bit division in partition code
Alex Deucher [Tue, 16 May 2023 21:16:30 +0000 (17:16 -0400)]
drm/amdgpu/gmc9: fix 64 bit division in partition code

Rework logic or use do_div() to avoid problems on 32 bit.

v2: add a missing case for XCP macro
v3: fix out of bounds array access
v4: fix xcp handling harder

Acked-by: Guchun Chen <guchun.chen@amd.com> (v1)
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> (v3)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: initialize RAS for gfx_v9_4_3
Tao Zhou [Wed, 8 Feb 2023 09:05:08 +0000 (17:05 +0800)]
drm/amdgpu: initialize RAS for gfx_v9_4_3

Register GFX RAS functions and initialize GFX RAS.

v2: remove xcp operations.
v3: reuse the return value of gfx_ras_sw_init.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add sq timeout status functions for gfx_v9_4_3
Tao Zhou [Fri, 10 Feb 2023 10:41:28 +0000 (18:41 +0800)]
drm/amdgpu: add sq timeout status functions for gfx_v9_4_3

Query and reset sq timeout status.

v2: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add RAS error count reset for gfx_v9_4_3
Tao Zhou [Wed, 8 Feb 2023 06:54:01 +0000 (14:54 +0800)]
drm/amdgpu: add RAS error count reset for gfx_v9_4_3

Add GFX RAS error count reset function.

v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add RAS error count query for gfx_v9_4_3
Tao Zhou [Fri, 17 Mar 2023 09:13:46 +0000 (17:13 +0800)]
drm/amdgpu: add RAS error count query for gfx_v9_4_3

Query GFX RAS ce/ue count.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add RAS error count definitions for gfx_v9_4_3
Tao Zhou [Mon, 6 Feb 2023 03:38:19 +0000 (11:38 +0800)]
drm/amdgpu: add RAS error count definitions for gfx_v9_4_3

Prepare for the query of GFX RAS ce/ue count.

v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add more CE/UE registsers to query list.
    add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.
v4: move gfx memory id definitions to gfx_v9_4_3.
v5: create a dedicated patch for adding error count query function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add RAS definitions for GFX
Tao Zhou [Tue, 7 Feb 2023 10:30:55 +0000 (18:30 +0800)]
drm/amdgpu: add RAS definitions for GFX

Add common GFX RAS definitions.

v2: remove instance from amdgpu_gfx_ras_reg_entry,
    amdgpu_ras_err_status_reg_entry has already defined it.
v3: remove memory id definitions from amdgpu_gfx.h, they are
    related to IP version.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add gc v9_4_3 ras error status registers
Hawking Zhang [Mon, 27 Feb 2023 09:36:19 +0000 (17:36 +0800)]
drm/amdgpu: Add gc v9_4_3 ras error status registers

GC v9_4_3 introduces UE|CE_ERR_STATUS_LO|HI to log
hardware errors

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add RAS status reset for gfx_v9_4_3
Tao Zhou [Fri, 3 Feb 2023 02:41:26 +0000 (10:41 +0800)]
drm/amdgpu: add RAS status reset for gfx_v9_4_3

Reset GFX RAS status registers.

v2: fix typo in title.
    remove xcp operation.
v3: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add RAS status query for gfx_v9_4_3
Tao Zhou [Thu, 2 Feb 2023 09:20:23 +0000 (17:20 +0800)]
drm/amdgpu: add RAS status query for gfx_v9_4_3

Query GFX RAS status.

v2: remove xcp operation.
v3: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add GFX RAS common function
Tao Zhou [Thu, 2 Feb 2023 10:57:04 +0000 (18:57 +0800)]
drm/amdgpu: add GFX RAS common function

The common function can help reduce redundant code.

v2: remove xcp operation, only need to do RAS operations for all
instances.
v3: remove check for GFX RAS support, will be checked in higher level.
    add amdgpu prefix for the function name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Do not access members of xcp w/o check (v2)
Hawking Zhang [Fri, 12 May 2023 05:22:57 +0000 (13:22 +0800)]
drm/amdgpu: Do not access members of xcp w/o check (v2)

Not all the asic needs xcp. ensure check xcp availabity
before accessing its member.

v2: add missing change in kfd_topology.c

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Fix null ptr access
Hawking Zhang [Thu, 11 May 2023 09:01:03 +0000 (17:01 +0800)]
drm/amdkfd: Fix null ptr access

Avoid access null xcp_mgr pointer.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add check for RAS instance mask
Tao Zhou [Mon, 20 Mar 2023 10:21:14 +0000 (18:21 +0800)]
drm/amdgpu: add check for RAS instance mask

The mask is only needed to be set when RAS block instance number is
more than 1 and invalid bits should be also masked out.
We only check valid bits for GFX and SDMA block for now, and will
add check for other RAS blocks in the future.

v2: move the check under injection operation since the mask is only
    used by RAS error inject.
v3: add valid bits handling for SDMA.
v4: print message if the mask is adjusted.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2
Tao Zhou [Mon, 13 Mar 2023 08:34:19 +0000 (16:34 +0800)]
drm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2

No special requirement in RAS injection for the two versions, switch to
use default injection interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: reorganize RAS injection flow
Tao Zhou [Mon, 13 Mar 2023 08:24:11 +0000 (16:24 +0800)]
drm/amdgpu: reorganize RAS injection flow

So GFX RAS injection could use default function if it doesn't define its
own injection interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add instance mask for RAS inject
Tao Zhou [Mon, 27 Feb 2023 10:25:23 +0000 (18:25 +0800)]
drm/amdgpu: add instance mask for RAS inject

User can specify injected instances by the mask. For backward
compatibility, the mask value is incorporated into sub block index
without interface change of RAS TA.
User uses logical mask and driver should convert it to physical value
before sending it to RAS TA.

v2: update parameter name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: convert logical instance mask to physical one
Tao Zhou [Mon, 27 Feb 2023 08:31:56 +0000 (16:31 +0800)]
drm/amdgpu: convert logical instance mask to physical one

Convert instance mask for the convenience of RAS TA.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Enable IH CAM on GFX9.4.3
Mukul Joshi [Wed, 12 Apr 2023 20:56:29 +0000 (16:56 -0400)]
drm/amdgpu: Enable IH CAM on GFX9.4.3

This patch enables IH CAM on GFX9.4.3 ASIC.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Correct get_xcp_mem_id calculation
Philip Yang [Wed, 19 Apr 2023 21:39:35 +0000 (17:39 -0400)]
drm/amdgpu: Correct get_xcp_mem_id calculation

Current calculation only works for NPS4/QPX mode, correct it for
NPS4/CPX mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Refactor migrate init to support partition switch
Philip Yang [Fri, 31 Mar 2023 15:18:12 +0000 (11:18 -0400)]
drm/amdkfd: Refactor migrate init to support partition switch

Rename smv_migrate_init to a better name kgd2kfd_init_zone_device
because it setup zone devive pgmap for page migration and keep it in
kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it
only once in amdgpu_device_ip_init after adev ip blocks are initialized,
but before amdgpu_amdkfd_device_init initialize kfd nodes which enable
SVM support based on pgmap.

svm_range_set_max_pages is called by kgd2kfd_device_init everytime after
switching compute partition mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: route ioctls on primary node of XCPs to primary device
Shiwu Zhang [Fri, 31 Mar 2023 09:16:41 +0000 (17:16 +0800)]
drm/amdgpu: route ioctls on primary node of XCPs to primary device

During XCP init, unlike the primary device, there is no amdgpu_device
attached to each XCP's drm_device

In case that user trying to open/close the primary node of XCP drm_device
this rerouting is to solve the NULL pointer issue causing by referring
to any member of the amdgpu_device

 BUG: unable to handle page fault for address: 0000000000020c80
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 Call Trace:
  <TASK>
  lock_timer_base+0x6b/0x90
  try_to_del_timer_sync+0x2b/0x80
  del_timer_sync+0x29/0x40
  flush_delayed_work+0x1c/0x50
  amdgpu_driver_open_kms+0x2c/0x280 [amdgpu]
  drm_file_alloc+0x1b3/0x260 [drm]
  drm_open+0xaa/0x280 [drm]
  drm_stub_open+0xa2/0x120 [drm]
  chrdev_open+0xa6/0x1c0

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: APU mode set max svm range pages
Philip Yang [Thu, 23 Mar 2023 12:45:56 +0000 (08:45 -0400)]
drm/amdkfd: APU mode set max svm range pages

svm_migrate_init set the max svm range pages based on the KFD nodes
partition size. APU mode don't init pgmap because there is no migration.

kgd2kfd_device_init calls svm_migrate_init after KFD nodes allocation
and initialization.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Fix memory reporting on GFX 9.4.3
Mukul Joshi [Mon, 20 Mar 2023 15:22:30 +0000 (11:22 -0400)]
drm/amdkfd: Fix memory reporting on GFX 9.4.3

This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU
by reporting available memory on a per partition basis. If its an
APU, available and used memory calculations take into account
system and TTM memory.

v2: squash in fix ("drm/amdkfd: Fix array out of bound warning")
    squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3")

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Move local_mem_info to kfd_node
Mukul Joshi [Mon, 20 Mar 2023 15:21:38 +0000 (11:21 -0400)]
drm/amdkfd: Move local_mem_info to kfd_node

We need to track memory usage on a per partition basis. To do
that, store the local memory information in KFD node instead
of kfd device.

v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info")

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: use xcp partition ID for amdgpu_gem
James Zhu [Mon, 13 Mar 2023 16:03:18 +0000 (12:03 -0400)]
drm/amdgpu: use xcp partition ID for amdgpu_gem

Find xcp_id from amdgpu_fpriv, use it for amdgpu_gem_object_create.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: KFD graphics interop support compute partition
Philip Yang [Fri, 10 Mar 2023 00:30:02 +0000 (19:30 -0500)]
drm/amdgpu: KFD graphics interop support compute partition

kfd_ioctl_get_dmabuf use the amdgpu bo xcp_id to get the gpu_id of the
KFD node from the exported dmabuf_adev, and then create kfd bo on the
correct adev and KFD node when importing the amdgpu bo to KFD.

Remove function kfd_device_by_adev, it is not needed as it is the same
result as dmabuf_adev->kfd.dev->nodes[0]->id.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Store xcp partition id to amdgpu bo
Philip Yang [Wed, 8 Mar 2023 16:57:00 +0000 (11:57 -0500)]
drm/amdkfd: Store xcp partition id to amdgpu bo

For memory accounting per compute partition and export drm amdgpu bo and
then import to KFD, we need the xcp id to account the memory usage or
find the KFD node of the original amdgpu bo to create the KFD bo on the
correct adev KFD node.

Set xcp_id_plus1 of amdgpu_bo_param to create bo and store xcp_id to
amddgpu bo. Add helper macro to get the mem_id from adev and xcp_id.

v2: squash in fix ("drm/amdgpu: Fix BO creation failure on GFX 9.4.3 dGPU")

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: dGPU mode set VRAM range lpfn as exclusive
Philip Yang [Tue, 7 Mar 2023 16:30:24 +0000 (11:30 -0500)]
drm/amdgpu: dGPU mode set VRAM range lpfn as exclusive

TTM place lpfn is exclusive used as end (start + size) in drm and buddy
allocator, adev->gmc memory partition range lpfn is inclusive (start +
size - 1), should plus 1 to set TTM place lpfn.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Alloc page table on correct memory partition
Philip Yang [Fri, 24 Feb 2023 01:00:05 +0000 (20:00 -0500)]
drm/amdgpu: Alloc page table on correct memory partition

Alloc kernel mode page table bo uses the amdgpu_vm->mem_id + 1 as bp
mem_id_plus1 parameter. For APU mode, select the correct TTM pool to
alloc page from the corresponding memory partition, this will be the
closest NUMA node. For dGPU mode, select the correct address range for
vram manager.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Update MTYPE for far memory partition
Philip Yang [Thu, 2 Feb 2023 16:07:53 +0000 (11:07 -0500)]
drm/amdkfd: Update MTYPE for far memory partition

Use MTYPE RW/MTYPE_CC for mapping system memory or VRAM to KFD node
within the same memory partition, use MTYPE_NC for mapping on KFD node
from the far memory partition of the same socket or from another socket
on same XGMI hive.

On NPS4 or 4P system, MTYPE will be overridden per page depending on
the memory NUMA node id and vm->mem_id.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: dGPU mode placement support memory partition
Philip Yang [Thu, 26 Jan 2023 23:54:29 +0000 (18:54 -0500)]
drm/amdgpu: dGPU mode placement support memory partition

dGPU mode uses VRAM manager to validate bo, amdgpu bo placement use the
mem_id  to get the allocation range first, last page frame number
from xcp manager, pass to drm buddy allocator as the allowed range.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: SVM range allocation support memory partition
Philip Yang [Thu, 26 Jan 2023 23:45:32 +0000 (18:45 -0500)]
drm/amdkfd: SVM range allocation support memory partition

Pass kfd node->xcp->mem_id to amdgpu bo create parameter mem_id_plus1 to
allocate new svm_bo on the specified memory partition.

This is only for dGPU mode as we don't migrate with APU mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Alloc memory of GPU support memory partition
Philip Yang [Thu, 26 Jan 2023 23:50:09 +0000 (18:50 -0500)]
drm/amdkfd: Alloc memory of GPU support memory partition

For dGPU mode VRAM allocation, create amdgpu_bo from amdgpu_vm->mem_id,
to alloc from the correct memory range.

For APU mode VRAM allocation, set alloc domain to GTT, and set
bp->mem_id_plus1 from amdgpu_vm->mem_id + 1 to create amdgpu_bo, to
allocate system memory from correct NUMA node.

For GTT allocation, use mem_id -1 to allocate system memory from any
NUMA nodes.

Remove amdgpu_ttm_tt_set_mem_pool, to avoid the confusion that memory
maybe allocated from different mem_id.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add memory partition mem_id to amdgpu_bo
Philip Yang [Thu, 26 Jan 2023 23:25:28 +0000 (18:25 -0500)]
drm/amdgpu: Add memory partition mem_id to amdgpu_bo

Add mem_id_plus1 parameter to amdgpu_gem_object_create and pass it to
amdgpu_bo_create. For dGPU mode allocation, mem_id is used by VRAM
manager to get the memory partition fpfn, lpfn from xcp manager. For APU
native mode allocation, mem_id is used to get NUMA node id from xcp
manager, then pass to TTM as numa pool id to alloc memory from the
specific NUMA node. mem_id -1 means for entire VRAM or any NUMA nodes.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Show KFD node memory partition info
Philip Yang [Thu, 26 Jan 2023 23:11:29 +0000 (18:11 -0500)]
drm/amdkfd: Show KFD node memory partition info

Show KFD node memory partition id and size, add helper function
KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used
later to support memory accounting per partition.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add memory partition id to amdgpu_vm
Philip Yang [Fri, 24 Feb 2023 00:58:22 +0000 (19:58 -0500)]
drm/amdgpu: Add memory partition id to amdgpu_vm

If xcp_mgr is initialized, add mem_id to amdgpu_vm structure to store
memory partition number when creating amdgpu_vm for the xcp. The xcp
number is decided when opening the render device, for example
/dev/dri/renderD129 is xcp_id 0, /dev/dri/renderD130 is xcp_id 1.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Store drm node minor number for kfd nodes
Philip Yang [Thu, 23 Feb 2023 16:03:37 +0000 (11:03 -0500)]
drm/amdkfd: Store drm node minor number for kfd nodes

From KFD topology, application will find kfd node with the corresponding
drm device node minor number, for example if partition drm node starts
from /dev/dri/renderD129, then KFD node 0 with store drm node minor
number 129. Application will open drm node /dev/dri/renderD129 to create
amdgpu vm for kfd node 0 with the correct vm->mem_id to indicate the
memory partition.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add xcp manager num_xcp_per_mem_partition
Philip Yang [Sat, 4 Mar 2023 00:45:45 +0000 (19:45 -0500)]
drm/amdgpu: Add xcp manager num_xcp_per_mem_partition

Used by KFD to check memory limit accounting.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: update ref_cnt before ctx free
James Zhu [Mon, 15 Aug 2022 21:21:44 +0000 (17:21 -0400)]
drm/amdgpu: update ref_cnt before ctx free

Update ref_cnt before ctx free.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: run partition schedule if it is supported
James Zhu [Mon, 15 Aug 2022 21:15:02 +0000 (17:15 -0400)]
drm/amdgpu: run partition schedule if it is supported

Run partition schedule if it is supported during ctx init entity.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add partition schedule for GC(9, 4, 3)
James Zhu [Mon, 15 Aug 2022 21:00:54 +0000 (17:00 -0400)]
drm/amdgpu: add partition schedule for GC(9, 4, 3)

Implement partition schedule for GC(9, 4, 3).

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: keep amdgpu_ctx_mgr in ctx structure
James Zhu [Mon, 15 Aug 2022 21:12:01 +0000 (17:12 -0400)]
drm/amdgpu: keep amdgpu_ctx_mgr in ctx structure

Keep amdgpu_ctx_mgr in ctx structure to track fpriv.

v2: add missing fpriv declaration lost in rebase

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add partition scheduler list update
James Zhu [Mon, 15 Aug 2022 21:19:11 +0000 (17:19 -0400)]
drm/amdgpu: add partition scheduler list update

Add partition scheduler list update in late init
and xcp partition mode switch.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: update header to support partition scheduling
James Zhu [Mon, 15 Aug 2022 20:55:02 +0000 (16:55 -0400)]
drm/amdgpu: update header to support partition scheduling

Update header to support partition scheduling.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: add partition ID track in ring
James Zhu [Mon, 15 Aug 2022 20:45:12 +0000 (16:45 -0400)]
drm/amdgpu: add partition ID track in ring

Keep track partition ID in ring.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: find partition ID when open device
James Zhu [Tue, 28 Feb 2023 19:16:38 +0000 (14:16 -0500)]
drm/amdgpu: find partition ID when open device

Find partition ID when open device from render device minor.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: support partition drm devices
James Zhu [Mon, 15 Aug 2022 20:55:02 +0000 (16:55 -0400)]
drm/amdgpu: support partition drm devices

Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3).

This is a temporary solution and will be superceded.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu/bu: update mtype_local parameter settings
Graham Sider [Mon, 6 Mar 2023 22:56:44 +0000 (17:56 -0500)]
drm/amdgpu/bu: update mtype_local parameter settings

Update mtype_local module parameter to use MTYPE_RW by default.

0: MTYPE_RW (default)
1: MTYPE_NC
2: MTYPE_CC

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu/bu: add mtype_local as a module parameter
David Francis [Mon, 27 Feb 2023 15:33:11 +0000 (10:33 -0500)]
drm/amdgpu/bu: add mtype_local as a module parameter

Selects the MTYPE to be used for local memory,
(0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW)

v2: squash in build fix (Alex)

Reviewed-by: Graham Sider <Graham.Sider@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Override MTYPE per page on GFXv9.4.3 APUs
Felix Kuehling [Tue, 21 Feb 2023 22:44:18 +0000 (17:44 -0500)]
drm/amdgpu: Override MTYPE per page on GFXv9.4.3 APUs

On GFXv9.4.3 NUMA APUs, system memory locality must be determined per
page to choose the correct MTYPE. This patch adds a GMC callback that
can provide this per-page override and implements it for native mode.

Carve-out mode is not yet supported and will use the safe default
(remote) MTYPE for system memory.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix per-BO MTYPE selection for GFXv9.4.3
Felix Kuehling [Tue, 21 Feb 2023 22:31:32 +0000 (17:31 -0500)]
drm/amdgpu: Fix per-BO MTYPE selection for GFXv9.4.3

Treat system memory on NUMA systems as remote by default. Overriding with
a more efficient MTYPE per page will be implemented in the next patch.

No need for a special case for APP APUs. System memory is handled the same
for carve-out and native mode. And VRAM doesn't exist in native mode.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu/bu: Add use_mtype_cc_wa module param
Graham Sider [Mon, 6 Feb 2023 19:04:42 +0000 (14:04 -0500)]
drm/amdgpu/bu: Add use_mtype_cc_wa module param

By default, set use_mtype_cc_wa to 1 to set PTE coherence flag MTYPE_CC
instead of MTYPE_RW by default. This is required for the time being to
mitigate a bug causing XCCs to hit stale data due to TCC marking fully
dirty lines as exclusive.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Use legacy TLB flush for gfx943
Graham Sider [Wed, 8 Feb 2023 16:10:57 +0000 (11:10 -0500)]
drm/amdgpu: Use legacy TLB flush for gfx943

Invalidate TLBs via a legacy flush request (flush_type=0) prior to the
heavyweight flush requests (flush_type=2) in gmc_v9_0.c. This is
temporarily required to mitigate a bug causing CPC UTCL1 to return stale
translations after invalidation requests in address range mode.

v2: squash in long term fix "drm/amdgpu: disable extra gfx943 legacy flush on rev1+"

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: For GFX 9.4.3 APU fix vram_usage value
Harish Kasiviswanathan [Fri, 28 Apr 2023 18:20:00 +0000 (14:20 -0400)]
drm/amdgpu: For GFX 9.4.3 APU fix vram_usage value

For GFX 9.4.3 APP APU VRAM is allocated in GTT domain. While freeing
memory check for GTT domain instead of VRAM if it is APP APU

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Enable NPS4 CPX mode
Philip Yang [Wed, 19 Apr 2023 21:43:26 +0000 (17:43 -0400)]
drm/amdgpu: Enable NPS4 CPX mode

CPX compute mode is valid mode for NPS4 memory partition mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Move pgmap to amdgpu_kfd_dev structure
Philip Yang [Fri, 31 Mar 2023 15:13:40 +0000 (11:13 -0400)]
drm/amdkfd: Move pgmap to amdgpu_kfd_dev structure

VRAM pgmap resource is allocated every time when switching compute
partitions because kfd_dev is re-initialized by post_partition_switch,
As a result, it causes memory region resource leaking and system
memory usage accounting unbalanced.

pgmap resource should be allocated and registered only once when loading
driver and freed when unloading driver, move it from kfd_dev to
amdgpu_kfd_dev.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Skip halting RLC on GFX v9.4.3
Lijo Lazar [Fri, 24 Mar 2023 09:51:30 +0000 (15:21 +0530)]
drm/amdgpu: Skip halting RLC on GFX v9.4.3

RLC-PMFW handshake happens periodically when GFXCLK DPM is enabled and
halting RLC may cause unexpected results. Avoid halting RLC from driver
side.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix register accesses in GFX v9.4.3
Lijo Lazar [Thu, 16 Mar 2023 09:29:27 +0000 (14:59 +0530)]
drm/amdgpu: Fix register accesses in GFX v9.4.3

Access registers with the right xcc id. Also, remove the unused logic as
PG is not used in GFX v9.4.3

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Increase queue number per process to 255 on GFX9.4.3
Mukul Joshi [Wed, 15 Mar 2023 18:04:33 +0000 (14:04 -0400)]
drm/amdkfd: Increase queue number per process to 255 on GFX9.4.3

Increase the maximum number of queues that can be created per process
to 255 on GFX 9.4.3. There is no HWS limitation restricting the number
queues that can be created.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Adjust the sequence to query ras error info
Hawking Zhang [Mon, 20 Mar 2023 09:51:30 +0000 (17:51 +0800)]
drm/amdgpu: Adjust the sequence to query ras error info

It turns out STATUS_VALID_FLAG needs to be checked
ahead of any other fields. ADDRESS_VALID_FLAG and
ERR_INFO_VALID_FLAG only manages ADDRESS and ERR_INFO
field respectively. driver should continue poll
ERR CNT field even ERR_INFO_VALD_FLAG is not set.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Initialize jpeg v4_0_3 ras function
Hawking Zhang [Mon, 6 Mar 2023 03:03:27 +0000 (11:03 +0800)]
drm/amdgpu: Initialize jpeg v4_0_3 ras function

Initialize jpeg v4_0_3 ras function.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add reset_ras_error_count for jpeg v4_0_3
Hawking Zhang [Thu, 2 Mar 2023 10:04:24 +0000 (18:04 +0800)]
drm/amdgpu: Add reset_ras_error_count for jpeg v4_0_3

Add reset_ras_error_count callback for jpeg v4_0_3.
It will be used to reset jpeg ras error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add query_ras_error_count for jpeg v4_0_3
Hawking Zhang [Thu, 2 Mar 2023 09:56:59 +0000 (17:56 +0800)]
drm/amdgpu: Add query_ras_error_count for jpeg v4_0_3

Add query_ras_error_count callback for jpeg v4_0_3.
It will be used to query and log jpeg error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Re-enable VCN RAS if DPG is enabled
Hawking Zhang [Thu, 2 Mar 2023 08:38:38 +0000 (16:38 +0800)]
drm/amdgpu: Re-enable VCN RAS if DPG is enabled

VCN RAS enablement sequence needs to be added in
DPG HW init sequence.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Initialize vcn v4_0_3 ras function
Hawking Zhang [Mon, 6 Mar 2023 03:00:11 +0000 (11:00 +0800)]
drm/amdgpu: Initialize vcn v4_0_3 ras function

Initialize vcn v4_0_3 ras function

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add reset_ras_error_count for vcn v4_0_3
Hawking Zhang [Thu, 2 Mar 2023 06:23:47 +0000 (14:23 +0800)]
drm/amdgpu: Add reset_ras_error_count for vcn v4_0_3

Add reset_ras_error_count callback for vcn v4_0_3.
It will be used to reset vcn ras error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add query_ras_error_count for vcn v4_0_3
Hawking Zhang [Wed, 1 Mar 2023 12:37:56 +0000 (20:37 +0800)]
drm/amdgpu: Add query_ras_error_count for vcn v4_0_3

Add query_ras_error_count callback for vcn v4_0_3.
It will be used to query and log vcn error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add vcn/jpeg ras err status registers
Hawking Zhang [Wed, 1 Mar 2023 02:05:17 +0000 (10:05 +0800)]
drm/amdgpu: Add vcn/jpeg ras err status registers

Add new ras error status registers introduced in
vcn v4_0_3 to log vcn and jpeg ras error.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Checked if the pointer NULL before use it.
Gavin Wan [Fri, 17 Mar 2023 22:42:30 +0000 (18:42 -0400)]
drm/amdgpu: Checked if the pointer NULL before use it.

For SRIOV on some parts, the host driver does not post VBIOS. So the guest
cannot get bios information. Therefore, adev->virt.fw_reserve.p_pf2vf
and adev->mode_info.atom_context are NULL.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Set memory partitions to 1 for SRIOV.
Gavin Wan [Mon, 10 Apr 2023 19:04:26 +0000 (15:04 -0400)]
drm/amdgpu: Set memory partitions to 1 for SRIOV.

For SRIOV, the memory partitions are set on host drover. Each VF only
has one memory partition. We need set the memory partitions to 1 on
guest driver for SRIOV.

V2: sqaush in fix ("drm/amdgpu: Fix memory range info of GC 9.4.3 VFs")

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Acked-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Skip using MC FB Offset when APU flag is set for SRIOV.
Gavin Wan [Mon, 3 Apr 2023 21:49:41 +0000 (17:49 -0400)]
drm/amdgpu: Skip using MC FB Offset when APU flag is set for SRIOV.

The MC_VM_FB_OFFSET is PF only register. It cannot be read on VF.
So, the driver should not use MC_VM_FB_OFFSET address to set the
address of dev->gmc.aper_base.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add PSP supporting PSP 13.0.6 SRIOV ucode init.
Gavin Wan [Thu, 16 Mar 2023 17:44:41 +0000 (13:44 -0400)]
drm/amdgpu: Add PSP supporting PSP 13.0.6 SRIOV ucode init.

Add PSP supporting PSP 13.0.6 SRIOV ucode init.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add PSP spatial parition interface
Lijo Lazar [Fri, 10 Mar 2023 10:08:19 +0000 (15:38 +0530)]
drm/amdgpu: Add PSP spatial parition interface

Add PSP ring command interface for spatial partitioning.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Return error on invalid compute mode
Lijo Lazar [Tue, 7 Mar 2023 05:06:08 +0000 (10:36 +0530)]
drm/amdgpu: Return error on invalid compute mode

Return error if an invalid compute partition mode is requested.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add compute mode descriptor function
Lijo Lazar [Tue, 7 Mar 2023 05:03:05 +0000 (10:33 +0530)]
drm/amdgpu: Add compute mode descriptor function

Keep a helper function to get description of compute partition mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix unmapping of aperture
Lijo Lazar [Fri, 3 Mar 2023 12:33:00 +0000 (18:03 +0530)]
drm/amdgpu: Fix unmapping of aperture

When aperture size is zero, there is no mapping done.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix xGMI access P2P mapping failure on GFXIP 9.4.3
Rajneesh Bhardwaj [Mon, 27 Feb 2023 18:17:14 +0000 (13:17 -0500)]
drm/amdgpu: Fix xGMI access P2P mapping failure on GFXIP 9.4.3

On GFXIP 9.4.3, we dont need to rely on xGMI hive info to determine P2P
access.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-and-tested-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>