linux.git
Srinivasan Shanmugam [Wed, 17 May 2023 14:40:48 +0000 (20:10 +0530)]
drm/amd/amdgpu: Fix errors & warnings in amdgpu_ttm.c

Fix the following checkpatch error and warnings:

ERROR: Macros with complex values should be enclosed in parentheses
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for single statement blocks
WARNING: Block comments use a trailing */ on a separate line
WARNING: Missing a blank line after declarations
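As a rough illustration of the checkpatch rules above, here is a hypothetical snippet. HYPO_* and hypo_clamp_bytes() are invented names, not code from amdgpu_ttm.c. It shows:

- a macro whose whole value is parenthesized
- `unsigned int` instead of bare `unsigned`
- a single-statement `if` without braces
- block comments with the closing marker on its own line
- a blank line after declarations

```c
#include <stdint.h>

#define HYPO_PAGE_SHIFT 12

/*
 * Whole value parenthesized. Without the outer parentheses,
 * HYPO_PAGES_TO_BYTES(x) + 1 would expand to x << 12 + 1,
 * which C parses as x << 13.
 */
#define HYPO_PAGES_TO_BYTES(n) ((uint64_t)(n) << HYPO_PAGE_SHIFT)

/* 'unsigned int' rather than bare 'unsigned'. */
static uint64_t hypo_clamp_bytes(unsigned int num_pages, uint64_t limit)
{
	uint64_t bytes = HYPO_PAGES_TO_BYTES(num_pages);

	/* Blank line after the declaration above; no braces for one statement. */
	if (bytes > limit)
		return limit;

	return bytes;
}
```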

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 16 May 2023 20:56:49 +0000 (16:56 -0400)]
drm/amdgpu/vcn4: fix endian conversion

sq.is_enabled is a byte, so there is no need to byte-swap it.
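The sketch below shows why swapping a byte field is harmful rather than merely redundant. It assumes a 32-bit swap is applied to the field, as cpu_to_le32() would do on a big-endian CPU. The struct and helpers are hypothetical and do not mirror the real VCN4 firmware interface.

```c
#include <stdint.h>

/* Hypothetical record; not the real VCN4 firmware layout. */
struct hypo_sq_info {
	uint8_t is_enabled;	/* one byte: byte order does not apply */
};

/* Unconditional 32-bit byte swap, as cpu_to_le32() does on big-endian. */
static uint32_t hypo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Correct: store the byte as-is. */
static void hypo_set_enabled(struct hypo_sq_info *sq, uint8_t on)
{
	sq->is_enabled = on;
}

/*
 * Wrong: the swap moves the value out of the low byte, so the u8
 * assignment truncates 1 (0x01000000 after swapping) to 0.
 */
static void hypo_set_enabled_swapped(struct hypo_sq_info *sq, uint8_t on)
{
	sq->is_enabled = (uint8_t)hypo_swab32(on);
}
```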

Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 16 May 2023 21:16:30 +0000 (17:16 -0400)]
drm/amdgpu/gmc9: fix 64 bit division in partition code

Rework the logic or use do_div() to avoid 64-bit division problems on 32-bit builds.
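For reference, the kernel's do_div(n, base) divides a 64-bit n in place by a 32-bit base and evaluates to the remainder. The userspace stand-in below mimics that contract. hypo_do_div() and hypo_partition_size() are hypothetical and are not the gmc9 partition code.

```c
#include <stdint.h>

/*
 * Userspace stand-in for do_div(n, base): n becomes the quotient and
 * the remainder is returned. On 32-bit kernels a plain '/' on a u64
 * would need libgcc helpers (e.g. __udivdi3) that the kernel lacks.
 */
static uint32_t hypo_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Hypothetical use: split a memory size evenly across partitions. */
static uint64_t hypo_partition_size(uint64_t total, uint32_t num_parts)
{
	uint64_t size = total;

	hypo_do_div(&size, num_parts);
	return size;
}
```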

v2: add a missing case for XCP macro
v3: fix out of bounds array access
v4: fix xcp handling harder

Acked-by: Guchun Chen <guchun.chen@amd.com> (v1)
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> (v3)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Wed, 8 Feb 2023 09:05:08 +0000 (17:05 +0800)]
drm/amdgpu: initialize RAS for gfx_v9_4_3

Register GFX RAS functions and initialize GFX RAS.

v2: remove xcp operations.
v3: reuse the return value of gfx_ras_sw_init.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 10 Feb 2023 10:41:28 +0000 (18:41 +0800)]
drm/amdgpu: add sq timeout status functions for gfx_v9_4_3

Query and reset sq timeout status.

v2: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Wed, 8 Feb 2023 06:54:01 +0000 (14:54 +0800)]
drm/amdgpu: add RAS error count reset for gfx_v9_4_3

Add GFX RAS error count reset function.

v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 17 Mar 2023 09:13:46 +0000 (17:13 +0800)]
drm/amdgpu: add RAS error count query for gfx_v9_4_3

Query GFX RAS ce/ue count.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 6 Feb 2023 03:38:19 +0000 (11:38 +0800)]
drm/amdgpu: add RAS error count definitions for gfx_v9_4_3

Prepare for the query of GFX RAS ce/ue count.

v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add more CE/UE registers to query list.
    add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.
v4: move gfx memory id definitions to gfx_v9_4_3.
v5: create a dedicated patch for adding error count query function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Tue, 7 Feb 2023 10:30:55 +0000 (18:30 +0800)]
drm/amdgpu: add RAS definitions for GFX

Add common GFX RAS definitions.

v2: remove instance from amdgpu_gfx_ras_reg_entry, since
    amdgpu_ras_err_status_reg_entry already defines it.
v3: remove memory id definitions from amdgpu_gfx.h, they are
    related to IP version.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 27 Feb 2023 09:36:19 +0000 (17:36 +0800)]
drm/amdgpu: Add gc v9_4_3 ras error status registers

GC v9_4_3 introduces the UE|CE_ERR_STATUS_LO|HI registers to log
hardware errors.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 3 Feb 2023 02:41:26 +0000 (10:41 +0800)]
drm/amdgpu: add RAS status reset for gfx_v9_4_3

Reset GFX RAS status registers.

v2: fix typo in title.
    remove xcp operation.
v3: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Thu, 2 Feb 2023 09:20:23 +0000 (17:20 +0800)]
drm/amdgpu: add RAS status query for gfx_v9_4_3

Query GFX RAS status.

v2: remove xcp operation.
v3: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Thu, 2 Feb 2023 10:57:04 +0000 (18:57 +0800)]
drm/amdgpu: add GFX RAS common function

The common function can help reduce redundant code.

v2: remove xcp operation; RAS operations only need to be done for all
    instances.
v3: remove check for GFX RAS support; it will be checked at a higher level.
    add amdgpu prefix for the function name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Fri, 12 May 2023 05:22:57 +0000 (13:22 +0800)]
drm/amdgpu: Do not access members of xcp w/o check (v2)

Not all ASICs need xcp. Ensure xcp is available
before accessing its members.

v2: add missing change in kfd_topology.c

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Thu, 11 May 2023 09:01:03 +0000 (17:01 +0800)]
drm/amdkfd: Fix null ptr access

Avoid accessing a NULL xcp_mgr pointer.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 20 Mar 2023 10:21:14 +0000 (18:21 +0800)]
drm/amdgpu: add check for RAS instance mask

The mask only needs to be set when the RAS block instance number is
more than 1, and invalid bits should also be masked out.
For now, valid bits are only checked for the GFX and SDMA blocks; checks
for other RAS blocks will be added in the future.
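Here is a minimal sketch of the check described above. It assumes the mask is a plain bitmap with one bit per instance. hypo_check_inst_mask() is a hypothetical helper, and clearing the mask for single-instance blocks is an assumption of this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical mask check: a single instance needs no mask (cleared
 * here by assumption); otherwise bits at or above num_inst are
 * invalid, get masked out, and the adjustment is reported.
 */
static uint32_t hypo_check_inst_mask(uint32_t mask, unsigned int num_inst)
{
	uint32_t valid, adjusted;

	if (num_inst <= 1)
		return 0;

	valid = (num_inst >= 32) ? UINT32_MAX : ((1u << num_inst) - 1);
	adjusted = mask & valid;
	if (adjusted != mask)
		printf("RAS inject mask adjusted from %#x to %#x\n",
		       (unsigned int)mask, (unsigned int)adjusted);

	return adjusted;
}
```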

v2: move the check under injection operation since the mask is only
    used by RAS error inject.
v3: add valid bits handling for SDMA.
v4: print message if the mask is adjusted.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 13 Mar 2023 08:34:19 +0000 (16:34 +0800)]
drm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2

These two versions have no special RAS injection requirements, so switch
to the default injection interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 13 Mar 2023 08:24:11 +0000 (16:24 +0800)]
drm/amdgpu: reorganize RAS injection flow

Reorganize the flow so that GFX RAS injection can use the default function
if it does not define its own injection interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 27 Feb 2023 10:25:23 +0000 (18:25 +0800)]
drm/amdgpu: add instance mask for RAS inject

Users can specify the instances to inject into with the mask. For backward
compatibility, the mask value is incorporated into the sub block index
without changing the RAS TA interface.
Users supply a logical mask, and the driver should convert it to a physical
value before sending it to the RAS TA.
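The conversion can be pictured as remapping each set logical bit through a logical-to-physical instance table. Everything below is hypothetical: the table, the helper names, and especially HYPO_INST_MASK_SHIFT, because the commit message does not say which bits of the sub block index carry the mask.

```c
#include <stdint.h>

/*
 * Remap a logical instance mask through a hypothetical logical->physical
 * instance table (entries assumed < 32).
 */
static uint32_t hypo_logical_to_physical(uint32_t logical_mask,
					 const unsigned int *phys_of,
					 unsigned int num_inst)
{
	uint32_t phys_mask = 0;
	unsigned int i;

	for (i = 0; i < num_inst; i++)
		if (logical_mask & (1u << i))
			phys_mask |= 1u << phys_of[i];

	return phys_mask;
}

/*
 * HYPOTHETICAL packing: carry the mask in the upper bits of the existing
 * sub block index so the TA interface is unchanged. The real bit layout
 * is not described in the commit message.
 */
#define HYPO_INST_MASK_SHIFT 16u

static uint32_t hypo_pack_sub_block(uint32_t sub_block, uint32_t phys_mask)
{
	return (sub_block & ((1u << HYPO_INST_MASK_SHIFT) - 1)) |
	       (phys_mask << HYPO_INST_MASK_SHIFT);
}
```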

v2: update parameter name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 27 Feb 2023 08:31:56 +0000 (16:31 +0800)]
drm/amdgpu: convert logical instance mask to physical one

Convert instance mask for the convenience of RAS TA.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Wed, 12 Apr 2023 20:56:29 +0000 (16:56 -0400)]
drm/amdgpu: Enable IH CAM on GFX9.4.3

This patch enables IH CAM on GFX9.4.3 ASIC.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 19 Apr 2023 21:39:35 +0000 (17:39 -0400)]
drm/amdgpu: Correct get_xcp_mem_id calculation

The current calculation only works for NPS4/QPX mode; correct it for
NPS4/CPX mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 31 Mar 2023 15:18:12 +0000 (11:18 -0400)]
drm/amdkfd: Refactor migrate init to support partition switch

Rename svm_migrate_init to kgd2kfd_init_zone_device, a better name
because it sets up the zone device pgmap for page migration, and keep it in
kfd_migrate.c so it can access the static svm_migrate_pgmap_ops. Call it
only once in amdgpu_device_ip_init after the adev IP blocks are initialized,
but before amdgpu_amdkfd_device_init initializes the KFD nodes, which enable
SVM support based on the pgmap.

svm_range_set_max_pages is called by kgd2kfd_device_init every time after
the compute partition mode is switched.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shiwu Zhang [Fri, 31 Mar 2023 09:16:41 +0000 (17:16 +0800)]
drm/amdgpu: route ioctls on primary node of XCPs to primary device

During XCP init, unlike the primary device, no amdgpu_device is
attached to each XCP's drm_device.

When a user opens or closes the primary node of an XCP drm_device,
referring to any member of the amdgpu_device causes a NULL pointer
issue. Reroute these ioctls to the primary device to solve it.

 BUG: unable to handle page fault for address: 0000000000020c80
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 Call Trace:
  <TASK>
  lock_timer_base+0x6b/0x90
  try_to_del_timer_sync+0x2b/0x80
  del_timer_sync+0x29/0x40
  flush_delayed_work+0x1c/0x50
  amdgpu_driver_open_kms+0x2c/0x280 [amdgpu]
  drm_file_alloc+0x1b3/0x260 [drm]
  drm_open+0xaa/0x280 [drm]
  drm_stub_open+0xa2/0x120 [drm]
  chrdev_open+0xa6/0x1c0

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 23 Mar 2023 12:45:56 +0000 (08:45 -0400)]
drm/amdkfd: APU mode set max svm range pages

svm_migrate_init sets the max SVM range pages based on the KFD node
partition size. APU mode does not init the pgmap because there is no migration.

kgd2kfd_device_init calls svm_migrate_init after the KFD nodes are allocated
and initialized.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Mon, 20 Mar 2023 15:22:30 +0000 (11:22 -0400)]
drm/amdkfd: Fix memory reporting on GFX 9.4.3

This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU
by reporting available memory on a per-partition basis. If it's an
APU, the available and used memory calculations take into account
system and TTM memory.

v2: squash in fix ("drm/amdkfd: Fix array out of bound warning")
    squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3")

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Mon, 20 Mar 2023 15:21:38 +0000 (11:21 -0400)]
drm/amdkfd: Move local_mem_info to kfd_node

We need to track memory usage on a per partition basis. To do
that, store the local memory information in KFD node instead
of kfd device.

v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info")

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 13 Mar 2023 16:03:18 +0000 (12:03 -0400)]
drm/amdgpu: use xcp partition ID for amdgpu_gem

Find xcp_id from amdgpu_fpriv, use it for amdgpu_gem_object_create.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 10 Mar 2023 00:30:02 +0000 (19:30 -0500)]
drm/amdgpu: KFD graphics interop support compute partition

kfd_ioctl_get_dmabuf uses the amdgpu bo xcp_id to get the gpu_id of the
KFD node from the exported dmabuf_adev, and then creates the kfd bo on the
correct adev and KFD node when importing the amdgpu bo to KFD.

Remove function kfd_device_by_adev; it is not needed because it gives the
same result as dmabuf_adev->kfd.dev->nodes[0]->id.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 8 Mar 2023 16:57:00 +0000 (11:57 -0500)]
drm/amdkfd: Store xcp partition id to amdgpu bo

For per-compute-partition memory accounting, and for exporting a drm amdgpu
bo and then importing it to KFD, we need the xcp id to account the memory
usage or to find the KFD node of the original amdgpu bo, so that the KFD bo
is created on the correct adev KFD node.

Set xcp_id_plus1 of amdgpu_bo_param to create the bo and store xcp_id in the
amdgpu bo. Add a helper macro to get the mem_id from adev and xcp_id.

v2: squash in fix ("drm/amdgpu: Fix BO creation failure on GFX 9.4.3 dGPU")

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 7 Mar 2023 16:30:24 +0000 (11:30 -0500)]
drm/amdgpu: dGPU mode set VRAM range lpfn as exclusive

The TTM place lpfn is exclusive, used as the end (start + size) in the drm
and buddy allocators, while the adev->gmc memory partition range lpfn is
inclusive (start + size - 1), so add 1 when setting the TTM place lpfn.
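Here is a worked example of the off-by-one, using simplified stand-ins for the two ranges. TTM's struct ttm_place does have fpfn and lpfn fields, but hypo_mem_range and hypo_place here are hypothetical. A partition starting at page frame 0x1000 with 0x1000 pages has an inclusive last pfn of 0x1fff, so the exclusive TTM lpfn must be 0x2000.

```c
#include <stdint.h>

/* Hypothetical partition range with an inclusive last page frame. */
struct hypo_mem_range {
	uint64_t fpfn;	/* first page frame, inclusive */
	uint64_t lpfn;	/* last page frame, inclusive: start + size - 1 */
};

/* Hypothetical placement whose lpfn is exclusive: start + size. */
struct hypo_place {
	uint64_t fpfn;
	uint64_t lpfn;
};

static struct hypo_place hypo_place_from_range(const struct hypo_mem_range *r)
{
	struct hypo_place p = {
		.fpfn = r->fpfn,
		.lpfn = r->lpfn + 1,	/* inclusive end -> exclusive end */
	};

	return p;
}
```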

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 24 Feb 2023 01:00:05 +0000 (20:00 -0500)]
drm/amdgpu: Alloc page table on correct memory partition

Allocating the kernel mode page table bo uses amdgpu_vm->mem_id + 1 as the bp
mem_id_plus1 parameter. For APU mode, select the correct TTM pool to
allocate pages from the corresponding memory partition, which will be the
closest NUMA node. For dGPU mode, select the correct address range for the
VRAM manager.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 2 Feb 2023 16:07:53 +0000 (11:07 -0500)]
drm/amdkfd: Update MTYPE for far memory partition

Use MTYPE_RW/MTYPE_CC for mapping system memory or VRAM to a KFD node
within the same memory partition, and MTYPE_NC for mapping on a KFD node
from a far memory partition of the same socket or from another socket
in the same XGMI hive.

On NPS4 or 4P systems, MTYPE will be overridden per page depending on
the memory NUMA node id and vm->mem_id.
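This is a hypothetical sketch of the per-page MTYPE rule above. Memory whose NUMA node matches the node of the VM's partition gets the local MTYPE (RW or CC); anything else gets NC. The enum, the helper, and passing the partition's node directly are assumptions, not the driver's code.

```c
/* Hypothetical MTYPE values; not the hardware encodings. */
enum hypo_mtype { HYPO_MTYPE_RW, HYPO_MTYPE_NC, HYPO_MTYPE_CC };

/*
 * page_nid: NUMA node backing the page.
 * vm_local_nid: NUMA node of the partition selected by vm->mem_id.
 * local_mtype: RW or CC for near memory.
 */
static enum hypo_mtype hypo_pick_mtype(int page_nid, int vm_local_nid,
				       enum hypo_mtype local_mtype)
{
	if (page_nid == vm_local_nid)
		return local_mtype;	/* same memory partition */
	return HYPO_MTYPE_NC;		/* far partition or other socket */
}
```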

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:54:29 +0000 (18:54 -0500)]
drm/amdgpu: dGPU mode placement support memory partition

dGPU mode uses the VRAM manager to validate bos. amdgpu bo placement uses
the mem_id to get the allocation range (first and last page frame number)
from the xcp manager and passes it to the drm buddy allocator as the
allowed range.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:45:32 +0000 (18:45 -0500)]
drm/amdkfd: SVM range allocation support memory partition

Pass kfd node->xcp->mem_id to amdgpu bo create parameter mem_id_plus1 to
allocate new svm_bo on the specified memory partition.

This is only for dGPU mode as we don't migrate with APU mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:50:09 +0000 (18:50 -0500)]
drm/amdkfd: Alloc memory of GPU support memory partition

For dGPU mode VRAM allocation, create amdgpu_bo from amdgpu_vm->mem_id,
to alloc from the correct memory range.

For APU mode VRAM allocation, set alloc domain to GTT, and set
bp->mem_id_plus1 from amdgpu_vm->mem_id + 1 to create amdgpu_bo, to
allocate system memory from correct NUMA node.

For GTT allocation, use mem_id -1 to allocate system memory from any
NUMA node.

Remove amdgpu_ttm_tt_set_mem_pool to avoid confusion over memory that
may be allocated from a different mem_id.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:25:28 +0000 (18:25 -0500)]
drm/amdgpu: Add memory partition mem_id to amdgpu_bo

Add mem_id_plus1 parameter to amdgpu_gem_object_create and pass it to
amdgpu_bo_create. For dGPU mode allocation, mem_id is used by VRAM
manager to get the memory partition fpfn and lpfn from the xcp manager. For APU
native mode allocation, mem_id is used to get the NUMA node id from the xcp
manager, which is then passed to TTM as the NUMA pool id to allocate memory
from that NUMA node. mem_id -1 means the entire VRAM or any NUMA node.
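The mem_id_plus1 convention can be sketched as below. The helpers are hypothetical. One plausible reason for the +1 offset, not stated in the commit, is that a zero-initialized parameter then means "no specific partition" (mem_id -1).

```c
#include <stdint.h>

#define HYPO_MEM_ID_ANY (-1)	/* entire VRAM or any NUMA node */

/* Encode: -1 -> 0 ("any"), partition k -> k + 1. */
static int32_t hypo_mem_id_plus1(int32_t mem_id)
{
	return mem_id + 1;
}

/* Decode on the allocation side. */
static int32_t hypo_mem_id(int32_t mem_id_plus1)
{
	return mem_id_plus1 - 1;
}

/*
 * Hypothetical APU native-mode lookup: -1 lets the allocator use any
 * node, otherwise use the NUMA node mapped to the memory partition.
 */
static int hypo_numa_node(int32_t mem_id_plus1, const int *node_of_mem_id)
{
	int32_t mem_id = hypo_mem_id(mem_id_plus1);

	if (mem_id == HYPO_MEM_ID_ANY)
		return -1;
	return node_of_mem_id[mem_id];
}
```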

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:11:29 +0000 (18:11 -0500)]
drm/amdkfd: Show KFD node memory partition info

Show the KFD node memory partition id and size, and add helper function
KFD_XCP_MEMORY_SIZE to get the kfd node memory size. It will be used
later to support per-partition memory accounting.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 24 Feb 2023 00:58:22 +0000 (19:58 -0500)]
drm/amdgpu: Add memory partition id to amdgpu_vm

If xcp_mgr is initialized, add mem_id to amdgpu_vm structure to store
memory partition number when creating amdgpu_vm for the xcp. The xcp
number is decided when opening the render device, for example
/dev/dri/renderD129 is xcp_id 0, /dev/dri/renderD130 is xcp_id 1.
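Assuming partition render nodes are numbered consecutively, the minor-to-xcp_id mapping in the example above reduces to a subtraction with a bounds check. hypo_xcp_id_from_minor() and its parameters are hypothetical.

```c
#include <errno.h>

/*
 * Map a render node minor to an xcp_id, assuming partition nodes start
 * at first_minor (renderD129 -> xcp 0, renderD130 -> xcp 1).
 */
static int hypo_xcp_id_from_minor(int minor, int first_minor, int num_xcp)
{
	int id = minor - first_minor;

	if (id < 0 || id >= num_xcp)
		return -EINVAL;	/* not a partition node of this device */
	return id;
}
```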

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 23 Feb 2023 16:03:37 +0000 (11:03 -0500)]
drm/amdkfd: Store drm node minor number for kfd nodes

From the KFD topology, applications find the kfd node with the corresponding
drm device node minor number. For example, if partition drm nodes start
from /dev/dri/renderD129, then KFD node 0 stores drm node minor
number 129. The application opens drm node /dev/dri/renderD129 to create an
amdgpu vm for kfd node 0 with the correct vm->mem_id to indicate the
memory partition.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Sat, 4 Mar 2023 00:45:45 +0000 (19:45 -0500)]
drm/amdgpu: Add xcp manager num_xcp_per_mem_partition

Used by KFD to check memory limit accounting.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:21:44 +0000 (17:21 -0400)]
drm/amdgpu: update ref_cnt before ctx free

Update ref_cnt before ctx free.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:15:02 +0000 (17:15 -0400)]
drm/amdgpu: run partition schedule if it is supported

Run the partition schedule during ctx entity init if it is supported.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:00:54 +0000 (17:00 -0400)]
drm/amdgpu: add partition schedule for GC(9, 4, 3)

Implement partition schedule for GC(9, 4, 3).

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:12:01 +0000 (17:12 -0400)]
drm/amdgpu: keep amdgpu_ctx_mgr in ctx structure

Keep amdgpu_ctx_mgr in ctx structure to track fpriv.

v2: add missing fpriv declaration lost in rebase

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:19:11 +0000 (17:19 -0400)]
drm/amdgpu: add partition scheduler list update

Add partition scheduler list update in late init
and xcp partition mode switch.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 20:55:02 +0000 (16:55 -0400)]
drm/amdgpu: update header to support partition scheduling

Update header to support partition scheduling.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 20:45:12 +0000 (16:45 -0400)]
drm/amdgpu: add partition ID track in ring

Keep track of the partition ID in the ring.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Tue, 28 Feb 2023 19:16:38 +0000 (14:16 -0500)]
drm/amdgpu: find partition ID when open device

Find the partition ID from the render device minor when opening the device.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 20:55:02 +0000 (16:55 -0400)]
drm/amdgpu: support partition drm devices

Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3).

This is a temporary solution and will be superseded.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Mon, 6 Mar 2023 22:56:44 +0000 (17:56 -0500)]
drm/amdgpu/bu: update mtype_local parameter settings

Update mtype_local module parameter to use MTYPE_RW by default.

0: MTYPE_RW (default)
1: MTYPE_NC
2: MTYPE_CC
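Decoding the parameter values listed above might look like the following. The enum and helper are hypothetical, and treating out-of-range values as the default is an assumption of this sketch.

```c
/* Hypothetical MTYPE values; not the hardware encodings. */
enum hypo_mtype { HYPO_MTYPE_RW, HYPO_MTYPE_NC, HYPO_MTYPE_CC };

/* Map the mtype_local module parameter to an MTYPE (values after this change). */
static enum hypo_mtype hypo_mtype_local(int param)
{
	switch (param) {
	case 1:
		return HYPO_MTYPE_NC;
	case 2:
		return HYPO_MTYPE_CC;
	case 0:
	default:
		return HYPO_MTYPE_RW;	/* default; unknown values fall back here */
	}
}
```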

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Francis [Mon, 27 Feb 2023 15:33:11 +0000 (10:33 -0500)]
drm/amdgpu/bu: add mtype_local as a module parameter

Selects the MTYPE to be used for local memory
(0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW).

v2: squash in build fix (Alex)

Reviewed-by: Graham Sider <Graham.Sider@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Tue, 21 Feb 2023 22:44:18 +0000 (17:44 -0500)]
drm/amdgpu: Override MTYPE per page on GFXv9.4.3 APUs

On GFXv9.4.3 NUMA APUs, system memory locality must be determined per
page to choose the correct MTYPE. This patch adds a GMC callback that
can provide this per-page override and implements it for native mode.

Carve-out mode is not yet supported and will use the safe default
(remote) MTYPE for system memory.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Tue, 21 Feb 2023 22:31:32 +0000 (17:31 -0500)]
drm/amdgpu: Fix per-BO MTYPE selection for GFXv9.4.3

Treat system memory on NUMA systems as remote by default. Overriding with
a more efficient MTYPE per page will be implemented in the next patch.

No need for a special case for APP APUs. System memory is handled the same
for carve-out and native mode. And VRAM doesn't exist in native mode.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Mon, 6 Feb 2023 19:04:42 +0000 (14:04 -0500)]
drm/amdgpu/bu: Add use_mtype_cc_wa module param

Set use_mtype_cc_wa to 1 by default so that the PTE coherence flag is
MTYPE_CC instead of MTYPE_RW. This is required for the time being to
mitigate a bug causing XCCs to hit stale data due to TCC marking fully
dirty lines as exclusive.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Wed, 8 Feb 2023 16:10:57 +0000 (11:10 -0500)]
drm/amdgpu: Use legacy TLB flush for gfx943

Invalidate TLBs via a legacy flush request (flush_type=0) prior to the
heavyweight flush requests (flush_type=2) in gmc_v9_0.c. This is
temporarily required to mitigate a bug causing CPC UTCL1 to return stale
translations after invalidation requests in address range mode.

v2: squash in long term fix "drm/amdgpu: disable extra gfx943 legacy flush on rev1+"

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harish Kasiviswanathan [Fri, 28 Apr 2023 18:20:00 +0000 (14:20 -0400)]
drm/amdgpu: For GFX 9.4.3 APU fix vram_usage value

On the GFX 9.4.3 APP APU, VRAM is allocated in the GTT domain. When freeing
memory, check for the GTT domain instead of VRAM if it is an APP APU.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 19 Apr 2023 21:43:26 +0000 (17:43 -0400)]
drm/amdgpu: Enable NPS4 CPX mode

CPX compute mode is a valid mode for NPS4 memory partition mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 31 Mar 2023 15:13:40 +0000 (11:13 -0400)]
drm/amdkfd: Move pgmap to amdgpu_kfd_dev structure

The VRAM pgmap resource is allocated every time compute partitions are
switched, because kfd_dev is re-initialized by post_partition_switch.
As a result, memory region resources leak and system memory usage
accounting becomes unbalanced.

The pgmap resource should be allocated and registered only once when loading
the driver and freed when unloading it, so move it from kfd_dev to
amdgpu_kfd_dev.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 24 Mar 2023 09:51:30 +0000 (15:21 +0530)]
drm/amdgpu: Skip halting RLC on GFX v9.4.3

The RLC-PMFW handshake happens periodically when GFXCLK DPM is enabled,
and halting the RLC may cause unexpected results. Avoid halting the RLC
from the driver side.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix register accesses in GFX v9.4.3
Lijo Lazar [Thu, 16 Mar 2023 09:29:27 +0000 (14:59 +0530)]
drm/amdgpu: Fix register accesses in GFX v9.4.3

Access registers with the right xcc id. Also, remove the unused logic, as
PG is not used in GFX v9.4.3.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Increase queue number per process to 255 on GFX9.4.3
Mukul Joshi [Wed, 15 Mar 2023 18:04:33 +0000 (14:04 -0400)]
drm/amdkfd: Increase queue number per process to 255 on GFX9.4.3

Increase the maximum number of queues that can be created per process
to 255 on GFX 9.4.3. There is no HWS limitation restricting the number
of queues that can be created.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Adjust the sequence to query ras error info
Hawking Zhang [Mon, 20 Mar 2023 09:51:30 +0000 (17:51 +0800)]
drm/amdgpu: Adjust the sequence to query ras error info

It turns out STATUS_VALID_FLAG needs to be checked
before any other field. ADDRESS_VALID_FLAG and
ERR_INFO_VALID_FLAG only govern the ADDRESS and
ERR_INFO fields, respectively. The driver should continue
to poll the ERR_CNT field even if ERR_INFO_VALID_FLAG is not set.
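The query order described above can be sketched as a small decoder for a
status register value. This is a minimal sketch under stated assumptions:
the bit positions, the ERR_CNT field width, and the names
struct ras_err_report and parse_ras_status() are hypothetical and chosen
for illustration only; the real field layout comes from the ASIC register
headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define STATUS_VALID_FLAG   (1u << 0)
#define ADDRESS_VALID_FLAG  (1u << 1)
#define ERR_INFO_VALID_FLAG (1u << 2)
#define ERR_CNT_SHIFT       8
#define ERR_CNT_MASK        (0xffu << ERR_CNT_SHIFT)

/* Hypothetical decoded view of one status register read. */
struct ras_err_report {
	bool has_addr;     /* ADDRESS field may be trusted */
	bool has_err_info; /* ERR_INFO field may be trusted */
	uint32_t err_cnt;  /* ERR_CNT field */
};

/* Returns false if the register holds no valid error record. */
static bool parse_ras_status(uint32_t status, struct ras_err_report *rep)
{
	/* STATUS_VALID_FLAG gates every other field, so check it first. */
	if (!(status & STATUS_VALID_FLAG))
		return false;

	/* These two flags only qualify their own fields. */
	rep->has_addr = (status & ADDRESS_VALID_FLAG) != 0;
	rep->has_err_info = (status & ERR_INFO_VALID_FLAG) != 0;

	/* ERR_CNT is read even when ERR_INFO_VALID_FLAG is clear. */
	rep->err_cnt = (status & ERR_CNT_MASK) >> ERR_CNT_SHIFT;
	return true;
}
```

The point of the ordering is that a clear ERR_INFO_VALID_FLAG no longer
short-circuits the error count: a record with STATUS_VALID_FLAG set and a
non-zero ERR_CNT is still counted.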

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Initialize jpeg v4_0_3 ras function
Hawking Zhang [Mon, 6 Mar 2023 03:03:27 +0000 (11:03 +0800)]
drm/amdgpu: Initialize jpeg v4_0_3 ras function

Initialize jpeg v4_0_3 ras function.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add reset_ras_error_count for jpeg v4_0_3
Hawking Zhang [Thu, 2 Mar 2023 10:04:24 +0000 (18:04 +0800)]
drm/amdgpu: Add reset_ras_error_count for jpeg v4_0_3

Add reset_ras_error_count callback for jpeg v4_0_3.
It will be used to reset jpeg ras error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add query_ras_error_count for jpeg v4_0_3
Hawking Zhang [Thu, 2 Mar 2023 09:56:59 +0000 (17:56 +0800)]
drm/amdgpu: Add query_ras_error_count for jpeg v4_0_3

Add query_ras_error_count callback for jpeg v4_0_3.
It will be used to query and log jpeg error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Re-enable VCN RAS if DPG is enabled
Hawking Zhang [Thu, 2 Mar 2023 08:38:38 +0000 (16:38 +0800)]
drm/amdgpu: Re-enable VCN RAS if DPG is enabled

The VCN RAS enablement sequence needs to be added to the
DPG HW init sequence.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Initialize vcn v4_0_3 ras function
Hawking Zhang [Mon, 6 Mar 2023 03:00:11 +0000 (11:00 +0800)]
drm/amdgpu: Initialize vcn v4_0_3 ras function

Initialize vcn v4_0_3 ras function.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add reset_ras_error_count for vcn v4_0_3
Hawking Zhang [Thu, 2 Mar 2023 06:23:47 +0000 (14:23 +0800)]
drm/amdgpu: Add reset_ras_error_count for vcn v4_0_3

Add reset_ras_error_count callback for vcn v4_0_3.
It will be used to reset vcn ras error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add query_ras_error_count for vcn v4_0_3
Hawking Zhang [Wed, 1 Mar 2023 12:37:56 +0000 (20:37 +0800)]
drm/amdgpu: Add query_ras_error_count for vcn v4_0_3

Add query_ras_error_count callback for vcn v4_0_3.
It will be used to query and log vcn error count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add vcn/jpeg ras err status registers
Hawking Zhang [Wed, 1 Mar 2023 02:05:17 +0000 (10:05 +0800)]
drm/amdgpu: Add vcn/jpeg ras err status registers

Add new ras error status registers introduced in
vcn v4_0_3 to log vcn and jpeg ras error.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Checked if the pointer NULL before use it.
Gavin Wan [Fri, 17 Mar 2023 22:42:30 +0000 (18:42 -0400)]
drm/amdgpu: Checked if the pointer NULL before use it.

For SRIOV on some parts, the host driver does not post the VBIOS, so the
guest cannot get BIOS information. As a result,
adev->virt.fw_reserve.p_pf2vf and adev->mode_info.atom_context may be
NULL, and they must be checked before use.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Set memory partitions to 1 for SRIOV.
Gavin Wan [Mon, 10 Apr 2023 19:04:26 +0000 (15:04 -0400)]
drm/amdgpu: Set memory partitions to 1 for SRIOV.

For SRIOV, the memory partitions are set by the host driver, and each VF
has only one memory partition. The guest driver therefore needs to set
the number of memory partitions to 1 for SRIOV.

v2: squash in fix ("drm/amdgpu: Fix memory range info of GC 9.4.3 VFs")

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Acked-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Skip using MC FB Offset when APU flag is set for SRIOV.
Gavin Wan [Mon, 3 Apr 2023 21:49:41 +0000 (17:49 -0400)]
drm/amdgpu: Skip using MC FB Offset when APU flag is set for SRIOV.

MC_VM_FB_OFFSET is a PF-only register and cannot be read on a VF, so
the driver should not use the MC_VM_FB_OFFSET address to set
dev->gmc.aper_base.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add PSP supporting PSP 13.0.6 SRIOV ucode init.
Gavin Wan [Thu, 16 Mar 2023 17:44:41 +0000 (13:44 -0400)]
drm/amdgpu: Add PSP supporting PSP 13.0.6 SRIOV ucode init.

Add PSP support for PSP 13.0.6 SRIOV ucode init.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add PSP spatial parition interface
Lijo Lazar [Fri, 10 Mar 2023 10:08:19 +0000 (15:38 +0530)]
drm/amdgpu: Add PSP spatial parition interface

Add PSP ring command interface for spatial partitioning.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Return error on invalid compute mode
Lijo Lazar [Tue, 7 Mar 2023 05:06:08 +0000 (10:36 +0530)]
drm/amdgpu: Return error on invalid compute mode

Return error if an invalid compute partition mode is requested.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add compute mode descriptor function
Lijo Lazar [Tue, 7 Mar 2023 05:03:05 +0000 (10:33 +0530)]
drm/amdgpu: Add compute mode descriptor function

Add a helper function to get the description of a compute partition mode.
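A helper of this kind is typically a switch that maps each mode to a
printable name. The sketch below assumes the SPX/DPX/TPX/QPX/CPX mode set
used by GFX 9.4.3 compute partitioning; the enum and the function name
compute_partition_mode_desc() are hypothetical stand-ins, not the
driver's actual definitions.

```c
/* Assumed compute partition modes; the real enum lives in the amdgpu
 * headers and may use different names and values. */
enum compute_partition_mode {
	PARTITION_MODE_UNKNOWN = -1,
	PARTITION_MODE_SPX = 0,
	PARTITION_MODE_DPX,
	PARTITION_MODE_TPX,
	PARTITION_MODE_QPX,
	PARTITION_MODE_CPX,
};

/* Hypothetical helper: map a mode to a printable description. */
static const char *compute_partition_mode_desc(enum compute_partition_mode mode)
{
	switch (mode) {
	case PARTITION_MODE_SPX:
		return "SPX";
	case PARTITION_MODE_DPX:
		return "DPX";
	case PARTITION_MODE_TPX:
		return "TPX";
	case PARTITION_MODE_QPX:
		return "QPX";
	case PARTITION_MODE_CPX:
		return "CPX";
	default:
		/* Covers PARTITION_MODE_UNKNOWN and out-of-range values. */
		return "UNKNOWN";
	}
}
```

Returning a fixed "UNKNOWN" string for anything unrecognized keeps callers
such as log messages or sysfs readers from having to special-case bad
input.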

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix unmapping of aperture
Lijo Lazar [Fri, 3 Mar 2023 12:33:00 +0000 (18:03 +0530)]
drm/amdgpu: Fix unmapping of aperture

When the aperture size is zero, no mapping is done, so there is nothing
to unmap.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fix xGMI access P2P mapping failure on GFXIP 9.4.3
Rajneesh Bhardwaj [Mon, 27 Feb 2023 18:17:14 +0000 (13:17 -0500)]
drm/amdgpu: Fix xGMI access P2P mapping failure on GFXIP 9.4.3

On GFXIP 9.4.3, we don't need to rely on xGMI hive info to determine P2P
access.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-and-tested-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Native mode memory partition support
Rajneesh Bhardwaj [Tue, 28 Feb 2023 01:08:29 +0000 (20:08 -0500)]
drm/amdkfd: Native mode memory partition support

In native mode, after an amdgpu_bo is created in the CPU domain, call
amdgpu_ttm_tt_set_mem_pool to select the TTM pool using bo->mem_id.
ttm_bo_validate then allocates the memory from the correct memory
partition before it is mapped to GPUs.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-and-tested-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Set TTM pools for memory partitions
Philip Yang [Mon, 27 Feb 2023 16:16:09 +0000 (11:16 -0500)]
drm/amdgpu: Set TTM pools for memory partitions

In native mode only, create a TTM pool for each memory partition to store
its NUMA node id. The TTM pool is then selected by memory partition id so
that memory is allocated from the correct partition.
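The per-partition pool selection can be illustrated in user space. This
is a minimal sketch under stated assumptions: struct mem_pool stands in
for a real TTM pool (which the driver sets up with ttm_pool_init and
tears down with ttm_pool_fini), and struct pool_set, pool_set_init(),
pool_for_mem_id() and MAX_MEM_PARTITIONS are hypothetical names. Only the
NUMA node id each pool allocates from is modelled.

```c
/* Hypothetical upper bound on memory partitions. */
#define MAX_MEM_PARTITIONS 8

/* Stand-in for a per-partition TTM pool: only its NUMA node is modelled. */
struct mem_pool {
	int nid; /* -1 means no NUMA preference (NUMA_NO_NODE in the kernel) */
};

struct pool_set {
	struct mem_pool default_pool; /* used when no partition applies */
	struct mem_pool part_pools[MAX_MEM_PARTITIONS];
	unsigned int num_parts;
};

/* Create one pool per memory partition, recording its NUMA node id. */
static void pool_set_init(struct pool_set *ps, const int *nids, unsigned int n)
{
	ps->default_pool.nid = -1;
	ps->num_parts = n > MAX_MEM_PARTITIONS ? MAX_MEM_PARTITIONS : n;
	for (unsigned int i = 0; i < ps->num_parts; i++)
		ps->part_pools[i].nid = nids[i];
}

/* Pick the pool for a BO's memory partition id; fall back to the default. */
static struct mem_pool *pool_for_mem_id(struct pool_set *ps, int mem_id)
{
	if (mem_id < 0 || (unsigned int)mem_id >= ps->num_parts)
		return &ps->default_pool;
	return &ps->part_pools[mem_id];
}
```

With four partitions on nodes 0 to 3, a buffer whose memory partition id
is 2 is served by the pool bound to node 2, while an invalid or unset id
falls back to the default, node-agnostic pool.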

Acked-by: Christian König <christian.koenig@amd.com>
(rajneesh: changed need_swiotlb and need_dma32 to false for pool init)
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-and-tested-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/ttm: export ttm_pool_fini for cleanup
Rajneesh Bhardwaj [Tue, 14 Feb 2023 04:51:07 +0000 (23:51 -0500)]
drm/ttm: export ttm_pool_fini for cleanup

ttm_pool_init is exported and used outside the TTM subsystem by the
amdgpu_ttm interface; similarly, export ttm_pool_fini for proper cleanup.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add auto mode for compute partition
Lijo Lazar [Mon, 13 Feb 2023 13:56:18 +0000 (19:26 +0530)]
drm/amdgpu: Add auto mode for compute partition

When auto mode is specified, the driver chooses the appropriate compute
partition mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Check memory ranges for valid xcp mode
Lijo Lazar [Mon, 13 Feb 2023 13:20:07 +0000 (18:50 +0530)]
drm/amdgpu: Check memory ranges for valid xcp mode

When deciding whether a partition mode is valid, also check the memory
ranges available to the device. Only certain combinations are valid for
a particular mode.
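A validity check that considers both the XCC count and the number of
memory partitions can be sketched as follows. The rules here are
illustrative assumptions, not the driver's actual table: each compute
partition must receive the same number of XCCs, and each memory partition
must back a whole number of compute partitions. The mode enum and the
names mode_num_xcp() and check_compute_mode() are hypothetical. Under
these rules CPX on an 8-XCC device is accepted with 4 memory partitions
(NPS4), while SPX is rejected there.

```c
#include <errno.h>

/* Hypothetical compute partition modes. */
enum compute_partition_mode { MODE_SPX, MODE_DPX, MODE_TPX, MODE_QPX, MODE_CPX };

/* Number of compute partitions (XCPs) a mode creates; CPX is modelled
 * as one partition per XCC. */
static int mode_num_xcp(enum compute_partition_mode mode, unsigned int num_xcc)
{
	switch (mode) {
	case MODE_SPX:
		return 1;
	case MODE_DPX:
		return 2;
	case MODE_TPX:
		return 3;
	case MODE_QPX:
		return 4;
	case MODE_CPX:
		return (int)num_xcc;
	default:
		return -EINVAL;
	}
}

/* Return 0 if the mode fits the XCC count and the memory partitioning,
 * -EINVAL otherwise. */
static int check_compute_mode(enum compute_partition_mode mode,
			      unsigned int num_xcc, unsigned int num_mem_parts)
{
	int num_xcp = mode_num_xcp(mode, num_xcc);

	if (num_xcp <= 0 || num_mem_parts == 0)
		return -EINVAL;
	/* Every compute partition gets the same number of XCCs. */
	if (num_xcc % (unsigned int)num_xcp)
		return -EINVAL;
	/* Each memory partition backs a whole number of compute partitions. */
	if ((unsigned int)num_xcp % num_mem_parts)
		return -EINVAL;
	return 0;
}
```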

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Use xcc mask for identifying xcc
Lijo Lazar [Thu, 9 Feb 2023 11:00:53 +0000 (16:30 +0530)]
drm/amdkfd: Use xcc mask for identifying xcc

Instead of a start xcc id and the number of xccs per node, use the xcc
mask, which is the mask of logical ids of the xccs belonging to a partition.
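The switch from a (start, count) pair to a mask can be sketched with
plain bit operations. The helper names xcc_mask_from_range(), xcc_count()
and xcc_ids() are hypothetical; the kernel has its own bit helpers and
iteration macros, which this sketch does not claim to reproduce. A mask
can also describe a non-contiguous set such as logical ids {0, 2}, which
a start id plus a count cannot.

```c
#include <stdint.h>

/* Build the mask that replaces a (start, count) pair.
 * Assumes start + num <= 32. */
static uint32_t xcc_mask_from_range(unsigned int start, unsigned int num)
{
	if (num == 0)
		return 0;
	if (num >= 32)
		return ~0u;
	return ((1u << num) - 1) << start;
}

/* Number of xccs in a partition: count the set bits of its mask. */
static unsigned int xcc_count(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1) /* clear the lowest set bit */
		n++;
	return n;
}

/* Collect the logical xcc ids in the mask, in ascending order. */
static unsigned int xcc_ids(uint32_t mask, unsigned int *ids, unsigned int max)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < 32 && n < max; i++)
		if (mask & (1u << i))
			ids[n++] = i;
	return n;
}
```

For example, a partition holding logical xccs 2 and 3 has the mask 0xC,
and walking its set bits yields ids 2 and 3.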

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdkfd: Add xcp reference to kfd node
Lijo Lazar [Thu, 9 Feb 2023 09:14:13 +0000 (14:44 +0530)]
drm/amdkfd: Add xcp reference to kfd node

Fetch xcp information from xcp_mgr and also add xcc_mask to kfd node.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Move initialization of xcp before kfd
Lijo Lazar [Fri, 3 Feb 2023 13:16:40 +0000 (18:46 +0530)]
drm/amdgpu: Move initialization of xcp before kfd

After a partition switch, fill in all relevant xcp information before kfd
starts initialization.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Fill xcp mem node in aquavanjaram
Lijo Lazar [Fri, 3 Feb 2023 11:44:12 +0000 (17:14 +0530)]
drm/amdgpu: Fill xcp mem node in aquavanjaram

Implement callbacks to fill memory node information in aquavanjaram.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add callback to fill xcp memory id
Lijo Lazar [Fri, 3 Feb 2023 11:42:10 +0000 (17:12 +0530)]
drm/amdgpu: Add callback to fill xcp memory id

Add a callback in the xcp interface to fill in xcp memory id information.
The memory id identifies the range/partition of an XCP among the memory
partitions available in the device. Also, fill in the id information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Initialize memory ranges for GC 9.4.3
Lijo Lazar [Tue, 14 Feb 2023 09:15:45 +0000 (14:45 +0530)]
drm/amdgpu: Initialize memory ranges for GC 9.4.3

GC 9.4.3 ASICs may have memory split into multiple partitions. Initialize
the memory partition information for each range. The information may be
in the form of a NUMA node id or a range of pages.
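The "range of pages" form can be sketched as a table of inclusive page
frame ranges plus a lookup. Everything here is an assumption for
illustration: struct mem_range, init_mem_ranges() and mem_range_of_pfn()
are hypothetical names, and the even split with the remainder assigned to
the last range is not claimed to be how the driver divides memory. This
is a user-space sketch; inside the kernel, dividing a 64-bit value on a
32-bit build would need div_u64() or do_div() rather than the plain `/`
used here.

```c
#include <stdint.h>

/* One memory partition: a NUMA node and/or a page frame range. */
struct mem_range {
	int nid;       /* NUMA node id, or -1 if not NUMA-backed */
	uint64_t fpfn; /* first page frame of the range */
	uint64_t lpfn; /* last page frame of the range, inclusive */
};

/* Split total_pages into n ranges; any remainder goes to the last one.
 * Assumes n > 0. */
static void init_mem_ranges(struct mem_range *r, unsigned int n, uint64_t total_pages)
{
	uint64_t per = total_pages / n;

	for (unsigned int i = 0; i < n; i++) {
		r[i].nid = -1;
		r[i].fpfn = per * i;
		r[i].lpfn = (i == n - 1) ? total_pages - 1 : per * (i + 1) - 1;
	}
}

/* Return the index of the partition containing pfn, or -1 if none does. */
static int mem_range_of_pfn(const struct mem_range *r, unsigned int n, uint64_t pfn)
{
	for (unsigned int i = 0; i < n; i++)
		if (pfn >= r[i].fpfn && pfn <= r[i].lpfn)
			return (int)i;
	return -1;
}
```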

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add memory partitions to gmc
Lijo Lazar [Tue, 14 Feb 2023 09:07:53 +0000 (14:37 +0530)]
drm/amdgpu: Add memory partitions to gmc

Some ASICs have the device memory divided into multiple partitions. The
partitions could be denoted by a NUMA node or by a range of pages.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add API to get numa information of XCC
Lijo Lazar [Tue, 14 Feb 2023 13:29:40 +0000 (18:59 +0530)]
drm/amdgpu: Add API to get numa information of XCC

Add an interface to get the NUMA information of an ACPI XCC object. The
interface uses the logical id to identify an XCC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Store additional numa node information
Lijo Lazar [Tue, 14 Feb 2023 13:03:51 +0000 (18:33 +0530)]
drm/amdgpu: Store additional numa node information

Use a struct to store additional numa node information including size
and base address. Add numa_info pointer to xcc object to point to the
relevant structure based on its proximity domain.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Get supported memory partition modes
Lijo Lazar [Fri, 17 Feb 2023 04:02:44 +0000 (09:32 +0530)]
drm/amdgpu: Get supported memory partition modes

Expand the interface to also return the supported memory partition modes
along with the current memory partition mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Move memory partition query to gmc
Lijo Lazar [Tue, 31 Jan 2023 07:09:49 +0000 (12:39 +0530)]
drm/amdgpu: Move memory partition query to gmc

The GMC block handles memory-related information, so it makes more sense
to keep the memory partition functions in the GMC block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Add utility functions for xcp
Lijo Lazar [Wed, 25 Jan 2023 14:34:52 +0000 (20:04 +0530)]
drm/amdgpu: Add utility functions for xcp

Add utility functions to get details of xcp and iterate through
available xcps.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Use apt name for FW reserved region
Lijo Lazar [Fri, 24 Feb 2023 12:31:38 +0000 (18:01 +0530)]
drm/amdgpu: Use apt name for FW reserved region

Use the generic term fw_reserved_memory for the FW reserve region. This
region may hold the discovery TMR in addition to other reserved regions.
Its size could be larger than the discovery TMR size, so don't change
the discovery TMR size based on it.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Use GPU VA space for IH v4.4.2 in APU
Lijo Lazar [Thu, 23 Feb 2023 14:43:56 +0000 (20:13 +0530)]
drm/amdgpu: Use GPU VA space for IH v4.4.2 in APU

For IH ring buffer and read/write pointers, use GPU VA space rather than
Guest PA on APU configs. Access through Guest PA doesn't work when IOMMU
is enabled. It is also beneficial in NUMA configs as it allocates from
the closest numa pool in a numa enabled system.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 years agodrm/amdgpu: Simplify aquavanjram instance mapping
Lijo Lazar [Mon, 20 Feb 2023 06:34:30 +0000 (12:04 +0530)]
drm/amdgpu: Simplify aquavanjram instance mapping

Simplify the mapping so that the same sequence is used to assign logical
to physical ids for all IPs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>