Saaem Rizvi [Thu, 11 May 2023 19:16:35 +0000 (15:16 -0400)]
 
drm/amd/display: Trigger DIO FIFO resync on commit streams for DCN32
[WHY and HOW]
Currently, on DCN32 we have an old workaround to resolve a DIO FIFO
speed issue when writing to the OTG DIVIDER register. However, this
workaround is not safe as we should be applying the DIO FIFO rampup
logic when the OTG re disabled along with the encoders. This new
workaround accounts for this. If the workaround sequence is incorrect,
like it is was, there is a chance we might hang. this new
workaround was first implemented in DCN314.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Saaem Rizvi <syedsaaem.rizvi@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cruise Hung [Fri, 12 May 2023 15:33:46 +0000 (23:33 +0800)]
 
drm/amd/display: Update correct DCN314 register header
[Why]
The register header for DCN314 is not correct.
[How]
Update correct DCN314 register header.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Cruise Hung <cruise.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Qingqing Zhuo [Thu, 16 Mar 2023 13:05:58 +0000 (09:05 -0400)]
 
drm/amd/display: Clean FPGA code in dc
[Why]
Drop dead code for Linux.
[How]
Remove all IS_FPGA_MAXIMUS_DC and IS_DIAG_DC
Reviewed-by: Ariel Bernstein <eric.bernstein@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alvin Lee [Thu, 11 May 2023 19:25:01 +0000 (15:25 -0400)]
 
drm/amd/display: Apply 60us prefetch for DCFCLK <= 300Mhz
[Description]
- Previously we wanted to apply extra 60us of prefetch for min DCFCLK
  (200Mhz), but DCFCLK can be calculated to be 201Mhz which underflows
  also without the extra prefetch
- Instead, apply the the extra 60us prefetch for any DCFCLK freq <=
  300Mhz
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Daniel Miess [Thu, 11 May 2023 14:51:27 +0000 (10:51 -0400)]
 
drm/amd/display: Fix possible underflow for displays with large vblank
[Why]
Underflow observed when using a display with a large vblank region
and low refresh rate
[How]
Simplify calculation of vblank_nom
Increase value for VBlankNomDefaultUS to 800us
Fixed a null pointer from previous commit of this change
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Daniel Miess [Thu, 11 May 2023 13:12:09 +0000 (09:12 -0400)]
 
drm/amd/display: Revert vblank change that causes null pointer crash
Revert commit 
1a4bcdbea431 ("drm/amd/display: Fix possible underflow for displays with large vblank")
Because it cause some regression
Fixes: 1a4bcdbea431 ("drm/amd/display: Fix possible underflow for displays with large vblank")
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Saaem Rizvi [Tue, 9 May 2023 18:41:59 +0000 (14:41 -0400)]
 
drm/amd/display: Trigger DIO FIFO resync on commit streams
[WHY]
Currently, there is an intermittent issue where a screen can either go
blank or be corrupted.
[HOW]
To resolve the issue we trigger the ramping logic for DIO FIFO so that
it goes back up to the correct speed.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Saaem Rizvi <syedsaaem.rizvi@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fangzhi Zuo [Wed, 10 May 2023 20:43:30 +0000 (16:43 -0400)]
 
drm/amd/display: Have Payload Properly Created After Resume
At drm suspend sequence, MST dc_sink is removed. When commit cached
MST stream back in drm resume sequence, the MST stream payload is not
properly created and added into the payload table. After resume, topology
change is reprobed by removing existing streams first. That leads to
no payload is found in the existing payload table as below error
"[drm] ERROR No payload for [MST PORT:] found in mst state"
1. In encoder .atomic_check routine, remove check existance of dc_sink
2. Bypass MST by checking existence of MST root port. dc_link_type cannot
differentiate MST port before topology is rediscovered.
Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dmytro Laktyushkin [Wed, 10 May 2023 19:27:19 +0000 (15:27 -0400)]
 
drm/amd/display: fix dcn315 pixel rate crb scaling check
fix dcn315 pixel rate crb scaling check error
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hersen Wu [Mon, 24 Apr 2023 15:39:20 +0000 (11:39 -0400)]
 
drm/amd/display: lower dp link training message level
[Why] some test apps report dp link training waring even dp
training pass. there are 4 tries of lt within
perform_link_training_with_retries. if lt pass within 4 tries,
it will NOT be reated as lt failure. for each try of lt, if lt
fails, current driver implementation prints message at warning
level. this let people think dp lt does not work properly.
[How] for 1st, 2nd and 3rd try of lt, print message at debug
level. for the 4th try of lt, print message at warning level.
Reviewed-by: Jerry Zuo <jerry.zuo@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alan Liu [Tue, 2 May 2023 09:54:50 +0000 (17:54 +0800)]
 
drm/amd/display: Fix warning in disabling vblank irq
[Why]
During gpu-reset, we toggle vblank irq by calling dc_interrupt_set()
instead of amdgpu_irq_get/put() because we don't want to change the irq
source's refcount. However, we see the warning when vblank irq is enabled
by dc_interrupt_set() during gpu-reset but disabled by amdgpu_irq_put()
after gpu-reset.
[How]
Only in dm_gpureset_toggle_interrupts() we toggle vblank interrupts by
calling dc_interrupt_set(). Apart from this we call dm_set_vblank()
which uses amdgpu_irq_get/put() to operate vblank irq.
Reviewed-by: Bhawanpreet Lakha <bhawanpreet.lakha@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alan Liu <haoping.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Mon, 8 May 2023 20:08:37 +0000 (16:08 -0400)]
 
drm/amd/display: Update SR watermarks for DCN314
[Why & How]
Update parameters for SR watermarks for DCN314
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dmytro Laktyushkin [Tue, 9 May 2023 13:38:29 +0000 (09:38 -0400)]
 
drm/amd/display: disable dcn315 pixel rate crb when scaling
The rough calculation does not account for scaling. Also, make 2
segments the minimum allowed per surface to avoid potential 0 detile
with mpc/odm combine on such outputs.
Reviewed-by: Ariel Bernstein <eric.bernstein@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cruise Hung [Tue, 9 May 2023 13:36:35 +0000 (21:36 +0800)]
 
drm/amd/display: Fix DMUB debugging print issue
[Why]
The DMUB diagnostic data was not printed out correctly.
[How]
Print the DMUB diagnostic data line by line.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Cruise Hung <cruise.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christoph Hellwig [Thu, 18 May 2023 13:51:38 +0000 (15:51 +0200)]
 
drm/amdgpu: stop including swiotlb.h
amdgpu does not need swiotlb.h, so stop including it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 18 May 2023 16:38:22 +0000 (12:38 -0400)]
 
drm/radeon: reintroduce radeon_dp_work_func content
Put back the radeon_dp_work_func logic.  It seems that
handling DP RX interrupts is necessary to make some
panels work.  This was removed with the MST support,
but it regresses some systems so add it back.  While
we are here, add the proper mutex locking.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2567
Fixes: 01ad1d9c2888 ("drm/radeon: Drop legacy MST support")
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Lyude Paul <lyude@redhat.com>
Srinivasan Shanmugam [Fri, 19 May 2023 05:14:41 +0000 (10:44 +0530)]
 
drm/amdgpu: Fix uninitalized variable in kgd2kfd_device_init
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:613:4: error: variable 'num_xcd' is uninitialized when used here [-Werror,-Wuninitialized]
                        num_xcd, kfd->adev->gfx.num_xcc_per_xcp);
                        ^~~~~~~
include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err'
        dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
                                                                       ^~~~~~~~~~~
include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap'
                _p_func(dev, fmt, ##__VA_ARGS__);                       \
                                    ^~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:597:13: note: initialize the variable 'num_xcd' to silence this warning
        int num_xcd, partition_mode;
                   ^
                    = 0
1 error generated.
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Fri, 19 May 2023 05:04:50 +0000 (10:34 +0530)]
 
drm/amdgpu: Fix uninitalized variable in jpeg_v4_0_3_is_idle & jpeg_v4_0_3_wait_for_idle
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:752:4: error: variable 'ret' is uninitialized when used here [-Werror,-Wuninitialized]
                        ret &= ((RREG32_SOC15_OFFSET(
                        ^~~
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:745:10: note: initialize the variable 'ret' to silence this warning
        bool ret;
                ^
                 = 0
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:774:4: error: variable 'ret' is uninitialized when used here [-Werror,-Wuninitialized]
                        ret &= SOC15_WAIT_ON_RREG_OFFSET(
                        ^~~
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:767:9: note: initialize the variable 'ret' to silence this warning
        int ret;
               ^
                = 0
2 errors generated.
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: James Zhu <James.Zhu@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Fri, 19 May 2023 10:54:55 +0000 (16:24 +0530)]
 
drm/amd/amdgpu: Fix errors & warnings in mmhub_v1_8.c
Fix below errors & warnings reported by checkpatch:
ERROR: code indent should use tabs where possible
WARNING: please, no space before tabs
WARNING: please, no spaces at the start of a line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
ERROR: space prohibited before that '++' (ctx:WxB)
WARNING: Block comments use a trailing */ on a separate line
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Likun Gao [Fri, 19 May 2023 10:54:18 +0000 (18:54 +0800)]
 
drm/amdgpu: retire set_vga_state for some ASIC
set_vga_state operation only allowed on SI generation
ASIC, retire the realted function on those ASIC which
did not do anything.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Fri, 5 May 2023 17:16:32 +0000 (13:16 -0400)]
 
drm/amd/display: improve the message printed when loading DC
[Why&How]
Change how DC version and hardware version is printed when driver is
loaded.
- Remove exclamation
- Add DC version and hardware version to both success and failure cases
- Add version in between appropriate filler words to make a complete
  statement.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Likun Gao [Fri, 19 May 2023 08:01:57 +0000 (16:01 +0800)]
 
drm/amdgpu: fix vga_set_state NULL pointer issue
Fix NULL pointer issue for vga_set_state function
as not all the ASIC need this operation.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Fri, 19 May 2023 06:06:57 +0000 (11:36 +0530)]
 
drm/amdgpu: Fix uninitialized variable in gfx_v9_4_3_cp_resume
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1925:6: error: variable 'r' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
        if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1931:6: note: uninitialized use occurs here
        if (r)
            ^
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1925:2: note: remove the 'if' if its condition is always true
        if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1923:7: note: initialize the variable 'r' to silence this warning
        int r, i, num_xcc;
             ^
              = 0
1 error generated.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Horatio Zhang [Tue, 16 May 2023 03:02:10 +0000 (23:02 -0400)]
 
drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v4_0
Add ras_poison_irq and functions. And fix the amdgpu_irq_put
call trace in jpeg_v4_0_hw_fini.
[   50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
[   50.497619] RSP: 0018:
ffffaa2400fcfcb0 EFLAGS: 
00010246
[   50.497620] RAX: 
0000000000000000 RBX: 
0000000000000001 RCX: 
0000000000000000
[   50.497621] RDX: 
0000000000000000 RSI: 
0000000000000000 RDI: 
0000000000000000
[   50.497621] RBP: 
ffffaa2400fcfcd0 R08: 
0000000000000000 R09: 
0000000000000000
[   50.497622] R10: 
0000000000000000 R11: 
0000000000000000 R12: 
ffff99b2105242d8
[   50.497622] R13: 
0000000000000000 R14: 
ffff99b210500000 R15: 
ffff99b210500000
[   50.497623] FS:  
0000000000000000(0000) GS:
ffff99b518480000(0000) knlGS:
0000000000000000
[   50.497623] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[   50.497624] CR2: 
00007f9d32aa91e8 CR3: 
00000001ba210000 CR4: 
0000000000750ee0
[   50.497624] PKRU: 
55555554
[   50.497625] Call Trace:
[   50.497625]  <TASK>
[   50.497627]  jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu]
[   50.497693]  jpeg_v4_0_suspend+0x13/0x30 [amdgpu]
[   50.497751]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
[   50.497802]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
[   50.497854]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
[   50.497905]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
[   50.498005]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
[   50.498060]  process_one_work+0x21f/0x400
[   50.498063]  worker_thread+0x200/0x3f0
[   50.498064]  ? process_one_work+0x400/0x400
[   50.498065]  kthread+0xee/0x120
[   50.498067]  ? kthread_complete_and_exit+0x20/0x20
[   50.498068]  ret_from_fork+0x22/0x30
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Horatio Zhang [Tue, 16 May 2023 02:59:54 +0000 (22:59 -0400)]
 
drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v2_6
Add ras_poison_irq and functions.
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Horatio Zhang [Tue, 16 May 2023 02:57:19 +0000 (22:57 -0400)]
 
drm/amdgpu: separate ras irq from jpeg instance irq for UVD_POISON
Separate jpegbRAS poison consumption handling from the instance irq, and
register dedicated ras_poison_irq src and funcs for UVD_POISON.
v2:
- Separate ras irq from jpeg instance irq
- Improve the subject and code comments
v3:
- Split the patch into three parts
- Improve the code comments
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Horatio Zhang [Tue, 16 May 2023 02:09:43 +0000 (22:09 -0400)]
 
drm/amdgpu: add RAS POISON interrupt funcs for vcn_v4_0
Add ras_poison_irq and functions. And fix the amdgpu_irq_put
call trace in vcn_v4_0_hw_fini.
[   44.563572] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
[   44.563629] RSP: 0018:
ffffb36740edfc90 EFLAGS: 
00010246
[   44.563630] RAX: 
0000000000000000 RBX: 
0000000000000001 RCX: 
0000000000000000
[   44.563630] RDX: 
0000000000000000 RSI: 
0000000000000000 RDI: 
0000000000000000
[   44.563631] RBP: 
ffffb36740edfcb0 R08: 
0000000000000000 R09: 
0000000000000000
[   44.563631] R10: 
0000000000000000 R11: 
0000000000000000 R12: 
ffff954c568e2ea8
[   44.563631] R13: 
0000000000000000 R14: 
ffff954c568c0000 R15: 
ffff954c568e2ea8
[   44.563632] FS:  
0000000000000000(0000) GS:
ffff954f584c0000(0000) knlGS:
0000000000000000
[   44.563632] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[   44.563633] CR2: 
00007f028741ba70 CR3: 
000000026ca10000 CR4: 
0000000000750ee0
[   44.563633] PKRU: 
55555554
[   44.563633] Call Trace:
[   44.563634]  <TASK>
[   44.563634]  vcn_v4_0_hw_fini+0x62/0x160 [amdgpu]
[   44.563700]  vcn_v4_0_suspend+0x13/0x30 [amdgpu]
[   44.563755]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
[   44.563806]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
[   44.563858]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
[   44.563909]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
[   44.564006]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
[   44.564061]  process_one_work+0x21f/0x400
[   44.564062]  worker_thread+0x200/0x3f0
[   44.564063]  ? process_one_work+0x400/0x400
[   44.564064]  kthread+0xee/0x120
[   44.564065]  ? kthread_complete_and_exit+0x20/0x20
[   44.564066]  ret_from_fork+0x22/0x30
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Horatio Zhang [Tue, 16 May 2023 01:52:00 +0000 (21:52 -0400)]
 
drm/amdgpu: add RAS POISON interrupt funcs for vcn_v2_6
Add ras_poison_irq and functions.
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Horatio Zhang [Tue, 16 May 2023 01:45:51 +0000 (21:45 -0400)]
 
drm/amdgpu: separate ras irq from vcn instance irq for UVD_POISON
Separate vcn RAS poison consumption handling from the instance irq, and
register dedicated ras_poison_irq src and funcs for UVD_POISON.
v2:
- Separate ras irq from vcn instance irq
- Improve the subject and code comments
v3:
- Split the patch into three parts
- Improve the code comments
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YuanShang [Mon, 8 May 2023 04:12:28 +0000 (12:12 +0800)]
 
drm/amdgpu: Remove IMU ucode in vf2pf
The IMU firmware is loaded on the host side, not the guest.
Remove IMU in vf2pf ucode id enum.
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-By: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shiwu Zhang [Wed, 17 May 2023 05:21:15 +0000 (13:21 +0800)]
 
drm/amdgpu: fix the memory override in kiq ring struct
This is introduced by the code merge and will let the
adev->gfx.kiq[0].ring struct being overrided
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shiwu Zhang [Wed, 17 May 2023 05:49:08 +0000 (13:49 +0800)]
 
drm/amdgpu: add the smu_v13_0_6 and gfx_v9_4_3 ip block
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse Zhang [Thu, 18 May 2023 01:46:22 +0000 (09:46 +0800)]
 
drm/amdgpu: don't enable secure display on incompatible platforms
[why]
[drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
[drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
amdgpu 0000:04:00.0: amdgpu: Secure display: Generic Failure.
[how]
don't enable secure display on incompatible platforms
Suggested-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Jesse zhang <jesse.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Su Hui [Wed, 17 May 2023 02:52:19 +0000 (10:52 +0800)]
 
drm/radeon: Remove unnecessary (void*) conversions
No need cast (void*) to (struct radeon_device *)
or (struct radeon_ring *).
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sukrut Bellary [Wed, 3 May 2023 23:15:07 +0000 (16:15 -0700)]
 
drm:amd:amdgpu: Fix missing buffer object unlock in failure path
smatch warning -
1) drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:3615 gfx_v9_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.
2) drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:6901 gfx_v10_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.
Signed-off-by: Sukrut Bellary <sukrut.bellary@linux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Wed, 17 May 2023 18:39:46 +0000 (14:39 -0400)]
 
drm/amd/display: Fix artifacting on eDP panels when engaging freesync video mode
[Why]
When freesync video mode is enabled, switching resolution from native
mode to one of the freesync video compatible modes can trigger continous
artifacts on some eDP panels when running under KDE. The articating can be seen in the
attached bug report.
[How]
Fix this by restricting updates that require full commit by using the same checks
for stream and scaling changes in the the enable pass of dm_update_crtc_state()
along with the check for compatible timings for freesync vide mode.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162
Fixes: da5e14909776 ("drm/amd/display: Fix hang when skipping modeset")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bas Nieuwenhuizen [Sat, 13 May 2023 12:51:00 +0000 (14:51 +0200)]
 
drm/amdgpu: Validate VM ioctl flags.
None have been defined yet, so reject anybody setting any. Mesa sets
it to 0 anyway.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Mario Limonciello [Thu, 18 May 2023 14:40:09 +0000 (09:40 -0500)]
 
drm/amd: Update driver-misc.html for Rembrandt-R
AMD has added marketing information publicly for Rembrandt-R, so
update the APU table with matching versions.
Link: https://www.amd.com/en/product/13086
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Su Hui [Mon, 15 May 2023 01:34:28 +0000 (09:34 +0800)]
 
drm/amdgpu: remove unnecessary (void*) conversions
No need cast (void*) to (struct amdgpu_device *).
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 18 May 2023 14:35:17 +0000 (09:35 -0500)]
 
drm/amd: Update driver-misc.html for Dragon Range
AMD has added marketing information publicly for Dragon Range, so
update the APU table with matching versions.
Link: https://www.amd.com/en/product/13016
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 18 May 2023 14:30:26 +0000 (09:30 -0500)]
 
drm/amd: Update driver-misc.html for Phoenix
AMD has added marketing information publicly for Phoenix, so
update the APU table with matching versions.
Link: https://www.amd.com/en/product/13036
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tong Liu01 [Tue, 16 May 2023 06:50:04 +0000 (14:50 +0800)]
 
drm/amdgpu: fix incorrect pcie_gen_mask in passthrough case
[why]
Passthrough case is treated as root bus and pcie_gen_mask is set as
default value that does not support GEN 3 and GEN 4 for PCIe link
speed. So PCIe link speed will be downgraded at smu hw init in
passthrough condition
[how]
Move get pci info after detect virtualization and check if it is
passthrough case when set pcie_gen_mask
Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hamza Mahfooz [Wed, 17 May 2023 18:09:39 +0000 (14:09 -0400)]
 
drm/amd/display: drop unused count variable in create_eml_sink()
Since, we are only interested in having
drm_edid_override_connector_update(), update the value of
connector->edid_blob_ptr. We don't care about the return value of
drm_edid_override_connector_update() here. So, drop count.
Fixes: 550e5d23f147 ("drm/amd/display: assign edid_blob_ptr with edid from debugfs")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hamza Mahfooz [Wed, 17 May 2023 17:49:29 +0000 (13:49 -0400)]
 
drm/amd/display: drop unused function set_abm_event()
set_abm_event() is never actually used. So, drop it.
Fixes: b8fe56375f78 ("drm/amd/display: Refactor ABM feature")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Tom Rix <trix@redhat.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hamza Mahfooz [Wed, 17 May 2023 17:31:45 +0000 (13:31 -0400)]
 
drm/amd/display: drop redundant memset() in get_available_dsc_slices()
get_available_dsc_slices() returns the number of indices set, and all of
the users of get_available_dsc_slices() don't cross the returned bound
when iterating over available_slices[]. So, the memset() in
get_available_dsc_slices() is redundant and can be dropped.
Fixes: 97bda0322b8a ("drm/amd/display: Add DSC support for Navi (v2)")
Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jack Xiao [Wed, 17 May 2023 09:07:01 +0000 (17:07 +0800)]
 
drm/amdgpu: fix S3 issue if MQD in VRAM
1. Need flush HDP for MQD putting in vram
2. Zero out mes MQD
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Thu, 18 May 2023 02:22:06 +0000 (07:52 +0530)]
 
drm/amdgpu: Fix warnings in amdgpu _sdma, _ucode.c
Fix below checkpatch warnings:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Comparisons should place the constant on the right side of the test
WARNING: Missing a blank line after declarations
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 17 May 2023 15:54:43 +0000 (21:24 +0530)]
 
drm/amd/amdgpu: Fix errors & warnings in amdgpu _uvd, _vce.c
Fix below checkpatch errors & warnings:
In amdgpu_uvd.c:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Prefer 'unsigned int *' to bare use of 'unsigned *'
WARNING: Missing a blank line after declarations
WARNING: %Lx is non-standard C, use %llx
ERROR: space required before the open parenthesis '('
ERROR: space required before the open brace '{'
WARNING: %LX is non-standard C, use %llX
WARNING: Block comments use * on subsequent lines
+/* multiple fence commands without any stream commands in between can
+   crash the vcpu so just try to emmit a dummy create/destroy msg to
WARNING: Block comments use a trailing */ on a separate line
+   avoid this */
WARNING: braces {} are not necessary for single statement blocks
+               for (j = 0; j < adev->uvd.num_enc_rings; ++j) {
+                       fences += amdgpu_fence_count_emitted(&adev->uvd.inst[i].ring_enc[j]);
+               }
In amdgpu_vce.c:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Missing a blank line after declarations
WARNING: %Lx is non-standard C, use %llx
WARNING: Possible repeated word: 'we'
ERROR: space required before the open parenthesis '('
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Tue, 16 May 2023 09:34:17 +0000 (17:34 +0800)]
 
drm/amdgpu: perform mode2 reset for sdma fed error on gfx v11_0_3
perform mode2 reset for sdma fed error on gfx v11_0_3.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 17 May 2023 15:01:02 +0000 (20:31 +0530)]
 
drm/amd/amdgpu: Fix errors & warnings in amdgpu_vcn.c
Fix below checkpatch insisted error & warnings:
ERROR: space required before the open brace '{'
WARNING: braces {} are not necessary for any arm of this statement
+       if ((type == VCN_ENCODE_RING) && (vcn_config & VCN_BLOCK_ENCODE_DISABLE_MASK)) {
[...]
+       } else if ((type == VCN_DECODE_RING) && (vcn_config & VCN_BLOCK_DECODE_DISABLE_MASK)) {
[...]
+       } else if ((type == VCN_UNIFIED_RING) && (vcn_config & VCN_BLOCK_QUEUE_DISABLE_MASK)) {
[...]
ERROR: code indent should use tabs where possible
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for single statement blocks
+               for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
+                       fence[j] += amdgpu_fence_count_emitted(&adev->vcn.inst[j].ring_enc[i]);
+
ERROR: space required before the open parenthesis '('
WARNING: Missing a blank line after declarations
WARNING: please, no spaces at the start of a line
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 17 May 2023 13:08:00 +0000 (18:38 +0530)]
 
drm/amd/amdgpu: Fix warnings in amdgpu_encoders.c
Fix below checkpatch warnings:
WARNING: Missing a blank line after declarations
+                       struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector);
+                       amdgpu_encoder->active_device = amdgpu_encoder->devices & amdgpu_connector->devices;
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 17 May 2023 19:04:35 +0000 (15:04 -0400)]
 
drm/amdkfd: fix stack size in svm_range_validate_and_map
Allocate large local variable on heap to avoid exceeding the
stack size:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function ‘svm_range_validate_and_map’:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:1690:1: warning: the frame size of 2360 bytes is larger than 2048 bytes [-Wframe-larger-than=]
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 17 May 2023 14:40:48 +0000 (20:10 +0530)]
 
drm/amd/amdgpu: Fix errors & warnings in amdgpu_ttm.c
Fix below checkpatch insisted error & warnings:
ERROR: Macros with complex values should be enclosed in parentheses
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for single statement blocks
WARNING: Block comments use a trailing */ on a separate line
WARNING: Missing a blank line after declarations
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 16 May 2023 20:56:49 +0000 (16:56 -0400)]
 
drm/amdgpu/vcn4: fix endian conversion
sq.is_enabled is a byte so there is no need to endian swap it.
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 16 May 2023 21:16:30 +0000 (17:16 -0400)]
 
drm/amdgpu/gmc9: fix 64 bit division in partition code
Rework logic or use do_div() to avoid problems on 32 bit.
v2: add a missing case for XCP macro
v3: fix out of bounds array access
v4: fix xcp handling harder
Acked-by: Guchun Chen <guchun.chen@amd.com> (v1)
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> (v3)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Wed, 8 Feb 2023 09:05:08 +0000 (17:05 +0800)]
 
drm/amdgpu: initialize RAS for gfx_v9_4_3
Register GFX RAS functions and initialize GFX RAS.
v2: remove xcp operations.
v3: reuse the return value of gfx_ras_sw_init.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 10 Feb 2023 10:41:28 +0000 (18:41 +0800)]
 
drm/amdgpu: add sq timeout status functions for gfx_v9_4_3
Query and reset sq timeout status.
v2: change instance from 0 to xcc_id for register access.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Wed, 8 Feb 2023 06:54:01 +0000 (14:54 +0800)]
 
drm/amdgpu: add RAS error count reset for gfx_v9_4_3
Add GFX RAS error count reset function.
v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 17 Mar 2023 09:13:46 +0000 (17:13 +0800)]
 
drm/amdgpu: add RAS error count query for gfx_v9_4_3
Query GFX RAS ce/ue count.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 6 Feb 2023 03:38:19 +0000 (11:38 +0800)]
 
drm/amdgpu: add RAS error count definitions for gfx_v9_4_3
Prepare for the query of GFX RAS ce/ue count.
v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add more CE/UE registsers to query list.
    add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.
v4: move gfx memory id definitions to gfx_v9_4_3.
v5: create a dedicated patch for adding error count query function.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Tue, 7 Feb 2023 10:30:55 +0000 (18:30 +0800)]
 
drm/amdgpu: add RAS definitions for GFX
Add common GFX RAS definitions.
v2: remove instance from amdgpu_gfx_ras_reg_entry,
    amdgpu_ras_err_status_reg_entry has already defined it.
v3: remove memory id definitions from amdgpu_gfx.h, they are
    related to IP version.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 27 Feb 2023 09:36:19 +0000 (17:36 +0800)]
 
drm/amdgpu: Add gc v9_4_3 ras error status registers
GC v9_4_3 introduces UE|CE_ERR_STATUS_LO|HI to log
hardware errors
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 3 Feb 2023 02:41:26 +0000 (10:41 +0800)]
 
drm/amdgpu: add RAS status reset for gfx_v9_4_3
Reset GFX RAS status registers.
v2: fix typo in title.
    remove xcp operation.
v3: change instance from 0 to xcc_id for register access.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Thu, 2 Feb 2023 09:20:23 +0000 (17:20 +0800)]
 
drm/amdgpu: add RAS status query for gfx_v9_4_3
Query GFX RAS status.
v2: remove xcp operation.
v3: change instance from 0 to xcc_id for register access.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Thu, 2 Feb 2023 10:57:04 +0000 (18:57 +0800)]
 
drm/amdgpu: add GFX RAS common function
The common function can help reduce redundant code.
v2: remove xcp operation, only need to do RAS operations for all
instances.
v3: remove check for GFX RAS support, will be checked in higher level.
    add amdgpu prefix for the function name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Fri, 12 May 2023 05:22:57 +0000 (13:22 +0800)]
 
drm/amdgpu: Do not access members of xcp w/o check (v2)
Not all the asic needs xcp. ensure check xcp availabity
before accessing its member.
v2: add missing change in kfd_topology.c
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Thu, 11 May 2023 09:01:03 +0000 (17:01 +0800)]
 
drm/amdkfd: Fix null ptr access
Avoid access null xcp_mgr pointer.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 20 Mar 2023 10:21:14 +0000 (18:21 +0800)]
 
drm/amdgpu: add check for RAS instance mask
The mask is only needed to be set when RAS block instance number is
more than 1 and invalid bits should be also masked out.
We only check valid bits for GFX and SDMA block for now, and will
add check for other RAS blocks in the future.
v2: move the check under injection operation since the mask is only
    used by RAS error inject.
v3: add valid bits handling for SDMA.
v4: print message if the mask is adjusted.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 13 Mar 2023 08:34:19 +0000 (16:34 +0800)]
 
drm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2
No special requirement in RAS injection for the two versions, switch to
use default injection interface.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 13 Mar 2023 08:24:11 +0000 (16:24 +0800)]
 
drm/amdgpu: reorganize RAS injection flow
So GFX RAS injection could use default function if it doesn't define its
own injection interface.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 27 Feb 2023 10:25:23 +0000 (18:25 +0800)]
 
drm/amdgpu: add instance mask for RAS inject
User can specify injected instances by the mask. For backward
compatibility, the mask value is incorporated into sub block index
without interface change of RAS TA.
User uses logical mask and driver should convert it to physical value
before sending it to RAS TA.
v2: update parameter name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 27 Feb 2023 08:31:56 +0000 (16:31 +0800)]
 
drm/amdgpu: convert logical instance mask to physical one
Convert instance mask for the convenience of RAS TA.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Wed, 12 Apr 2023 20:56:29 +0000 (16:56 -0400)]
 
drm/amdgpu: Enable IH CAM on GFX9.4.3
This patch enables IH CAM on GFX9.4.3 ASIC.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 19 Apr 2023 21:39:35 +0000 (17:39 -0400)]
 
drm/amdgpu: Correct get_xcp_mem_id calculation
Current calculation only works for NPS4/QPX mode, correct it for
NPS4/CPX mode.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 31 Mar 2023 15:18:12 +0000 (11:18 -0400)]
 
drm/amdkfd: Refactor migrate init to support partition switch
Rename smv_migrate_init to a better name kgd2kfd_init_zone_device
because it setup zone devive pgmap for page migration and keep it in
kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it
only once in amdgpu_device_ip_init after adev ip blocks are initialized,
but before amdgpu_amdkfd_device_init initialize kfd nodes which enable
SVM support based on pgmap.
svm_range_set_max_pages is called by kgd2kfd_device_init everytime after
switching compute partition mode.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shiwu Zhang [Fri, 31 Mar 2023 09:16:41 +0000 (17:16 +0800)]
 
drm/amdgpu: route ioctls on primary node of XCPs to primary device
During XCP init, unlike the primary device, there is no amdgpu_device
attached to each XCP's drm_device
In case that user trying to open/close the primary node of XCP drm_device
this rerouting is to solve the NULL pointer issue causing by referring
to any member of the amdgpu_device
 BUG: unable to handle page fault for address: 
0000000000020c80
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 Call Trace:
  <TASK>
  lock_timer_base+0x6b/0x90
  try_to_del_timer_sync+0x2b/0x80
  del_timer_sync+0x29/0x40
  flush_delayed_work+0x1c/0x50
  amdgpu_driver_open_kms+0x2c/0x280 [amdgpu]
  drm_file_alloc+0x1b3/0x260 [drm]
  drm_open+0xaa/0x280 [drm]
  drm_stub_open+0xa2/0x120 [drm]
  chrdev_open+0xa6/0x1c0
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 23 Mar 2023 12:45:56 +0000 (08:45 -0400)]
 
drm/amdkfd: APU mode set max svm range pages
svm_migrate_init set the max svm range pages based on the KFD nodes
partition size. APU mode don't init pgmap because there is no migration.
kgd2kfd_device_init calls svm_migrate_init after KFD nodes allocation
and initialization.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Mon, 20 Mar 2023 15:22:30 +0000 (11:22 -0400)]
 
drm/amdkfd: Fix memory reporting on GFX 9.4.3
This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU
by reporting available memory on a per partition basis. If its an
APU, available and used memory calculations take into account
system and TTM memory.
v2: squash in fix ("drm/amdkfd: Fix array out of bound warning")
    squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Mon, 20 Mar 2023 15:21:38 +0000 (11:21 -0400)]
 
drm/amdkfd: Move local_mem_info to kfd_node
We need to track memory usage on a per partition basis. To do
that, store the local memory information in KFD node instead
of kfd device.
v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 13 Mar 2023 16:03:18 +0000 (12:03 -0400)]
 
drm/amdgpu: use xcp partition ID for amdgpu_gem
Find xcp_id from amdgpu_fpriv, use it for amdgpu_gem_object_create.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 10 Mar 2023 00:30:02 +0000 (19:30 -0500)]
 
drm/amdgpu: KFD graphics interop support compute partition
kfd_ioctl_get_dmabuf use the amdgpu bo xcp_id to get the gpu_id of the
KFD node from the exported dmabuf_adev, and then create kfd bo on the
correct adev and KFD node when importing the amdgpu bo to KFD.
Remove function kfd_device_by_adev, it is not needed as it is the same
result as dmabuf_adev->kfd.dev->nodes[0]->id.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 8 Mar 2023 16:57:00 +0000 (11:57 -0500)]
 
drm/amdkfd: Store xcp partition id to amdgpu bo
For memory accounting per compute partition and export drm amdgpu bo and
then import to KFD, we need the xcp id to account the memory usage or
find the KFD node of the original amdgpu bo to create the KFD bo on the
correct adev KFD node.
Set xcp_id_plus1 of amdgpu_bo_param to create bo and store xcp_id to
amddgpu bo. Add helper macro to get the mem_id from adev and xcp_id.
v2: squash in fix ("drm/amdgpu: Fix BO creation failure on GFX 9.4.3 dGPU")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 7 Mar 2023 16:30:24 +0000 (11:30 -0500)]
 
drm/amdgpu: dGPU mode set VRAM range lpfn as exclusive
TTM place lpfn is exclusive used as end (start + size) in drm and buddy
allocator, adev->gmc memory partition range lpfn is inclusive (start +
size - 1), should plus 1 to set TTM place lpfn.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 24 Feb 2023 01:00:05 +0000 (20:00 -0500)]
 
drm/amdgpu: Alloc page table on correct memory partition
Alloc kernel mode page table bo uses the amdgpu_vm->mem_id + 1 as bp
mem_id_plus1 parameter. For APU mode, select the correct TTM pool to
alloc page from the corresponding memory partition, this will be the
closest NUMA node. For dGPU mode, select the correct address range for
vram manager.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 2 Feb 2023 16:07:53 +0000 (11:07 -0500)]
 
drm/amdkfd: Update MTYPE for far memory partition
Use MTYPE RW/MTYPE_CC for mapping system memory or VRAM to KFD node
within the same memory partition, use MTYPE_NC for mapping on KFD node
from the far memory partition of the same socket or from another socket
on same XGMI hive.
On NPS4 or 4P system, MTYPE will be overridden per page depending on
the memory NUMA node id and vm->mem_id.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:54:29 +0000 (18:54 -0500)]
 
drm/amdgpu: dGPU mode placement support memory partition
dGPU mode uses VRAM manager to validate bo, amdgpu bo placement use the
mem_id  to get the allocation range first, last page frame number
from xcp manager, pass to drm buddy allocator as the allowed range.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:45:32 +0000 (18:45 -0500)]
 
drm/amdkfd: SVM range allocation support memory partition
Pass kfd node->xcp->mem_id to amdgpu bo create parameter mem_id_plus1 to
allocate new svm_bo on the specified memory partition.
This is only for dGPU mode as we don't migrate with APU mode.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:50:09 +0000 (18:50 -0500)]
 
drm/amdkfd: Alloc memory of GPU support memory partition
For dGPU mode VRAM allocation, create amdgpu_bo from amdgpu_vm->mem_id,
to alloc from the correct memory range.
For APU mode VRAM allocation, set alloc domain to GTT, and set
bp->mem_id_plus1 from amdgpu_vm->mem_id + 1 to create amdgpu_bo, to
allocate system memory from correct NUMA node.
For GTT allocation, use mem_id -1 to allocate system memory from any
NUMA nodes.
Remove amdgpu_ttm_tt_set_mem_pool, to avoid the confusion that memory
maybe allocated from different mem_id.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:25:28 +0000 (18:25 -0500)]
 
drm/amdgpu: Add memory partition mem_id to amdgpu_bo
Add mem_id_plus1 parameter to amdgpu_gem_object_create and pass it to
amdgpu_bo_create. For dGPU mode allocation, mem_id is used by VRAM
manager to get the memory partition fpfn, lpfn from xcp manager. For APU
native mode allocation, mem_id is used to get NUMA node id from xcp
manager, then pass to TTM as numa pool id to alloc memory from the
specific NUMA node. mem_id -1 means for entire VRAM or any NUMA nodes.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 26 Jan 2023 23:11:29 +0000 (18:11 -0500)]
 
drm/amdkfd: Show KFD node memory partition info
Show KFD node memory partition id and size, add helper function
KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used
later to support memory accounting per partition.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 24 Feb 2023 00:58:22 +0000 (19:58 -0500)]
 
drm/amdgpu: Add memory partition id to amdgpu_vm
If xcp_mgr is initialized, add mem_id to amdgpu_vm structure to store
memory partition number when creating amdgpu_vm for the xcp. The xcp
number is decided when opening the render device, for example
/dev/dri/renderD129 is xcp_id 0, /dev/dri/renderD130 is xcp_id 1.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 23 Feb 2023 16:03:37 +0000 (11:03 -0500)]
 
drm/amdkfd: Store drm node minor number for kfd nodes
From KFD topology, application will find kfd node with the corresponding
drm device node minor number, for example if partition drm node starts
from /dev/dri/renderD129, then KFD node 0 with store drm node minor
number 129. Application will open drm node /dev/dri/renderD129 to create
amdgpu vm for kfd node 0 with the correct vm->mem_id to indicate the
memory partition.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Sat, 4 Mar 2023 00:45:45 +0000 (19:45 -0500)]
 
drm/amdgpu: Add xcp manager num_xcp_per_mem_partition
Used by KFD to check memory limit accounting.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:21:44 +0000 (17:21 -0400)]
 
drm/amdgpu: update ref_cnt before ctx free
Update ref_cnt before ctx free.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:15:02 +0000 (17:15 -0400)]
 
drm/amdgpu: run partition schedule if it is supported
Run partition schedule if it is supported during ctx init entity.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:00:54 +0000 (17:00 -0400)]
 
drm/amdgpu: add partition schedule for GC(9, 4, 3)
Implement partition schedule for GC(9, 4, 3).
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:12:01 +0000 (17:12 -0400)]
 
drm/amdgpu: keep amdgpu_ctx_mgr in ctx structure
Keep amdgpu_ctx_mgr in ctx structure to track fpriv.
v2: add missing fpriv declaration lost in rebase
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 21:19:11 +0000 (17:19 -0400)]
 
drm/amdgpu: add partition scheduler list update
Add partition scheduler list update in late init
and xcp partition mode switch.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 20:55:02 +0000 (16:55 -0400)]
 
drm/amdgpu: update header to support partition scheduling
Update header to support partition scheduling.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Zhu [Mon, 15 Aug 2022 20:45:12 +0000 (16:45 -0400)]
 
drm/amdgpu: add partition ID track in ring
Keep track partition ID in ring.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>