drm/panfrost: Add fdinfo support GPU load metrics
author Adrián Larumbe <adrian.larumbe@collabora.com>
Fri, 29 Sep 2023 18:14:28 +0000 (19:14 +0100)
committer Boris Brezillon <boris.brezillon@collabora.com>
Wed, 4 Oct 2023 11:04:15 +0000 (13:04 +0200)
The drm-stats fdinfo tags made available to user space are drm-engine,
drm-cycles, drm-maxfreq and drm-curfreq, one per job slot.

This deviates from standard practice in other DRM drivers, where a single
set of key:value pairs is provided for the whole render engine. However,
Panfrost has separate queues for fragment and vertex/tiler jobs, so a
decision was made to calculate bus cycles and workload times separately.
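
As an illustration of the per-slot key layout, here is a minimal user-space
sketch that splits one fdinfo line into tag, engine name and value. The
struct and function names (fdinfo_entry, parse_engine_line) are hypothetical
and not part of this patch; only the key format is taken from the
documentation example added below.

```c
#include <stdio.h>
#include <string.h>

/* One parsed per-engine entry, e.g. "drm-cycles-fragment:\t1424359409". */
struct fdinfo_entry {
	char tag[16];              /* "engine", "cycles", "maxfreq" or "curfreq" */
	char engine[32];           /* e.g. "fragment" or "vertex-tiler" */
	unsigned long long value;  /* numeric value; the unit suffix is ignored */
};

/* Returns 1 if the line is a per-engine key with a known tag, 0 otherwise. */
static int parse_engine_line(const char *line, struct fdinfo_entry *out)
{
	static const char *const tags[] = {
		"engine", "cycles", "maxfreq", "curfreq"
	};
	size_t i;

	if (strncmp(line, "drm-", 4) != 0)
		return 0;
	line += 4;

	for (i = 0; i < sizeof(tags) / sizeof(tags[0]); i++) {
		size_t len = strlen(tags[i]);

		if (strncmp(line, tags[i], len) != 0 || line[len] != '-')
			continue;

		/* The engine name runs up to ':'; the value follows whitespace. */
		if (sscanf(line + len + 1, "%31[^:]: %llu",
			   out->engine, &out->value) != 2)
			return 0;

		strcpy(out->tag, tags[i]);
		return 1;
	}

	return 0;
}
```

Lines such as drm-driver or drm-total-memory carry no known tag and are
rejected, so a caller can feed every fdinfo line through the same function.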

Maximum operating frequency is calculated at devfreq initialisation time.
Current frequency is made available to user space because nvtop uses it
when performing engine usage calculations.
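
A rough sketch of how a monitoring tool could turn two fdinfo samples into a
usage figure follows. It is not taken from nvtop or gputop; the struct layout,
the function names and the caller-provided wall-clock timestamp are all
assumptions made for illustration.

```c
#include <stdint.h>

/* Values read from fdinfo for one engine at one point in time. */
struct engine_sample {
	uint64_t wall_ns;    /* monotonic time of the read, supplied by the caller */
	uint64_t engine_ns;  /* drm-engine-<keystr> */
	uint64_t cycles;     /* drm-cycles-<keystr> */
};

/* Time-based usage: share of the sampling interval the engine was busy. */
static double engine_busy_percent(const struct engine_sample *prev,
				  const struct engine_sample *cur)
{
	uint64_t wall = cur->wall_ns - prev->wall_ns;

	if (wall == 0)
		return 0.0;

	return 100.0 * (double)(cur->engine_ns - prev->engine_ns) / (double)wall;
}

/*
 * Cycle-based usage: cycles consumed versus the cycles that were available
 * during the interval at freq_hz (e.g. the drm-curfreq or drm-maxfreq value).
 */
static double engine_cycle_percent(const struct engine_sample *prev,
				   const struct engine_sample *cur,
				   unsigned long freq_hz)
{
	uint64_t wall = cur->wall_ns - prev->wall_ns;
	double available = (double)freq_hz * (double)wall / 1e9;

	if (available <= 0.0)
		return 0.0;

	return 100.0 * (double)(cur->cycles - prev->cycles) / available;
}
```

Because the driver's figures are overestimates, either result can exceed
100%, so a tool would probably want to clamp it before display.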

Bear in mind that both the GPU cycle and kernel time numbers provided are
at best rough estimates, and that they always exceed the actual figures,
for two reasons:
 - The delay between the end of a job's processing, the subsequent job IRQ
   and the actual time of the sample.
 - Time spent in the engine queue waiting for the GPU to pick up the next
   job.

To avoid race conditions when enabling or disabling cycle counting, a
reference counting mechanism was introduced, together with a job flag that
records whether a given job increased the refcount. This is necessary
because user space can toggle cycle counting through a debugfs file, and a
given job might still be in flight when cycle counting is disabled.
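
The interplay between the refcount and the per-job flag can be modelled in
user space as below. This is a simplified sketch using C11 atomics, not the
driver code: the spinlock is replaced by an atomic_flag, the GPU start/stop
commands by a boolean, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct cycle_counter {
	atomic_int use_count;
	atomic_flag lock;   /* stand-in for the driver's spinlock */
	bool hw_running;    /* stand-in for the cycle count start/stop commands */
};

struct job {
	bool is_profiled;   /* set only if this job took a counter reference */
};

static void counter_init(struct cycle_counter *cc)
{
	atomic_init(&cc->use_count, 0);
	atomic_flag_clear(&cc->lock);
	cc->hw_running = false;
}

/* Add delta to *v unless it currently equals 'unless'; true if it was added. */
static bool add_unless(atomic_int *v, int delta, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		if (atomic_compare_exchange_weak(v, &old, old + delta))
			return true;
	}
	return false;
}

static void counter_get(struct cycle_counter *cc)
{
	if (add_unless(&cc->use_count, 1, 0))   /* fast path: already running */
		return;

	while (atomic_flag_test_and_set(&cc->lock))
		;
	if (atomic_fetch_add(&cc->use_count, 1) + 1 == 1)
		cc->hw_running = true;          /* first user starts the counter */
	atomic_flag_clear(&cc->lock);
}

static void counter_put(struct cycle_counter *cc)
{
	if (add_unless(&cc->use_count, -1, 1))  /* fast path: not the last user */
		return;

	while (atomic_flag_test_and_set(&cc->lock))
		;
	if (atomic_fetch_sub(&cc->use_count, 1) - 1 == 0)
		cc->hw_running = false;         /* last user stops the counter */
	atomic_flag_clear(&cc->lock);
}

/* The flag is latched at submission, so later knob changes cannot unbalance it. */
static void job_submit(struct cycle_counter *cc, struct job *job, bool profiling)
{
	job->is_profiled = false;
	if (profiling) {
		counter_get(cc);
		job->is_profiled = true;
	}
}

static void job_complete(struct cycle_counter *cc, struct job *job)
{
	if (job->is_profiled)
		counter_put(cc);
}
```

A job submitted while profiling was enabled drops its reference on completion
even if the knob has since been turned off, and a job submitted with profiling
disabled never touches the count.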

The main goal of the debugfs cycle counter knob is to let tools like nvtop
or IGT's gputop toggle it at any time, so that no power is wasted when
engine usage does not need to be measured.
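
A tool could flip the knob with a plain file write, as in the hedged sketch
below. The helper name is hypothetical, and the location of the file is an
assumption: the patch creates a file named "profile" under the DRM minor's
debugfs root, which would typically resolve to something like
/sys/kernel/debug/dri/<minor>/profile when debugfs is mounted in its usual
place.

```c
#include <stdio.h>

/*
 * Write 1 (enable) or 0 (disable) to the knob at 'path'.
 * Returns 0 on success and -1 if the file could not be opened or written.
 */
static int set_profile_knob(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", enable ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

The file is created with mode 0600, so the write would normally need root
privileges.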

Also add a documentation file explaining the possible values for fdinfo's
engine keystrings and Panfrost-specific drm-curfreq-<keystr> pairs.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230929181616.2769345-3-adrian.larumbe@collabora.com
15 files changed:
Documentation/gpu/drm-usage-stats.rst
Documentation/gpu/panfrost.rst [new file with mode: 0644]
MAINTAINERS
drivers/gpu/drm/panfrost/Makefile
drivers/gpu/drm/panfrost/panfrost_debugfs.c [new file with mode: 0644]
drivers/gpu/drm/panfrost/panfrost_debugfs.h [new file with mode: 0644]
drivers/gpu/drm/panfrost/panfrost_devfreq.c
drivers/gpu/drm/panfrost/panfrost_devfreq.h
drivers/gpu/drm/panfrost/panfrost_device.c
drivers/gpu/drm/panfrost/panfrost_device.h
drivers/gpu/drm/panfrost/panfrost_drv.c
drivers/gpu/drm/panfrost/panfrost_gpu.c
drivers/gpu/drm/panfrost/panfrost_gpu.h
drivers/gpu/drm/panfrost/panfrost_job.c
drivers/gpu/drm/panfrost/panfrost_job.h

index fe35a291ff3e0938a2ed809a77bdc405e29a052d..8d963cd7c1b7accba54071d05a71ee184ee0ad29 100644
@@ -169,3 +169,4 @@ Driver specific implementations
 -------------------------------
 
 :ref:`i915-usage-stats`
+:ref:`panfrost-usage-stats`
diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst
new file mode 100644
index 0000000..b80e41f
--- /dev/null
@@ -0,0 +1,40 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=========================
+ drm/Panfrost Mali Driver
+=========================
+
+.. _panfrost-usage-stats:
+
+Panfrost DRM client usage stats implementation
+==============================================
+
+The drm/Panfrost driver implements the DRM client usage stats specification as
+documented in :ref:`drm-client-usage-stats`.
+
+Example of the output showing the implemented key value pairs and entirety of
+the currently possible format options:
+
+::
+      pos:    0
+      flags:  02400002
+      mnt_id: 27
+      ino:    531
+      drm-driver:     panfrost
+      drm-client-id:  14
+      drm-engine-fragment:    1846584880 ns
+      drm-cycles-fragment:    1424359409
+      drm-maxfreq-fragment:   799999987 Hz
+      drm-curfreq-fragment:   799999987 Hz
+      drm-engine-vertex-tiler:        71932239 ns
+      drm-cycles-vertex-tiler:        52617357
+      drm-maxfreq-vertex-tiler:       799999987 Hz
+      drm-curfreq-vertex-tiler:       799999987 Hz
+      drm-total-memory:       290 MiB
+      drm-shared-memory:      0 MiB
+      drm-active-memory:      226 MiB
+      drm-resident-memory:    36496 KiB
+      drm-purgeable-memory:   128 KiB
+
+Possible `drm-engine-` key names are: `fragment`, and  `vertex-tiler`.
+`drm-curfreq-` values convey the current operating frequency for that engine.
index d787a18a76c46c71b27e859358619a410cc0b2e4..46ca5c4affdb3c6dece797fe87a10dbb69b34259 100644
@@ -1620,6 +1620,7 @@ R:        Steven Price <steven.price@arm.com>
 L:     dri-devel@lists.freedesktop.org
 S:     Supported
 T:     git git://anongit.freedesktop.org/drm/drm-misc
+F:     Documentation/gpu/panfrost.rst
 F:     drivers/gpu/drm/panfrost/
 F:     include/uapi/drm/panfrost_drm.h
 
index 7da2b3f02ed90c526cc2770210736d67bc0cf5fc..2c01c1e7523e9fd894e5947c761438cc7a46546c 100644
@@ -12,4 +12,6 @@ panfrost-y := \
        panfrost_perfcnt.o \
        panfrost_dump.o
 
+panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o
+
 obj-$(CONFIG_DRM_PANFROST) += panfrost.o
diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c
new file mode 100644
index 0000000..72d4286
--- /dev/null
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2023 Collabora ltd. */
+/* Copyright 2023 Amazon.com, Inc. or its affiliates. */
+
+#include <linux/debugfs.h>
+#include <linux/platform_device.h>
+#include <drm/drm_debugfs.h>
+#include <drm/drm_file.h>
+#include <drm/panfrost_drm.h>
+
+#include "panfrost_device.h"
+#include "panfrost_gpu.h"
+#include "panfrost_debugfs.h"
+
+void panfrost_debugfs_init(struct drm_minor *minor)
+{
+       struct drm_device *dev = minor->dev;
+       struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev));
+
+       debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode);
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h
new file mode 100644
index 0000000..c5af5f3
--- /dev/null
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2023 Collabora ltd.
+ * Copyright 2023 Amazon.com, Inc. or its affiliates.
+ */
+
+#ifndef PANFROST_DEBUGFS_H
+#define PANFROST_DEBUGFS_H
+
+#ifdef CONFIG_DEBUG_FS
+void panfrost_debugfs_init(struct drm_minor *minor);
+#endif
+
+#endif  /* PANFROST_DEBUGFS_H */
index 58dfb15a8757f11598819a003efef8e463c84e9f..28caffc689e216dd1d8b22b7711b2940a5b32cd5 100644
@@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
        spin_lock_irqsave(&pfdevfreq->lock, irqflags);
 
        panfrost_devfreq_update_utilization(pfdevfreq);
+       pfdevfreq->current_frequency = status->current_frequency;
 
        status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
                                                   pfdevfreq->idle_time));
@@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
        struct devfreq *devfreq;
        struct thermal_cooling_device *cooling;
        struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
+       unsigned long freq = ULONG_MAX;
 
        if (pfdev->comp->num_supplies > 1) {
                /*
@@ -172,6 +174,12 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
                return ret;
        }
 
+       /* Find the fastest defined rate  */
+       opp = dev_pm_opp_find_freq_floor(dev, &freq);
+       if (IS_ERR(opp))
+               return PTR_ERR(opp);
+       pfdevfreq->fast_rate = freq;
+
        dev_pm_opp_put(opp);
 
        /*
index 1514c1f9d91c64bbc58b580e3638ee94d51144fa..48dbe185f2061f4ac17d12968326c61a851b9217 100644
@@ -19,6 +19,9 @@ struct panfrost_devfreq {
        struct devfreq_simple_ondemand_data gov_data;
        bool opp_of_table_added;
 
+       unsigned long current_frequency;
+       unsigned long fast_rate;
+
        ktime_t busy_time;
        ktime_t idle_time;
        ktime_t time_last_update;
index fa1a086a862b6ef2b61e934d006237aa2996095c..28f7046e1b1a4904a90fea52ec35fb6e80ec3a3e 100644
@@ -207,6 +207,8 @@ int panfrost_device_init(struct panfrost_device *pfdev)
 
        spin_lock_init(&pfdev->as_lock);
 
+       spin_lock_init(&pfdev->cycle_counter.lock);
+
        err = panfrost_clk_init(pfdev);
        if (err) {
                dev_err(pfdev->dev, "clk init failed %d\n", err);
index b0126b9fbadcaf220ef2c368121b8b05da6ea06e..1e85656dc2f7fe71e57d35ab64b672aa3ebd6883 100644
@@ -107,6 +107,7 @@ struct panfrost_device {
        struct list_head scheduled_jobs;
 
        struct panfrost_perfcnt *perfcnt;
+       atomic_t profile_mode;
 
        struct mutex sched_lock;
 
@@ -121,6 +122,11 @@ struct panfrost_device {
        struct shrinker shrinker;
 
        struct panfrost_devfreq pfdevfreq;
+
+       struct {
+               atomic_t use_count;
+               spinlock_t lock;
+       } cycle_counter;
 };
 
 struct panfrost_mmu {
@@ -135,12 +141,19 @@ struct panfrost_mmu {
        struct list_head list;
 };
 
+struct panfrost_engine_usage {
+       unsigned long long elapsed_ns[NUM_JOB_SLOTS];
+       unsigned long long cycles[NUM_JOB_SLOTS];
+};
+
 struct panfrost_file_priv {
        struct panfrost_device *pfdev;
 
        struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
 
        struct panfrost_mmu *mmu;
+
+       struct panfrost_engine_usage engine_usage;
 };
 
 static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
index a2ab99698ca80a6bc59e34bc704c6df6e2fbc4e6..97e5bc4a82c82de396e8e388dacec18b543e44a6 100644
@@ -20,6 +20,7 @@
 #include "panfrost_job.h"
 #include "panfrost_gpu.h"
 #include "panfrost_perfcnt.h"
+#include "panfrost_debugfs.h"
 
 static bool unstable_ioctls;
 module_param_unsafe(unstable_ioctls, bool, 0600);
@@ -267,6 +268,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
        job->requirements = args->requirements;
        job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
        job->mmu = file_priv->mmu;
+       job->engine_usage = &file_priv->engine_usage;
 
        slot = panfrost_job_get_slot(job);
 
@@ -523,7 +525,56 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
        PANFROST_IOCTL(MADVISE,         madvise,        DRM_RENDER_ALLOW),
 };
 
-DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
+static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
+                                    struct panfrost_file_priv *panfrost_priv,
+                                    struct drm_printer *p)
+{
+       int i;
+
+       /*
+        * IMPORTANT NOTE: drm-cycles and drm-engine measurements are not
+        * accurate, as they only provide a rough estimation of the number of
+        * GPU cycles and CPU time spent in a given context. This is due to two
+        * different factors:
+        * - Firstly, we must consider the time the CPU and then the kernel
+        *   takes to process the GPU interrupt, which means additional time and
+        *   GPU cycles will be added in excess to the real figure.
+        * - Secondly, the pipelining done by the Job Manager (2 job slots per
+        *   engine) implies there is no way to know exactly how much time each
+        *   job spent on the GPU.
+        */
+
+       static const char * const engine_names[] = {
+               "fragment", "vertex-tiler", "compute-only"
+       };
+
+       BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS);
+
+       for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
+               drm_printf(p, "drm-engine-%s:\t%llu ns\n",
+                          engine_names[i], panfrost_priv->engine_usage.elapsed_ns[i]);
+               drm_printf(p, "drm-cycles-%s:\t%llu\n",
+                          engine_names[i], panfrost_priv->engine_usage.cycles[i]);
+               drm_printf(p, "drm-maxfreq-%s:\t%lu Hz\n",
+                          engine_names[i], pfdev->pfdevfreq.fast_rate);
+               drm_printf(p, "drm-curfreq-%s:\t%lu Hz\n",
+                          engine_names[i], pfdev->pfdevfreq.current_frequency);
+       }
+}
+
+static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
+{
+       struct drm_device *dev = file->minor->dev;
+       struct panfrost_device *pfdev = dev->dev_private;
+
+       panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
+}
+
+static const struct file_operations panfrost_drm_driver_fops = {
+       .owner = THIS_MODULE,
+       DRM_GEM_FOPS,
+       .show_fdinfo = drm_show_fdinfo,
+};
 
 /*
  * Panfrost driver version:
@@ -535,6 +586,7 @@ static const struct drm_driver panfrost_drm_driver = {
        .driver_features        = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
        .open                   = panfrost_open,
        .postclose              = panfrost_postclose,
+       .show_fdinfo            = panfrost_show_fdinfo,
        .ioctls                 = panfrost_drm_driver_ioctls,
        .num_ioctls             = ARRAY_SIZE(panfrost_drm_driver_ioctls),
        .fops                   = &panfrost_drm_driver_fops,
@@ -546,6 +598,10 @@ static const struct drm_driver panfrost_drm_driver = {
 
        .gem_create_object      = panfrost_gem_create_object,
        .gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table,
+
+#ifdef CONFIG_DEBUG_FS
+       .debugfs_init           = panfrost_debugfs_init,
+#endif
 };
 
 static int panfrost_probe(struct platform_device *pdev)
index 2faa344d89ee1479aefeccba57b84ae494703dfb..f0be7e19b13ed932b23051a045a025cba8ab0999 100644
@@ -73,6 +73,13 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
        gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
        gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL);
 
+       /*
+        * All in-flight jobs should have released their cycle
+        * counter references upon reset, but let us make sure
+        */
+       if (drm_WARN_ON(pfdev->ddev, atomic_read(&pfdev->cycle_counter.use_count) != 0))
+               atomic_set(&pfdev->cycle_counter.use_count, 0);
+
        return 0;
 }
 
@@ -321,6 +328,40 @@ static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
                 pfdev->features.shader_present, pfdev->features.l2_present);
 }
 
+void panfrost_cycle_counter_get(struct panfrost_device *pfdev)
+{
+       if (atomic_inc_not_zero(&pfdev->cycle_counter.use_count))
+               return;
+
+       spin_lock(&pfdev->cycle_counter.lock);
+       if (atomic_inc_return(&pfdev->cycle_counter.use_count) == 1)
+               gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
+       spin_unlock(&pfdev->cycle_counter.lock);
+}
+
+void panfrost_cycle_counter_put(struct panfrost_device *pfdev)
+{
+       if (atomic_add_unless(&pfdev->cycle_counter.use_count, -1, 1))
+               return;
+
+       spin_lock(&pfdev->cycle_counter.lock);
+       if (atomic_dec_return(&pfdev->cycle_counter.use_count) == 0)
+               gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
+       spin_unlock(&pfdev->cycle_counter.lock);
+}
+
+unsigned long long panfrost_cycle_counter_read(struct panfrost_device *pfdev)
+{
+       u32 hi, lo;
+
+       do {
+               hi = gpu_read(pfdev, GPU_CYCLE_COUNT_HI);
+               lo = gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
+       } while (hi != gpu_read(pfdev, GPU_CYCLE_COUNT_HI));
+
+       return ((u64)hi << 32) | lo;
+}
+
 void panfrost_gpu_power_on(struct panfrost_device *pfdev)
 {
        int ret;
index 468c51e7e46db53618e2ca8da7fc2f8ff0a680fd..876fdad9f7214bbc72c7971c22c4699d7e1f0e1e 100644
@@ -16,6 +16,10 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev);
 void panfrost_gpu_power_on(struct panfrost_device *pfdev);
 void panfrost_gpu_power_off(struct panfrost_device *pfdev);
 
+void panfrost_cycle_counter_get(struct panfrost_device *pfdev);
+void panfrost_cycle_counter_put(struct panfrost_device *pfdev);
+unsigned long long panfrost_cycle_counter_read(struct panfrost_device *pfdev);
+
 void panfrost_gpu_amlogic_quirk(struct panfrost_device *pfdev);
 
 #endif
index 033f5e684707d869ff0e11d0f92b4ade8be1688d..fb16de2d0420a506e990d75c5f4cc405b42b5fc3 100644
@@ -159,6 +159,16 @@ panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
        struct panfrost_job *job = pfdev->jobs[slot][0];
 
        WARN_ON(!job);
+       if (job->is_profiled) {
+               if (job->engine_usage) {
+                       job->engine_usage->elapsed_ns[slot] +=
+                               ktime_to_ns(ktime_sub(ktime_get(), job->start_time));
+                       job->engine_usage->cycles[slot] +=
+                               panfrost_cycle_counter_read(pfdev) - job->start_cycles;
+               }
+               panfrost_cycle_counter_put(job->pfdev);
+       }
+
        pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
        pfdev->jobs[slot][1] = NULL;
 
@@ -233,6 +243,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
        subslot = panfrost_enqueue_job(pfdev, js, job);
        /* Don't queue the job if a reset is in progress */
        if (!atomic_read(&pfdev->reset.pending)) {
+               if (atomic_read(&pfdev->profile_mode)) {
+                       panfrost_cycle_counter_get(pfdev);
+                       job->is_profiled = true;
+                       job->start_time = ktime_get();
+                       job->start_cycles = panfrost_cycle_counter_read(pfdev);
+               }
+
                job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
                dev_dbg(pfdev->dev,
                        "JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d",
@@ -660,10 +677,14 @@ panfrost_reset(struct panfrost_device *pfdev,
         * stuck jobs. Let's make sure the PM counters stay balanced by
         * manually calling pm_runtime_put_noidle() and
         * panfrost_devfreq_record_idle() for each stuck job.
+        * Let's also make sure the cycle counting register's refcnt is
+        * kept balanced to prevent it from running forever
         */
        spin_lock(&pfdev->js->job_lock);
        for (i = 0; i < NUM_JOB_SLOTS; i++) {
                for (j = 0; j < ARRAY_SIZE(pfdev->jobs[0]) && pfdev->jobs[i][j]; j++) {
+                       if (pfdev->jobs[i][j]->is_profiled)
+                               panfrost_cycle_counter_put(pfdev->jobs[i][j]->pfdev);
                        pm_runtime_put_noidle(pfdev->dev);
                        panfrost_devfreq_record_idle(&pfdev->pfdevfreq);
                }
@@ -926,6 +947,9 @@ void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
                        }
 
                        job_write(pfdev, JS_COMMAND(i), cmd);
+
+                       /* Jobs can outlive their file context */
+                       job->engine_usage = NULL;
                }
        }
        spin_unlock(&pfdev->js->job_lock);
index 8becc1ba0eb95210850fc9b1b49f20d39bcc8ba1..17ff808dba078f4522c7dd03081c1bed8800aa28 100644
@@ -32,6 +32,11 @@ struct panfrost_job {
 
        /* Fence to be signaled by drm-sched once its done with the job */
        struct dma_fence *render_done_fence;
+
+       struct panfrost_engine_usage *engine_usage;
+       bool is_profiled;
+       ktime_t start_time;
+       u64 start_cycles;
 };
 
 int panfrost_job_init(struct panfrost_device *pfdev);