bpf: Improve program stats run-time calculation
authorJose Fernandez <josef@netflix.com>
Tue, 2 Apr 2024 03:40:10 +0000 (21:40 -0600)
committerDaniel Borkmann <daniel@iogearbox.net>
Tue, 2 Apr 2024 14:51:15 +0000 (16:51 +0200)
This patch improves the run-time calculation for program stats by
capturing the duration as soon as possible after the program returns.

Previously, the duration included u64_stats_t operations. While the
instrumentation overhead is part of the total time spent when stats are
enabled, distinguishing between the program's native execution time and
the time spent due to instrumentation is crucial for accurate
performance analysis.

By making this change, the patch facilitates more precise optimization
of BPF programs, enabling users to understand their performance in
environments without stats enabled.

I used a virtualized environment to measure the run-time over one minute
for a basic raw_tracepoint/sys_enter program, which just increments a
local counter. Although the virtualization introduced some performance
degradation that could affect the results, I observed approximately a
16% decrease in average run-time reported by stats with this change
(310 -> 260 nsec).

Signed-off-by: Jose Fernandez <josef@netflix.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240402034010.25060-1-josef@netflix.com
include/linux/filter.h
kernel/bpf/trampoline.c

index 44934b968b57d9793ed321eb66bc98a91f9ce19f..531b360901223e35b6f20a150d837e6a8580e4d9 100644 (file)
@@ -654,14 +654,16 @@ static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
        cant_migrate();
        if (static_branch_unlikely(&bpf_stats_enabled_key)) {
                struct bpf_prog_stats *stats;
-               u64 start = sched_clock();
+               u64 duration, start = sched_clock();
                unsigned long flags;
 
                ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
+
+               duration = sched_clock() - start;
                stats = this_cpu_ptr(prog->stats);
                flags = u64_stats_update_begin_irqsave(&stats->syncp);
                u64_stats_inc(&stats->cnt);
-               u64_stats_add(&stats->nsecs, sched_clock() - start);
+               u64_stats_add(&stats->nsecs, duration);
                u64_stats_update_end_irqrestore(&stats->syncp, flags);
        } else {
                ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
index cc50607f8d8cadc76ee41ce88e4575400d89f455..26ae703d3c3bbc6f6d398dcf0a46a5e8c2cd4eea 100644 (file)
@@ -885,12 +885,13 @@ static void notrace update_prog_stats(struct bpf_prog *prog,
             * Hence check that 'start' is valid.
             */
            start > NO_START_TIME) {
+               u64 duration = sched_clock() - start;
                unsigned long flags;
 
                stats = this_cpu_ptr(prog->stats);
                flags = u64_stats_update_begin_irqsave(&stats->syncp);
                u64_stats_inc(&stats->cnt);
-               u64_stats_add(&stats->nsecs, sched_clock() - start);
+               u64_stats_add(&stats->nsecs, duration);
                u64_stats_update_end_irqrestore(&stats->syncp, flags);
        }
 }