llvm patch [1] enabled cross-function optimization for func arguments
(ArgumentPromotion) at -O2 level. And this caused s390 sock_fields
test failure ([2]). The failure is gone right now as patch [1] was
reverted in [3]. But it is possible that patch [3] will be reverted
again and then the test failure in [2] will show up again. So it is
desirable to fix the failure regardless.
The following is an analysis why sock_field test fails with
llvm patch [1].
The main problem is in
static __noinline bool sk_dst_port__load_word(struct bpf_sock *sk)
{
__u32 *word = (__u32 *)&sk->dst_port;
return word[0] == bpf_htons(0xcafe);
}
static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
{
__u16 *half = (__u16 *)&sk->dst_port;
return half[0] == bpf_htons(0xcafe);
}
...
int read_sk_dst_port(struct __sk_buff *skb)
{
...
sk = skb->sk;
...
if (!sk_dst_port__load_word(sk))
RET_LOG();
if (!sk_dst_port__load_half(sk))
RET_LOG();
...
}
Through some cross-function optimization by ArgumentPromotion
optimization, the compiler does:
static __noinline bool sk_dst_port__load_word(__u32 word_val)
{
return word_val == bpf_htons(0xcafe);
}
static __noinline bool sk_dst_port__load_half(__u16 half_val)
{
return half_val == bpf_htons(0xcafe);
}
...
int read_sk_dst_port(struct __sk_buff *skb)
{
...
sk = skb->sk;
...
__u32 *word = (__u32 *)&sk->dst_port;
__u32 word_val = word[0];
...
if (!sk_dst_port__load_word(word_val))
RET_LOG();
__u16 half_val = word_val >> 16;
if (!sk_dst_port__load_half(half_val))
RET_LOG();
...
}
In current uapi bpf.h, we have
struct bpf_sock {
...
__be16 dst_port; /* network byte order */
__u16 :16; /* zero padding */
...
};
But the old kernel (e.g., 5.6) we have
struct bpf_sock {
...
__u32 dst_port; /* network byte order */
...
};
So for backward compatability reason, 4-byte load of
dst_port is converted to 2-byte load internally.
Specifically, 'word_val = word[0]' is replaced by 2-byte load
by the verifier and this caused the trouble for later
sk_dst_port__load_half() where half_val becomes 0.
Typical usr program won't have such a code pattern tiggering
the above bug, so let us fix the test failure with source
code change. Adding an empty asm volatile statement seems
enough to prevent undesired transformation.
[1] https://reviews.llvm.org/
D148269
[2] https://lore.kernel.org/bpf/
e7f2c5e8-a50c-198d-8f95-
388165f1e4fd@meta.com/
[3] https://reviews.llvm.org/rG141be5c062ecf22bd287afffd310e8ac4711444a
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230516214945.1013578-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>