BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH"
should be the same as atomic_fetch_add(), which is currently not the
case on s390x: the serialization instruction "bcr 14,0" is missing.
This applies to "and", "or" and "xor" variants too.

s390x is allowed to reorder stores with subsequent fetches from
different addresses, so code relying on BPF_FETCH acting as a barrier,
for example:

  stw [%r0], 1
  afadd [%r1], %r2
  ldxw %r3, [%r4]

may be broken. Fix it by emitting "bcr 14,0".

Note that a separate serialization instruction is not needed for
BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs
serialization itself.

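
For reference, a minimal sketch (not part of this patch) of how the
halfword 0x07e0 corresponds to "bcr 14,0". It assumes the RR instruction
format: an 8-bit opcode (0x07 for BRANCH ON CONDITION), then a 4-bit
mask, then a 4-bit register number. encode_bcr() is a hypothetical
helper name used only for illustration; the JIT emits the constant
directly.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not a kernel function).
 * Builds the 2-byte RR-format encoding of "bcr mask,reg":
 *   bits 15..8: opcode 0x07 (BRANCH ON CONDITION)
 *   bits  7..4: condition mask
 *   bits  3..0: register number (0 means no branch is taken)
 */
static uint16_t encode_bcr(unsigned int mask, unsigned int reg)
{
	return (uint16_t)((0x07u << 8) | ((mask & 0xfu) << 4) | (reg & 0xfu));
}
```

With these assumptions, encode_bcr(14, 0) evaluates to 0x07e0, the value
passed to _EMIT2() for BPF_FETCH variants.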
Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops")
Reported-by: Puranjay Mohan <puranjay12@gmail.com>
Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
        EMIT6_DISP_LH(0xeb000000, is32 ? (op32) : (op64),               \
                      (insn->imm & BPF_FETCH) ? src_reg : REG_W0,       \
                      src_reg, dst_reg, off);                           \
-       if (is32 && (insn->imm & BPF_FETCH))                            \
-               EMIT_ZERO(src_reg);                                     \
+       if (insn->imm & BPF_FETCH) {                                    \
+               /* bcr 14,0 - see atomic_fetch_{add,and,or,xor}() */    \
+               _EMIT2(0x07e0);                                         \
+               if (is32)                                               \
+                       EMIT_ZERO(src_reg);                             \
+       }                                                               \
 } while (0)
                case BPF_ADD:
                case BPF_ADD | BPF_FETCH: