x86/asm: Optimize memcpy_flushcache()
author Mikulas Patocka <mpatocka@redhat.com>
Wed, 8 Aug 2018 21:22:16 +0000 (17:22 -0400)
committer Ingo Molnar <mingo@kernel.org>
Mon, 10 Sep 2018 13:17:12 +0000 (15:17 +0200)
commit 02101c45ec5b19d607af7372680f5259050b4e9c
tree d2b817c7d7490aa655d9a3c4a8193c5c47fd60ad
parent 11da3a7f84f19c26da6f86af878298694ede0804

I use memcpy_flushcache() in my persistent memory driver for metadata
updates. There are many 8-byte and 16-byte updates, and it turns out that
the overhead of memcpy_flushcache() causes a 2% performance degradation
compared to the "movnti" instruction explicitly coded using inline assembler.

The tests were done on a Skylake processor with persistent memory emulated
using the "memmap" kernel parameter. dd was used to copy data to the
dm-writecache target.

This patch recognizes memcpy_flushcache() calls with a short constant
length and turns them into inline assembler, so that I don't have to use
inline assembler in the driver.
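
The dispatch described above can be illustrated with a minimal user-space
sketch. It is not the kernel code from this patch: every sketch_* name is
hypothetical, the set of special-cased sizes (4, 8 and 16 bytes) is an
assumption made for illustration, plain memcpy() stands in for the
out-of-line copy routine, and the non-x86-64 fallback is an ordinary
store. It assumes GCC or Clang for __builtin_constant_p() and extended
inline asm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the out-of-line, general-purpose copy. */
static void sketch_copy_slow(void *dst, const void *src, size_t cnt)
{
	memcpy(dst, src, cnt);
}

/* One 32-bit non-temporal store; plain store where movnti is unavailable. */
static inline void sketch_movnti32(uint32_t *dst, uint32_t val)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ ("movnti %1, %0" : "=m"(*dst) : "r"(val));
#else
	*dst = val;
#endif
}

/* One 64-bit non-temporal store; plain store where movnti is unavailable. */
static inline void sketch_movnti64(uint64_t *dst, uint64_t val)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ ("movnti %1, %0" : "=m"(*dst) : "r"(val));
#else
	*dst = val;
#endif
}

/*
 * If the length is a compile-time constant and one of the short sizes,
 * emit the stores inline; otherwise fall back to the out-of-line copy.
 * Without optimization __builtin_constant_p() may evaluate to 0, in which
 * case every call simply takes the slow path (still correct).
 */
static inline __attribute__((always_inline))
void sketch_copy_flushcache(void *dst, const void *src, size_t cnt)
{
	if (__builtin_constant_p(cnt)) {
		uint32_t w;
		uint64_t q[2];

		switch (cnt) {
		case 4:
			memcpy(&w, src, sizeof(w));
			sketch_movnti32((uint32_t *)dst, w);
			return;
		case 8:
			memcpy(&q[0], src, sizeof(q[0]));
			sketch_movnti64((uint64_t *)dst, q[0]);
			return;
		case 16:
			memcpy(q, src, sizeof(q));
			sketch_movnti64((uint64_t *)dst, q[0]);
			sketch_movnti64((uint64_t *)dst + 1, q[1]);
			return;
		}
	}
	sketch_copy_slow(dst, src, cnt);
}

/* Usage: a 16-byte metadata-style update with a constant length. */
static void sketch_demo(void)
{
	uint64_t src[2] = { 0x1122334455667788ULL, 0x99aabbccddeeff00ULL };
	uint64_t dst[2] = { 0, 0 };

	sketch_copy_flushcache(dst, src, 16);
	assert(memcmp(dst, src, sizeof(src)) == 0);
}
```

Because the size check happens at compile time, a caller passing a
constant 8 or 16 pays for one or two store instructions rather than a
function call, while callers with variable lengths are unaffected. Note
that non-temporal stores are weakly ordered, so a real caller would still
need a store fence before relying on the data being durable.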

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: device-mapper development <dm-devel@redhat.com>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1808081720460.24747@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch/x86/include/asm/string_64.h
arch/x86/lib/usercopy_64.c