From 36148c2bbfbe50c50206b6f61d072203c80161e0 Mon Sep 17 00:00:00 2001
From: =?utf8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke@toke.dk>
Date: Fri, 2 Feb 2018 16:11:05 +0100
Subject: [PATCH] mac80211: Adjust TSQ pacing shift
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

Since we now have the convenient helper to do so, actually adjust the
TSQ pacing shift for packets going out over a WiFi interface. This
significantly improves throughput for locally-originated TCP
connections. The default pacing shift of 10 corresponds to ~1ms of
queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
1-hop throughput for ath9k by a factor of 3, whereas increasing it more
has diminishing returns.

Achieved throughput for different values of sk_pacing_shift (average of
5 iterations of 10-sec netperf runs to a host on the other side of the
WiFi hop):

sk_pacing_shift 10:  43.21 Mbps (pre-patch)
sk_pacing_shift  9:  78.17 Mbps
sk_pacing_shift  8: 123.94 Mbps
sk_pacing_shift  7: 128.31 Mbps

Latency for competing flows increases from ~3 ms to ~10 ms with this
change. This is about the same magnitude of queueing latency induced by
flows that are not originated on the WiFi device itself (and so are not
limited by TSQ).

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/tx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 25904af38839a..69722504e3e14 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 	if (!IS_ERR_OR_NULL(sta)) {
 		struct ieee80211_fast_tx *fast_tx;
 
+		/* We need a bit of data queued to build aggregates properly, so
+		 * instruct the TCP stack to allow more than a single ms of data
+		 * to be queued in the stack. The value is a bit-shift of 1
+		 * second, so 8 is ~4ms of queued data. Only affects local TCP
+		 * sockets.
+		 */
+		sk_pacing_shift_update(skb->sk, 8);
+
 		fast_tx = rcu_dereference(sta->fast_tx);
 
 		if (fast_tx &&
-- 
2.30.2