Optimize replaceShortcodeTokens
There is no need to read the entire byte slice again and again.
Doing so was a slip in the original implementation. The new version is
functionally the same, but slightly faster, especially for larger data
sets with many shortcodes:
```
benchmark                             old ns/op     new ns/op     delta
BenchmarkReplaceShortcodeTokens-4     15505         14753         -4.85%

benchmark                             old allocs     new allocs     delta
BenchmarkReplaceShortcodeTokens-4     1              1              +0.00%

benchmark                             old bytes     new bytes     delta
BenchmarkReplaceShortcodeTokens-4     3072          3072          +0.00%
```
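
For illustration only, here is a minimal sketch of the general idea:
keep an offset into the source and search only the unread tail, rather
than rescanning from the start for each token. This is not the actual
replaceShortcodeTokens code; the function name `replaceTokens`, its
parameters, and the `{{`/`}}` delimiters used below are hypothetical.

```go
package main

import "bytes"

// replaceTokens is a hypothetical stand-in, NOT Hugo's replaceShortcodeTokens.
// It replaces every prefix...suffix token found in source with its entry in
// replacements, walking the slice exactly once.
func replaceTokens(source, prefix, suffix []byte, replacements map[string][]byte) []byte {
	// Empty delimiters would match everywhere; return a plain copy instead.
	if len(prefix) == 0 || len(suffix) == 0 {
		return append([]byte(nil), source...)
	}

	out := make([]byte, 0, len(source))
	pos := 0 // everything before pos has already been copied to out

	for pos < len(source) {
		// Search only the unread tail, never the part already handled.
		start := bytes.Index(source[pos:], prefix)
		if start < 0 {
			break
		}
		start += pos

		end := bytes.Index(source[start+len(prefix):], suffix)
		if end < 0 {
			break // unterminated token: leave the rest untouched
		}
		end += start + len(prefix) + len(suffix) // one past the token's end

		token := source[start:end]
		out = append(out, source[pos:start]...)
		if repl, ok := replacements[string(token)]; ok {
			out = append(out, repl...)
		} else {
			out = append(out, token...) // unknown token: keep as-is
		}
		pos = end
	}

	// Copy whatever follows the last token.
	return append(out, source[pos:]...)
}
```

Called with the source `a {{x}} b {{y}} c {{z}}` and replacements for
`{{x}}` and `{{y}}`, this sketch rewrites the first two tokens and leaves
the unknown `{{z}}` in place, touching each byte of the input only once.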