btrfs: do not use replace target device as an extra mirror
[BUG]
Currently btrfs can use dev-replace device as an extra mirror for
read-repair. But it can lead to NODATASUM corruption in the following
case:
There is a RAID1 data chunk, and dev-replace is running from
dev2 to dev0.
|//| = Replaced data
X X+1MB X+2MB
Dev 2: | | | <- Source dev
Dev 0: |///////| | <- Target dev
Then a read on dev 2 X+2MB happens.
And something wrong happened inside devid 2, causing an -EIO.
In that case, read-repair would try the next mirror, and since we can
use target device as an extra mirror, we will use that mirror instead.
But unfortunately since the read is beyond the current replace cursor,
we should not trust it at all, what we get would be just uninitialized
garbage.
But if this read is for NODATASUM range, then we just trust them and
cause data corruption.
[CAUSE]
We used to have some checks to make sure we only return such extra
mirror when the range is before our left cursor.
The first commit introducing this behavior is
ad6d620e2a57 ("Btrfs:
allow repair code to include target disk when searching mirrors").
But later a fix,
22ab04e814f4 ("Btrfs: fix race between device replace
and chunk allocation") changed the behavior, to always let
btrfs_map_block() include the extra mirror to address a race in
dev-replace which can cause missing writes to target device.
This means, we lose the tracking of cursor for the extra mirror, thus
can lead to above corruption.
[FIX]
The extra mirror is never a reliable one, at the beginning of
dev-replace, the reliability is zero, while only at the end of the
replace it's a fully reliable mirror.
We either do the complex tracking, or never trust it.
IMHO it's much easier to maintain if we don't trust it at all, and the
extra mirror can only benefit for a limited period of time (during
replace).
Thus this patch would completely remove the ability to use target device
as an extra mirror.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>