Chao Yu [Thu, 4 Oct 2018 03:15:18 +0000 (11:15 +0800)]
 
f2fs: shrink sbi->sb_lock coverage in set_file_temperature()
file_set_{cold,hot} doesn't need holding sbi->sb_lock, so moving them
out of the lock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 4 Oct 2018 03:18:30 +0000 (11:18 +0800)]
 
f2fs: use rb_*_cached friends
As rbtree supports caching leftmost node natively, update f2fs codes
to use rb_*_cached helpers to speed up leftmost node visiting.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 3 Oct 2018 14:32:44 +0000 (22:32 +0800)]
 
f2fs: fix to recover cold bit of inode block during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. chattr -A /mnt/f2fs/file
11. xfs_io -f /mnt/f2fs/file -c "fsync"
12. umount /mnt/f2fs
13. mount -t f2fs /dev/sdd /mnt/f2fs
14. lsattr /mnt/f2fs/file
-----------------N- /mnt/f2fs/file
But actually, we expect the corrct result is:
-------A---------N- /mnt/f2fs/file
The reason is in step 9) we missed to recover cold bit flag in inode
block, so later, in fsync, we will skip write inode block due to below
condition check, result in lossing data in another SPOR.
f2fs_fsync_node_pages()
	if (!IS_DNODE(page) || !is_cold_node(page))
		continue;
Note that, I guess that some non-dir inode has already lost cold bit
during POR, so in order to reenable recovery for those inode, let's
try to recover cold bit in f2fs_iget() to save more fsynced data.
Fixes: c56675750d7c ("f2fs: remove unneeded set_cold_node()")
Cc: <stable@vger.kernel.org> 4.17+
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 12 Sep 2018 23:40:53 +0000 (07:40 +0800)]
 
f2fs: submit cached bio to avoid endless PageWriteback
When migrating encrypted block from background GC thread, we only add
them into f2fs inner bio cache, but forget to submit the cached bio, it
may cause potential deadlock when we are waiting page writebacked, fix
it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daniel Rosenberg [Tue, 21 Aug 2018 02:21:43 +0000 (19:21 -0700)]
 
f2fs: checkpoint disabling
Note that, it requires "f2fs: return correct errno in f2fs_gc".
This adds a lightweight non-persistent snapshotting scheme to f2fs.
To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 25 Sep 2018 20:54:33 +0000 (13:54 -0700)]
 
f2fs: clear PageError on the read path
When running fault injection test, I hit somewhat wrong behavior in f2fs_gc ->
gc_data_segment():
0. fault injection generated some PageError'ed pages
1. gc_data_segment
 -> f2fs_get_read_data_page(REQ_RAHEAD)
2. move_data_page
 -> f2fs_get_lock_data_page()
  -> f2f_get_read_data_page()
   -> f2fs_submit_page_read()
    -> submit_bio(READ)
  -> return EIO due to PageError
  -> fail to move data
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 27 Sep 2018 10:34:52 +0000 (18:34 +0800)]
 
f2fs: allow out-place-update for direct IO in LFS mode
Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't
allow triggering any in-place-update writes, so we fallback direct
write to buffered write, result in bad performance of large size
write.
This patch adds to support triggering out-place-update for direct IO
to enhance its performance.
Note that it needs to exclude direct read IO during direct write,
since new data writing to new block address will no be valid until
write finished.
storage: zram
time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 
1073741824" -c "fsync"
Before:
real	0m13.061s
user	0m0.327s
sys	0m12.486s
After:
real	0m6.448s
user	0m0.228s
sys	0m6.212s
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 27 Sep 2018 10:33:18 +0000 (18:33 +0800)]
 
f2fs: refactor ->page_mkwrite() flow
Thread A				Thread B
- f2fs_vm_page_mkwrite
					- f2fs_setattr
					 - down_write(i_mmap_sem)
					 - truncate_setsize
					 - f2fs_truncate
					 - up_write(i_mmap_sem)
 - f2fs_reserve_block
 reserve NEW_ADDR
 - skip dirty page due to truncation
1. we don't need to rserve new block address for a truncated page.
2. dn.data_blkaddr is used out of node page lock coverage.
Refactor ->page_mkwrite() flow to fix above issues:
- use __do_map_lock() to avoid racing checkpoint()
- lock data page in prior to dnode page
- cover f2fs_reserve_block with i_mmap_sem lock
- wait page writeback before zeroing page
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 27 Sep 2018 15:41:16 +0000 (23:41 +0800)]
 
Revert: "f2fs: check last page index in cached bio to decide submission"
There is one case that we can leave bio in f2fs, result in hanging
page writeback waiter.
Thread A				Thread B
- f2fs_write_cache_pages
 - f2fs_submit_page_write
 page #0 cached in bio #0 of cold log
 - f2fs_submit_page_write
 page #1 cached in bio #1 of warm log
					- f2fs_write_cache_pages
					 - f2fs_submit_page_write
					 bio is full, submit bio #1 contain page #1
 - f2fs_submit_merged_write_cond(, page #1)
 fail to submit bio #0 due to page #1 is not in any cached bios.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Junling Zheng [Fri, 28 Sep 2018 12:25:56 +0000 (20:25 +0800)]
 
f2fs: support superblock checksum
Now we support crc32 checksum for superblock.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 29 Sep 2018 10:31:28 +0000 (18:31 +0800)]
 
f2fs: add to account skip count of background GC
This patch adds to account skip count of background GC, and show stat
info via 'status' debugfs entry.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 29 Sep 2018 10:31:27 +0000 (18:31 +0800)]
 
f2fs: add to account meta IO
This patch supports to account meta IO, it enables to show write IO
from f2fs more comprehensively via 'status' debugfs entry.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 28 Sep 2018 07:24:39 +0000 (00:24 -0700)]
 
f2fs: keep lazytime on remount
This patch fixes losing lazytime when remounting f2fs.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 28 Sep 2018 05:15:31 +0000 (22:15 -0700)]
 
f2fs: fix missing up_read
This patch fixes missing up_read call.
Fixes: c9b60788fc76 ("f2fs: fix to do sanity check with block address in main area")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 25 Sep 2018 22:25:21 +0000 (15:25 -0700)]
 
f2fs: return correct errno in f2fs_gc
This fixes overriding error number in f2fs_gc.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Sep 2018 00:36:06 +0000 (17:36 -0700)]
 
f2fs: avoid f2fs_bug_on if f2fs_get_meta_page_nofail got EIO
This patch avoids BUG_ON when f2fs_get_meta_page_nofail got EIO during
xfstests/generic/475.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Sep 2018 07:36:03 +0000 (15:36 +0800)]
 
f2fs: mark inode dirty explicitly in recover_inode()
Mark inode dirty explicitly in the end of recover_inode() to make sure
that all recoverable fields can be persisted later.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Sep 2018 07:36:01 +0000 (15:36 +0800)]
 
f2fs: fix to recover inode's crtime during POR
Testcase to reproduce this bug:
1. mkfs.f2fs -O extra_attr -O inode_crtime /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. xfs_io -f /mnt/f2fs/file -c "fsync"
5. godown /mnt/f2fs
6. umount /mnt/f2fs
7. mount -t f2fs /dev/sdd /mnt/f2fs
8. xfs_io -f /mnt/f2fs/file -c "statx -r"
stat.btime.tv_sec = 0
stat.btime.tv_nsec = 0
This patch fixes to recover inode creation time fields during
mount.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Sep 2018 07:36:00 +0000 (15:36 +0800)]
 
f2fs: fix to recover inode's i_gc_failures during POR
inode.i_gc_failures is used to indicate that skip count of migrating
on blocks of inode, we should guarantee it can be recovered in sudden
power-off case.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Sep 2018 07:35:59 +0000 (15:35 +0800)]
 
f2fs: fix to recover inode's i_flags during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. lsattr /mnt/f2fs/file
-----------------N- /mnt/f2fs/file
But actually, we expect the corrct result is:
-------A---------N- /mnt/f2fs/file
The reason is we didn't recover inode.i_flags field during mount,
fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Sep 2018 07:35:58 +0000 (15:35 +0800)]
 
f2fs: fix to recover inode's project id during POR
Testcase to reproduce this bug:
1. mkfs.f2fs -O extra_attr -O project_quota /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr -p 1 /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. lsattr -p /mnt/f2fs/file
    0 -----------------N- /mnt/f2fs/file
But actually, we expect the correct result is:
    1 -----------------N- /mnt/f2fs/file
The reason is we didn't recover inode.i_projid field during mount,
fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 19 Sep 2018 22:28:40 +0000 (15:28 -0700)]
 
f2fs: update i_size after DIO completion
This is related to
ee70daaba82d ("xfs: update i_size after unwritten conversion in dio completion")
If we update i_size during dio_write, dio_read can read out stale data, which
breaks xfstests/465.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 17 Sep 2018 20:25:04 +0000 (13:25 -0700)]
 
f2fs: report ENOENT correctly in f2fs_rename
This fixes wrong error report in f2fs_rename.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chengguang Xu [Sat, 22 Sep 2018 14:43:09 +0000 (22:43 +0800)]
 
f2fs: fix remount problem of option io_bits
Currently we show mount option "io_bits=%u" as "io_size=%uKB",
it will cause option parsing problem(unrecognized mount option)
in remount.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 20 Sep 2018 09:41:30 +0000 (17:41 +0800)]
 
f2fs: fix to recover inode's uid/gid during POR
Step to reproduce this bug:
1. logon as root
2. mount -t f2fs /dev/sdd /mnt;
3. touch /mnt/file;
4. chown system /mnt/file; chgrp system /mnt/file;
5. xfs_io -f /mnt/file -c "fsync";
6. godown /mnt;
7. umount /mnt;
8. mount -t f2fs /dev/sdd /mnt;
After step 8) we will expect file's uid/gid are all system, but during
recovery, these two fields were not been recovered, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 19 Sep 2018 22:45:19 +0000 (15:45 -0700)]
 
f2fs: avoid infinite loop in f2fs_alloc_nid
If we have an error in f2fs_build_free_nids, we're able to fall into a loop
to find free nids.
Suggested-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sahitya Tummala [Wed, 19 Sep 2018 08:48:47 +0000 (14:18 +0530)]
 
f2fs: add new idle interval timing for discard and gc paths
This helps to control the frequency of submission of discard and
GC requests independently, based on the need.
Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 12 Sep 2018 01:22:29 +0000 (09:22 +0800)]
 
f2fs: split IO error injection according to RW
This patch adds to support injecting error for write IO, this can simulate
IO error like fail_make_request or dm_flakey does.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 12 Sep 2018 01:16:07 +0000 (09:16 +0800)]
 
f2fs: add SPDX license identifiers
Remove the verbose license text from f2fs files and replace them with
SPDX tags.  This does not change the license of any of the code.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chengguang Xu [Wed, 12 Sep 2018 05:32:52 +0000 (13:32 +0800)]
 
f2fs: surround fault_injection related option parsing using CONFIG_F2FS_FAULT_INJECTION
It's a little bit strange when fault_injection related
options fail with -EINVAL which were already disabled
from config, so surround all fault_injection related option
parsing code using CONFIG_F2FS_FAULT_INJECTION. Meanwhile,
slightly change warning message to keep consistency with
option POSIX_ACL and FS_XATTR.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wang Shilong [Mon, 10 Sep 2018 23:54:21 +0000 (08:54 +0900)]
 
f2fs: fix setattr project check upon fssetxattr ioctl
Currently, project quota could be changed by fssetxattr
ioctl, and existed permission check inode_owner_or_capable()
is obviously not enough, just think that common users could
change project id of file, that could make users to
break project quota easily.
This patch try to follow same regular of xfs project
quota:
"Project Quota ID state is only allowed to change from
within the init namespace. Enforce that restriction only
if we are trying to change the quota ID state.
Everything else is allowed in user namespaces."
Besides that, check and set project id'state should
be an atomic operation, protect whole operation with
inode lock.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Zhikang Zhang [Mon, 10 Sep 2018 08:18:25 +0000 (16:18 +0800)]
 
f2fs: avoid sleeping under spin_lock
In the call trace below, we might sleep in function dput().
So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
from __try_update_largest_extent && __drop_largest_extent.
BUG: sleeping function called from invalid context at fs/dcache.c:796
Call trace:
	dump_backtrace+0x0/0x3f4
	show_stack+0x24/0x30
	dump_stack+0xe0/0x138
	___might_sleep+0x2a8/0x2c8
	__might_sleep+0x78/0x10c
	dput+0x7c/0x750
	block_dump___mark_inode_dirty+0x120/0x17c
	__mark_inode_dirty+0x344/0x11f0
	f2fs_mark_inode_dirty_sync+0x40/0x50
	__insert_extent_tree+0x2e0/0x2f4
	f2fs_update_extent_tree_range+0xcf4/0xde8
	f2fs_update_extent_cache+0x114/0x12c
	f2fs_update_data_blkaddr+0x40/0x50
	write_data_page+0x150/0x314
	do_write_data_page+0x648/0x2318
	__write_data_page+0xdb4/0x1640
	f2fs_write_cache_pages+0x768/0xafc
	__f2fs_write_data_pages+0x590/0x1218
	f2fs_write_data_pages+0x64/0x74
	do_writepages+0x74/0xe4
	__writeback_single_inode+0xdc/0x15f0
	writeback_sb_inodes+0x574/0xc98
	__writeback_inodes_wb+0x190/0x204
	wb_writeback+0x730/0xf14
	wb_check_old_data_flush+0x1bc/0x1c8
	wb_workfn+0x554/0xf74
	process_one_work+0x440/0x118c
	worker_thread+0xac/0x974
	kthread+0x1a0/0x1c8
	ret_from_fork+0x10/0x1c
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 7 Sep 2018 11:49:07 +0000 (19:49 +0800)]
 
f2fs: plug readahead IO in readdir()
Add a plug to merge readahead IO in readdir(), expecting it can
reduce bio count before submitting to block layer.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 6 Sep 2018 12:34:12 +0000 (20:34 +0800)]
 
f2fs: fix to do sanity check with current segment number
https://bugzilla.kernel.org/show_bug.cgi?id=200219
Reproduction way:
- mount image
- run poc code
- umount image
F2FS-fs (loop1): Bitmap was wrongly set, blk:15364
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 17686 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: update_sit_entry+0x459/0x4e0 [f2fs]
Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44
EAX: 
00000032 EBX: 
000000f8 ECX: 
00000002 EDX: 
00000001
ESI: 
d7177000 EDI: 
f520fe68 EBP: 
d6477c6c ESP: 
d6477c34
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 
00010282
CR0: 
80050033 CR2: 
b7fbe000 CR3: 
2a99b3c0 CR4: 
000406f0
Call Trace:
 f2fs_allocate_data_block+0x124/0x580 [f2fs]
 do_write_page+0x78/0x150 [f2fs]
 f2fs_do_write_node_page+0x25/0xa0 [f2fs]
 __write_node_page+0x2bf/0x550 [f2fs]
 f2fs_sync_node_pages+0x60e/0x6d0 [f2fs]
 ? sync_inode_metadata+0x2f/0x40
 ? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs]
 ? up_write+0x1e/0x80
 f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs]
 ? mark_held_locks+0x5d/0x80
 ? _raw_spin_unlock_irq+0x27/0x50
 kill_f2fs_super+0x68/0x90 [f2fs]
 deactivate_locked_super+0x3d/0x70
 deactivate_super+0x40/0x60
 cleanup_mnt+0x39/0x70
 __cleanup_mnt+0x10/0x20
 task_work_run+0x81/0xa0
 exit_to_usermode_loop+0x59/0xa7
 do_fast_syscall_32+0x1f5/0x22c
 entry_SYSENTER_32+0x53/0x86
EIP: 0xb7f95c51
Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
EAX: 
00000000 EBX: 
0871ab90 ECX: 
bfb2cd00 EDX: 
00000000
ESI: 
00000000 EDI: 
0871ab90 EBP: 
0871ab90 ESP: 
bfb2cd7c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 
00000246
Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
---[ end trace 
d423f83982cfcdc5 ]---
The reason is, different log headers using the same segment, once
one log's next block address is used by another log, it will cause
panic as above.
Main area: 24 segs, 24 secs 24 zones
  - COLD  data: 0, 0, 0
  - WARM  data: 1, 1, 1
  - HOT   data: 20, 20, 20
  - Dir   dnode: 22, 22, 22
  - File   dnode: 22, 22, 22
  - Indir nodes: 21, 21, 21
So this patch adds sanity check to detect such condition to avoid
this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 5 Sep 2018 06:54:02 +0000 (14:54 +0800)]
 
f2fs: fix memory leak of percpu counter in fill_super()
In fill_super -> init_percpu_info, we should destroy percpu counter
in error path, otherwise memory allcoated for percpu counter will
leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 5 Sep 2018 06:54:01 +0000 (14:54 +0800)]
 
f2fs: fix memory leak of write_io in fill_super()
It needs to release memory allocated for sbi->write_io in error path,
otherwise, it will cause memory leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chengguang Xu [Fri, 31 Aug 2018 14:33:50 +0000 (22:33 +0800)]
 
f2fs: cache NULL when both default_acl and acl are NULL
default_acl and acl of newly created inode will be initiated
as ACL_NOT_CACHED in vfs function inode_init_always() and later
will be updated by calling xxx_init_acl() in specific filesystems.
Howerver, when default_acl and acl are NULL then they keep the value
of ACL_NOT_CACHED, this patch tries to cache NULL for acl/default_acl
in this case.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 22 Aug 2018 09:11:05 +0000 (17:11 +0800)]
 
f2fs: fix to flush all dirty inodes recovered in readonly fs
generic/417 reported as blow:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
 ? _raw_spin_unlock+0x2c/0x50
 evict+0xa8/0x170
 dispose_list+0x34/0x40
 evict_inodes+0x118/0x120
 generic_shutdown_super+0x41/0x100
 ? rcu_read_lock_sched_held+0x97/0xa0
 kill_block_super+0x22/0x50
 kill_f2fs_super+0x6f/0x80 [f2fs]
 deactivate_locked_super+0x3d/0x70
 deactivate_super+0x40/0x60
 cleanup_mnt+0x39/0x70
 __cleanup_mnt+0x10/0x20
 task_work_run+0x81/0xa0
 exit_to_usermode_loop+0x59/0xa7
 do_fast_syscall_32+0x1f5/0x22c
 entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
It can simply reproduced with scripts:
Enable quota feature during mkfs.
Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs
Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
	a) open /mnt/f2fs/file;
	b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs
The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.
Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.
To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Tue, 26 Jun 2018 05:12:43 +0000 (13:12 +0800)]
 
f2fs: report error if quota off error during umount
Now, we depend on fsck to ensure quota file data is ok,
so we scan whole partition if checkpoint without umount
flag. It's same for quota off error case, which may make
quota file data inconsistent.
generic/019 reports below error:
 __quota_error: 1160 callbacks suppressed
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds.  Have a nice day...
If we failed in below path due to fail to write dquot block, we will miss
to release quota inode, fix it.
- f2fs_put_super
 - f2fs_quota_off_umount
  - f2fs_quota_off
   - f2fs_quota_sync   <-- failed
   - dquot_quota_off   <-- missed to call
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 6 Sep 2018 18:40:12 +0000 (11:40 -0700)]
 
f2fs: submit bio after shutdown
Sometimes, some merged IOs could get a chance to be submitted, resulting in
system hang in shutdown test. This issues IOs all the time after shutdown.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Aug 2018 04:18:00 +0000 (21:18 -0700)]
 
f2fs: avoid wrong decrypted data from disk
1. Create a file in an encrypted directory
2. Do GC & drop caches
3. Read stale data before its bio for metapage was not issued yet
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 22 Aug 2018 09:17:47 +0000 (17:17 +0800)]
 
Revert "f2fs: use printk_ratelimited for f2fs_msg"
Don't limit printing log, so that we will not miss any key messages.
This reverts commit 
a36c106dffb616250117efb1cab271c19a8f94ff.
In addition, we use printk_ratelimited to avoid too many log prints.
- error injection
- discard submission failure
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sahitya Tummala [Fri, 31 Aug 2018 09:39:26 +0000 (15:09 +0530)]
 
f2fs: fix unnecessary periodic wakeup of discard thread when dev is busy
When dev is busy, discard thread wake up timeout can be aligned with the
exact time that it needs to wait for dev to come out of busy. This helps
to avoid unnecessary periodic wakeups and thus save some power.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 3 Sep 2018 19:52:17 +0000 (03:52 +0800)]
 
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
 sda1: FAT  /boot
 sda2: F2FS /
 sda3: F2FS /home
 sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
 mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
 /etc/udev/rules.d/10-sata-bridge-trim.rules:
 ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
 Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
 same disc
 same bridge
 different usb host controller
 different cpu architecture
 not root filesystem
Regards,
  Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
 Unable to handle kernel NULL pointer dereference at virtual address 
000000000000003e
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 
00000000626e3122
 [
000000000000003e] pgd=
0000000000000000
 Internal error: Oops: 
96000004 [#1] SMP
 Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
 CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
 Hardware name: Sapphire-RK3399 Board (DT)
 pstate: 
00000005 (nzcv daif -PAN -UAO)
 pc : update_sit_entry+0x304/0x4b0
 lr : update_sit_entry+0x108/0x4b0
 sp : 
ffff00000ca13bd0
 x29: 
ffff00000ca13bd0 x28: 
000000000000003e
 x27: 
0000000000000020 x26: 
0000000000080000
 x25: 
0000000000000048 x24: 
ffff8000ebb85cf8
 x23: 
0000000000000253 x22: 
00000000ffffffff
 x21: 
00000000000535f2 x20: 
00000000ffffffdf
 x19: 
ffff8000eb9e6800 x18: 
ffff8000eb9e6be8
 x17: 
0000000007ce6926 x16: 
000000001c83ffa8
 x15: 
0000000000000000 x14: 
ffff8000f602df90
 x13: 
0000000000000006 x12: 
0000000000000040
 x11: 
0000000000000228 x10: 
0000000000000000
 x9 : 
0000000000000000 x8 : 
0000000000000000
 x7 : 
00000000000535f2 x6 : 
ffff8000ebff3440
 x5 : 
ffff8000ebff3440 x4 : 
ffff8000ebe3a6c8
 x3 : 
00000000ffffffff x2 : 
0000000000000020
 x1 : 
0000000000000000 x0 : 
ffff8000eb9e5800
 Process nvim (pid: 957, stack limit = 0x0000000063a78320)
 Call trace:
  update_sit_entry+0x304/0x4b0
  f2fs_invalidate_blocks+0x98/0x140
  truncate_node+0x90/0x400
  f2fs_remove_inode_page+0xe8/0x340
  f2fs_evict_inode+0x2b0/0x408
  evict+0xe0/0x1e0
  iput+0x160/0x260
  do_unlinkat+0x214/0x298
  __arm64_sys_unlinkat+0x3c/0x68
  el0_svc_handler+0x94/0x118
  el0_svc+0x8/0xc
 Code: 
f9400800 b9488400 36080140 f9400f01 (
387c4820)
 ---[ end trace 
a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chengguang Xu [Thu, 30 Aug 2018 13:33:31 +0000 (21:33 +0800)]
 
f2fs: add additional sanity check in f2fs_acl_from_disk()
Add additinal sanity check for irregular case(e.g. corruption).
If size of extended attribution is smaller than size of acl header,
then return -EINVAL.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Wed, 5 Sep 2018 16:27:45 +0000 (09:27 -0700)]
 
Merge tag 'gpio-v4.19-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
 "Some GPIO fixes. The ACPI stuff is probably the most annoying for
  users that get fixed this time.
   - Atomic contexts, cansleep* calls and such fastpath/slopwpath
     things.
   - Defer ACPI event handler registration to late_initcall() so IRQs do
     not fire in our face before other drivers have a chance to register
     handlers.
   - Race condition if a consumer requests a GPIO after
     gpiochip_add_data_with_key() but before of_gpiochip_add()
   - Probe errorpath in the dwapb driver"
* tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: Fix crash due to registration race
  gpio: dwapb: Fix error handling in dwapb_gpio_probe()
  gpiolib-acpi: Register GpioInt ACPI event handlers from a late_initcall
  gpiolib: acpi: Switch to cansleep version of GPIO library call
  gpio: adp5588: Fix sleep-in-atomic-context bug
Linus Torvalds [Wed, 5 Sep 2018 16:17:20 +0000 (09:17 -0700)]
 
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "A set of very minor fixes and a couple of reverts to fix a major
  problem (the attempt to change the busy count causes a hang when
  attempting to change the drive cache type)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: aacraid: fix a signedness bug
  Revert "scsi: core: avoid host-wide host_busy counter for scsi_mq"
  Revert "scsi: core: fix scsi_host_queue_ready"
  scsi: libata: Add missing newline at end of file
  scsi: target: iscsi: cxgbit: use pr_debug() instead of pr_info()
  scsi: hpsa: limit transfer length to 1MB, not 512kB
  scsi: lpfc: Correct MDS diag and nvmet configuration
  scsi: lpfc: Default fdmi_on to on
  scsi: csiostor: fix incorrect port capabilities
  scsi: csiostor: add a check for NULL pointer after kmalloc()
  scsi: documentation: add scsi_mod.use_blk_mq to scsi-parameters
  scsi: core: Update SCSI_MQ_DEFAULT help text to match default
Linus Torvalds [Wed, 5 Sep 2018 16:13:31 +0000 (09:13 -0700)]
 
Merge tag 'nds32-for-linus-4.19-tag1' of git://git./linux/kernel/git/greentime/linux
Pull nds32 updates from Greentime Hu:
 "Contained in here are the bug fixes, building error fixes and ftrace
  support for nds32"
* tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
  nds32: linker script: GCOV kernel may refers data in __exit
  nds32: fix build error because of wrong semicolon
  nds32: Fix a kernel panic issue because of wrong frame pointer access.
  nds32: Only print one page of stack when die to prevent printing too much information.
  nds32: Add macro definition for offset of lp register on stack
  nds32: Remove the deprecated ABI implementation
  nds32/stack: Get real return address by using ftrace_graph_ret_addr
  nds32/ftrace: Support dynamic function graph tracer
  nds32/ftrace: Support dynamic function tracer
  nds32/ftrace: Add RECORD_MCOUNT support
  nds32/ftrace: Support static function graph tracer
  nds32/ftrace: Support static function tracer
  nds32: Extract the checking and getting pointer to a macro
  nds32: Clean up the coding style
  nds32: Fix get_user/put_user macro expand pointer problem
  nds32: Fix empty call trace
  nds32: add NULL entry to the end of_device_id array
  nds32: fix logic for module
Greentime Hu [Tue, 4 Sep 2018 06:25:57 +0000 (14:25 +0800)]
 
nds32: linker script: GCOV kernel may refers data in __exit
This patch is used to fix nds32 allmodconfig/allyesconfig build error
because GCOV kernel embeds counters in the kernel for each line
and a part of that embed in __exit text. So we need to keep the
EXIT_TEXT and EXIT_DATA  if CONFIG_GCOV_KERNEL=y.
Link: https://lkml.org/lkml/2018/9/1/125
Signed-off-by: Greentime Hu <greentime@andestech.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Linus Torvalds [Wed, 5 Sep 2018 00:01:11 +0000 (17:01 -0700)]
 
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "17 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  nilfs2: convert to SPDX license tags
  drivers/dax/device.c: convert variable to vm_fault_t type
  lib/Kconfig.debug: fix three typos in help text
  checkpatch: add __ro_after_init to known $Attribute
  mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
  uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
  memory_hotplug: fix kernel_panic on offline page processing
  checkpatch: add optional static const to blank line declarations test
  ipc/shm: properly return EIDRM in shm_lock()
  mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
  mm/util.c: improve kvfree() kerneldoc
  tools/vm/page-types.c: fix "defined but not used" warning
  tools/vm/slabinfo.c: fix sign-compare warning
  kmemleak: always register debugfs file
  mm: respect arch_dup_mmap() return value
  mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
  mm: memcontrol: print proper OOM header when no eligible victim left
Ryusuke Konishi [Tue, 4 Sep 2018 22:46:30 +0000 (15:46 -0700)]
 
nilfs2: convert to SPDX license tags
Remove the verbose license text from NILFS2 files and replace them with
SPDX tags.  This does not change the license of any of the code.
Link: http://lkml.kernel.org/r/1535624528-5982-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Souptick Joarder [Tue, 4 Sep 2018 22:46:26 +0000 (15:46 -0700)]
 
drivers/dax/device.c: convert variable to vm_fault_t type
As part of 
226ab561075f ("device-dax: Convert to vmf_insert_mixed and
vm_fault_t") in 4.19-rc1, 'rc' was not converted to vm_fault_t.  Now
converted.
Link: http://lkml.kernel.org/r/20180830153813.GA26059@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thibaut Sautereau [Tue, 4 Sep 2018 22:46:23 +0000 (15:46 -0700)]
 
lib/Kconfig.debug: fix three typos in help text
Fix three typos in CONFIG_WARN_ALL_UNSEEDED_RANDOM help text.
Link: http://lkml.kernel.org/r/20180830194505.4778-1-thibaut@sautereau.fr
Signed-off-by: Thibaut Sautereau <thibaut@sautereau.fr>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 4 Sep 2018 22:46:20 +0000 (15:46 -0700)]
 
checkpatch: add __ro_after_init to known $Attribute
__ro_after_init is a specific __attribute__ that checkpatch does currently
not understand.
Add it to the known $Attribute types so that code that uses variables
declared with __ro_after_init are not thought to be a modifier type.
This appears as a defect in checkpatch output of code like:
static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
[...]
       if (trust_cpu && arch_init) {
where checkpatch reports:
ERROR: space prohibited after that '&&' (ctx:WxW)
	if (trust_cpu && arch_init) {
Link: http://lkml.kernel.org/r/0fa8a2cb83ade4c525e18261ecf6cfede3015983.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jiang [Tue, 4 Sep 2018 22:46:16 +0000 (15:46 -0700)]
 
mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
It looks like I missed the PUD path when doing VM_MIXEDMAP removal.
This can be triggered by:
1. Boot with memmap=4G!8G
2. build ndctl with destructive flag on
3. make TESTS=device-dax check
[  +0.000675] kernel BUG at mm/huge_memory.c:824!
Applying the same change that was applied to vmf_insert_pfn_pmd() in the
original patch.
Link: http://lkml.kernel.org/r/153565957352.35524.1005746906902065126.stgit@djiang5-desk3.ch.intel.com
Fixes: e1fb4a08649 ("dax: remove VM_MIXEDMAP for fsdax and device dax")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 4 Sep 2018 22:46:13 +0000 (15:46 -0700)]
 
uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
Since this header is in "include/uapi/linux/", apparently people want to
use it in userspace programs -- even in C++ ones.  However, the header
uses a C++ reserved keyword ("private"), so change that to "dh_private"
instead to allow the header file to be used in C++ userspace.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=191051
Link: http://lkml.kernel.org/r/0db6c314-1ef4-9bfa-1baa-7214dd2ee061@infradead.org
Fixes: ddbb41148724 ("KEYS: Add KEYCTL_DH_COMPUTE command")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikhail Zaslonko [Tue, 4 Sep 2018 22:46:09 +0000 (15:46 -0700)]
 
memory_hotplug: fix kernel_panic on offline page processing
Within show_valid_zones() the function test_pages_in_a_zone() should be
called for online memory blocks only.
Otherwise it might lead to the VM_BUG_ON due to uninitialized struct
pages (when CONFIG_DEBUG_VM_PGFLAGS kernel option is set):
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 ------------[ cut here ]------------
 Call Trace:
 ([<
000000000038f91e>] test_pages_in_a_zone+0xe6/0x168)
  [<
0000000000923472>] show_valid_zones+0x5a/0x1a8
  [<
0000000000900284>] dev_attr_show+0x3c/0x78
  [<
000000000046f6f0>] sysfs_kf_seq_show+0xd0/0x150
  [<
00000000003ef662>] seq_read+0x212/0x4b8
  [<
00000000003bf202>] __vfs_read+0x3a/0x178
  [<
00000000003bf3ca>] vfs_read+0x8a/0x148
  [<
00000000003bfa3a>] ksys_read+0x62/0xb8
  [<
0000000000bc2220>] system_call+0xdc/0x2d8
That VM_BUG_ON was triggered by the page poisoning introduced in
mm/sparse.c with the git commit 
d0dc12e86b31 ("mm/memory_hotplug:
optimize memory hotplug").
With the same commit the new 'nid' field has been added to the struct
memory_block in order to store and later on derive the node id for
offline pages (instead of accessing struct page which might be
uninitialized).  But one reference to nid in show_valid_zones() function
has been overlooked.  Fixed with current commit.  Also, nr_pages will
not be used any more after test_pages_in_a_zone() call, do not update
it.
Link: http://lkml.kernel.org/r/20180828090539.41491-1-zaslonko@linux.ibm.com
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: <stable@vger.kernel.org>	[4.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 4 Sep 2018 22:46:06 +0000 (15:46 -0700)]
 
checkpatch: add optional static const to blank line declarations test
Using a static const struct definition as part of a series of
declarations produces a false positive "Missing a blank line after
declarations" for code like:
  WARNING: Missing a blank line after declarations
  #710: FILE: drivers/gpu/drm/tidss/tidss_scale_coefs.c:137:
  +       int inc;
  +       static const struct {
So fix it.
Link: http://lkml.kernel.org/r/5905126e70b0ed1781e49265fd5c49c5090d0223.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Jyri Sarha <jsarha@ti.com>
Cc: "Valkeinen, Tomi" <tomi.valkeinen@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 4 Sep 2018 22:46:02 +0000 (15:46 -0700)]
 
ipc/shm: properly return EIDRM in shm_lock()
When getting rid of the general ipc_lock(), this was missed furthermore,
making the comment around the ipc object validity check bogus.  Under
EIDRM conditions, callers will in turn not see the error and continue
with the operation.
Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5
Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian
Fixes: 82061c57ce9 ("ipc: drop ipc_lock()")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 4 Sep 2018 22:45:59 +0000 (15:45 -0700)]
 
mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
When scanning for movable pages, filter out Hugetlb pages if hugepage
migration is not supported.  Without this we hit infinte loop in
__offline_pages() where we do
	pfn = scan_movable_pages(start_pfn, end_pfn);
	if (pfn) { /* We have movable pages */
		ret = do_migrate_range(pfn, end_pfn);
		goto repeat;
	}
Fix this by checking hugepage_migration_supported both in
has_unmovable_pages which is the primary backoff mechanism for page
offlining and for consistency reasons also into scan_movable_pages
because it doesn't make any sense to return a pfn to non-migrateable
huge page.
This issue was revealed by, but not caused by 
72b39cfc4d75 ("mm,
memory_hotplug: do not fail offlining too early").
Link: http://lkml.kernel.org/r/20180824063314.21981-1-aneesh.kumar@linux.ibm.com
Fixes: 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Haren Myneni <haren@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 4 Sep 2018 22:45:55 +0000 (15:45 -0700)]
 
mm/util.c: improve kvfree() kerneldoc
Scooped from an email from Matthew.
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Tue, 4 Sep 2018 22:45:51 +0000 (15:45 -0700)]
 
tools/vm/page-types.c: fix "defined but not used" warning
debugfs_known_mountpoints[] is not used any more, so let's remove it.
Link: http://lkml.kernel.org/r/1535102651-19418-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Tue, 4 Sep 2018 22:45:48 +0000 (15:45 -0700)]
 
tools/vm/slabinfo.c: fix sign-compare warning
Currently we get the following compiler warning:
    slabinfo.c:854:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (s->object_size < min_objsize)
                          ^
due to the mismatch of signed/unsigned comparison.  ->object_size and
->slab_size are never expected to be negative, so let's define them as
unsigned int.
[n-horiguchi@ah.jp.nec.com: convert everything - none of these can be negative]
Link: http://lkml.kernel.org/r/20180826234947.GA9787@hori1.linux.bs1.fc.nec.co.jp
Link: http://lkml.kernel.org/r/1535103134-20239-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vincent Whitchurch [Tue, 4 Sep 2018 22:45:44 +0000 (15:45 -0700)]
 
kmemleak: always register debugfs file
If kmemleak built in to the kernel, but is disabled by default, the
debugfs file is never registered.  Because of this, it is not possible
to find out if the kernel is built with kmemleak support by checking for
the presence of this file.  To allow this, always register the file.
After this patch, if the file doesn't exist, kmemleak is not available
in the kernel.  If writing "scan" or any other value than "clear" to
this file results in EBUSY, then kmemleak is available but is disabled
by default and can be activated via the kernel command line.
Catalin: "that's also consistent with a late disabling of kmemleak when
the debugfs entry sticks around."
Link: http://lkml.kernel.org/r/20180824131220.19176-1-vincent.whitchurch@axis.com
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadav Amit [Tue, 4 Sep 2018 22:45:41 +0000 (15:45 -0700)]
 
mm: respect arch_dup_mmap() return value
Commit 
d70f2a14b72a ("include/linux/sched/mm.h: uninline mmdrop_async(),
etc") ignored the return value of arch_dup_mmap(). As a result, on x86,
a failure to duplicate the LDT (e.g. due to memory allocation error)
would leave the duplicated memory mapping in an inconsistent state.
Fix by using the return value, as it was before the change.
Link: http://lkml.kernel.org/r/20180823051229.211856-1-namit@vmware.com
Fixes: d70f2a14b72a4 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Tue, 4 Sep 2018 22:45:37 +0000 (15:45 -0700)]
 
mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
Commit 
93065ac753e4 ("mm, oom: distinguish blockable mode for mmu
notifiers") has added an ability to skip over vmas with blockable mmu
notifiers. This however didn't call tlb_finish_mmu as it should.
As a result inc_tlb_flush_pending has been called without its pairing
dec_tlb_flush_pending and all callers mm_tlb_flush_pending would flush
even though this is not really needed.  This alone is not harmful and it
seems there shouldn't be any such callers for oom victims at all but
there is no real reason to skip tlb_finish_mmu on early skip either so
call it.
[mhocko@suse.com: new changelog]
Link: http://lkml.kernel.org/r/b752d1d5-81ad-7a35-2394-7870641be51c@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Tue, 4 Sep 2018 22:45:34 +0000 (15:45 -0700)]
 
mm: memcontrol: print proper OOM header when no eligible victim left
When the memcg OOM killer runs out of killable tasks, it currently
prints a WARN with no further OOM context.  This has caused some user
confusion.
Warnings indicate a kernel problem.  In a reported case, however, the
situation was triggered by a nonsensical memcg configuration (hard limit
set to 0).  But without any VM context this wasn't obvious from the
report, and it took some back and forth on the mailing list to identify
what is actually a trivial issue.
Handle this OOM condition like we handle it in the global OOM killer:
dump the full OOM context and tell the user we ran out of tasks.
This way the user can identify misconfigurations easily by themselves
and rectify the problem - without having to go through the hassle of
running into an obscure but unsettling warning, finding the appropriate
kernel mailing list and waiting for a kernel developer to remote-analyze
that the memcg configuration caused this.
If users cannot make sense of why the OOM killer was triggered or why it
failed, they will still report it to the mailing list, we know that from
experience.  So in case there is an actual kernel bug causing this,
kernel developers will very likely hear about it.
Link: http://lkml.kernel.org/r/20180821160406.22578-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 4 Sep 2018 19:45:11 +0000 (12:45 -0700)]
 
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 1) Must perform TXQ teardown before unregistering interfaces in
    mac80211, from Toke Høiland-Jørgensen.
 2) Don't allow creating mac80211_hwsim with less than one channel, from
    Johannes Berg.
 3) Division by zero in cfg80211, fix from Johannes Berg.
 4) Fix endian issue in tipc, from Haiqing Bai.
 5) BPF sockmap use-after-free fixes from Daniel Borkmann.
 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.
 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.
 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
    kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.
 9) Fix deadlock in hv_netvsc, from Dexuan Cui.
10) Do not restart timewait timer on RST, from Florian Westphal.
11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.
12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.
13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.
14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.
15) Use-after-free in act_ife, fix from Cong Wang.
16) Missing hold to meta module in act_ife, from Vlad Buslov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
  net: phy: sfp: Handle unimplemented hwmon limits and alarms
  net: sched: action_ife: take reference to meta module
  act_ife: fix a potential use-after-free
  net/mlx5: Fix SQ offset in QPs with small RQ
  tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
  tipc: correct spelling errors for struct tipc_bc_base's comment
  bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
  bnxt_en: Clean up unused functions.
  bnxt_en: Fix firmware signaled resource change logic in open.
  sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
  sctp: fix invalid reference to the index variable of the iterator
  net/ibm/emac: wrong emac_calc_base call was used by typo
  net: sched: null actions array pointer before releasing action
  vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
  r8169: add support for NCube 8168 network card
  ip6_tunnel: respect ttl inherit for ip6tnl
  mac80211: shorten the IBSS debug messages
  mac80211: don't Tx a deauth frame if the AP forbade Tx
  mac80211: Fix station bandwidth setting after channel switch
  mac80211: fix a race between restart and CSA flows
  ...
Andrew Lunn [Tue, 4 Sep 2018 02:23:56 +0000 (04:23 +0200)]
 
net: phy: sfp: Handle unimplemented hwmon limits and alarms
Not all SFPs implement the registers containing sensor limits and
alarms. Luckily, there is a bit indicating if they are implemented or
not. Add checking for this bit, when deciding if the hwmon attributes
should be visible.
Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 3 Sep 2018 21:44:42 +0000 (00:44 +0300)]
 
net: sched: action_ife: take reference to meta module
Recent refactoring of add_metainfo() caused use_all_metadata() to add
metainfo to ife action metalist without taking reference to module. This
causes warning in module_put called from ife action cleanup function.
Implement add_metainfo_and_get_ops() function that returns with reference
to module taken if metainfo was added successfully, and call it from
use_all_metadata(), instead of calling __add_metainfo() directly.
Example warning:
[  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
[  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
[  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
[  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  646.440595] RIP: 0010:module_put+0x1cb/0x230
[  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
[  646.464997] RSP: 0018:
ffff880354d37068 EFLAGS: 
00010286
[  646.470599] RAX: 
0000000000000000 RBX: 
ffffffffc0a52518 RCX: 
ffffffff8c2668db
[  646.478118] RDX: 
0000000000000003 RSI: 
dffffc0000000000 RDI: 
ffffffffc0a52518
[  646.485641] RBP: 
ffffffffc0a52180 R08: 
fffffbfff814a4a4 R09: 
fffffbfff814a4a3
[  646.493164] R10: 
ffffffffc0a5251b R11: 
fffffbfff814a4a4 R12: 
1ffff1006a9a6e0d
[  646.500687] R13: 
00000000ffffffff R14: 
ffff880362bab890 R15: 
dead000000000100
[  646.508213] FS:  
00007f4164c99800(0000) GS:
ffff88036fe40000(0000) knlGS:
0000000000000000
[  646.516961] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[  646.523080] CR2: 
00007f41638b8420 CR3: 
0000000351df0004 CR4: 
00000000001606e0
[  646.530595] Call Trace:
[  646.533408]  ? find_symbol_in_section+0x260/0x260
[  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
[  646.543695]  tcf_action_cleanup+0x29/0xa0
[  646.548078]  __tcf_action_put+0x5a/0xb0
[  646.552289]  ? nla_put+0x65/0xe0
[  646.555889]  __tcf_idr_release+0x48/0x60
[  646.560187]  tcf_generic_walker+0x448/0x6b0
[  646.564764]  ? tcf_action_dump_1+0x450/0x450
[  646.569411]  ? __lock_is_held+0x84/0x110
[  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
[  646.578982]  tca_action_gd+0x972/0xc40
[  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
[  646.588471]  ? mark_lock+0xcf/0x980
[  646.592324]  ? check_chain_key+0x140/0x1f0
[  646.596832]  ? debug_show_all_locks+0x240/0x240
[  646.601839]  ? memset+0x1f/0x40
[  646.605350]  ? nla_parse+0xca/0x1a0
[  646.609217]  tc_ctl_action+0x215/0x230
[  646.613339]  ? tcf_action_add+0x220/0x220
[  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
[  646.622227]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.626466]  netlink_rcv_skb+0x18d/0x200
[  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.634959]  ? netlink_ack+0x500/0x500
[  646.639106]  netlink_unicast+0x2d0/0x370
[  646.643409]  ? netlink_attachskb+0x340/0x340
[  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
[  646.652870]  ? import_iovec+0x11e/0x1c0
[  646.657083]  netlink_sendmsg+0x3b9/0x6a0
[  646.661388]  ? netlink_unicast+0x370/0x370
[  646.665877]  ? netlink_unicast+0x370/0x370
[  646.670351]  sock_sendmsg+0x6b/0x80
[  646.674212]  ___sys_sendmsg+0x4a1/0x520
[  646.678443]  ? copy_msghdr_from_user+0x210/0x210
[  646.683463]  ? lock_downgrade+0x320/0x320
[  646.687849]  ? debug_show_all_locks+0x240/0x240
[  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
[  646.697418]  ? _raw_spin_unlock+0x24/0x30
[  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
[  646.706619]  ? __pmd_alloc+0x320/0x320
[  646.710738]  ? debug_show_all_locks+0x240/0x240
[  646.715649]  ? restore_nameidata+0x7b/0xa0
[  646.720117]  ? check_chain_key+0x140/0x1f0
[  646.724590]  ? check_chain_key+0x140/0x1f0
[  646.729070]  ? __fget_light+0xbc/0xd0
[  646.733121]  ? __sys_sendmsg+0xd7/0x150
[  646.737329]  __sys_sendmsg+0xd7/0x150
[  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
[  646.746003]  ? up_read+0x53/0x90
[  646.749601]  ? __do_page_fault+0x484/0x780
[  646.754105]  ? do_syscall_64+0x1e/0x2c0
[  646.758320]  do_syscall_64+0x72/0x2c0
[  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  646.767776] RIP: 0033:0x7f4163872150
[  646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
[  646.791474] RSP: 002b:
00007ffdef7d6b58 EFLAGS: 
00000246 ORIG_RAX: 
000000000000002e
[  646.799721] RAX: 
ffffffffffffffda RBX: 
0000000000000024 RCX: 
00007f4163872150
[  646.807240] RDX: 
0000000000000000 RSI: 
00007ffdef7d6bd0 RDI: 
0000000000000003
[  646.814760] RBP: 
000000005b8b9482 R08: 
0000000000000001 R09: 
0000000000000000
[  646.822286] R10: 
00000000000005e7 R11: 
0000000000000246 R12: 
00007ffdef7dad20
[  646.829807] R13: 
0000000000000000 R14: 
0000000000000000 R15: 
0000000000679bc0
[  646.837360] irq event stamp: 6083
[  646.841043] hardirqs last  enabled at (6081): [<
ffffffff8c220a7d>] __call_rcu+0x17d/0x500
[  646.849882] hardirqs last disabled at (6083): [<
ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c
[  646.859775] softirqs last  enabled at (5968): [<
ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee
[  646.868784] softirqs last disabled at (6082): [<
ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife]
[  646.878845] ---[ end trace 
b1b8c12ffe51e657 ]---
Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 3 Sep 2018 18:08:15 +0000 (11:08 -0700)]
 
act_ife: fix a potential use-after-free
Immediately after module_put(), user could delete this
module, so e->ops could be already freed before we call
e->ops->release().
Fix this by moving module_put() after ops->release().
Fixes: ef6980b6becb ("introduce IFE action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Mon, 3 Sep 2018 15:06:24 +0000 (18:06 +0300)]
 
net/mlx5: Fix SQ offset in QPs with small RQ
Correct the formula for calculating the RQ page remainder,
which should be in byte granularity.  The result will be
non-zero only for RQs smaller than PAGE_SIZE, as an RQ size
is a power of 2.
Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the
SQ offset in strides granularity.
Fixes: d7037ad73daa ("net/mlx5: Fix QP fragmented buffer allocation")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greentime Hu [Tue, 28 Aug 2018 08:07:39 +0000 (16:07 +0800)]
 
nds32: fix build error because of wrong semicolon
It shall be removed in the define usage. We shall not put a semicolon there.
/kisskb/src/arch/nds32/include/asm/elf.h:126:29: error: expected '}' before ';' token
 #define ELF_DATA ELFDATA2LSB;
                             ^
/kisskb/src/fs/proc/kcore.c:318:17: note: in expansion of macro 'ELF_DATA'
     [EI_DATA] = ELF_DATA,
                 ^~~~~~~~
/kisskb/src/fs/proc/kcore.c:312:15: note: to match this '{'
    .e_ident = {
               ^
/kisskb/src/scripts/Makefile.build:307: recipe for target 'fs/proc/kcore.o' failed
Signed-off-by: Greentime Hu <greentime@andestech.com>
Greentime Hu [Thu, 23 Aug 2018 07:05:46 +0000 (15:05 +0800)]
 
Greentime Hu [Thu, 23 Aug 2018 06:47:43 +0000 (14:47 +0800)]
 
nds32: Only print one page of stack when die to prevent printing too much information.
It may print too much information sometimes if the stack is wrong or
too big. This patch can limit the debug information in a page of stack.
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Mon, 20 Aug 2018 01:51:29 +0000 (09:51 +0800)]
 
nds32: Add macro definition for offset of lp register on stack
Use macro to replace the magic number.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Mon, 20 Aug 2018 01:40:08 +0000 (09:40 +0800)]
 
nds32: Remove the deprecated ABI implementation
We are not using NDS32 ABI 2 for now, just remove the preprocessor
directives __NDS32_ABI_2.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Wed, 15 Aug 2018 03:05:40 +0000 (11:05 +0800)]
 
nds32/stack: Get real return address by using ftrace_graph_ret_addr
Function graph tracer has modified the return address to
'return_to_handler' on stack, and provide the 'ftrace_graph_ret_addr' to
get the real return address.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Wed, 15 Aug 2018 03:01:10 +0000 (11:01 +0800)]
 
nds32/ftrace: Support dynamic function graph tracer
This patch contains the implementation of dynamic function graph tracer.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Wed, 15 Aug 2018 03:00:08 +0000 (11:00 +0800)]
 
nds32/ftrace: Support dynamic function tracer
This patch contains the implementation of dynamic function tracer.
The mcount call is composed of three instructions, so there are three
nop for enough placeholder.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Wed, 15 Aug 2018 02:57:16 +0000 (10:57 +0800)]
 
nds32/ftrace: Add RECORD_MCOUNT support
Recognize NDS32 object files in recordmcount.pl.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Wed, 15 Aug 2018 02:53:04 +0000 (10:53 +0800)]
 
nds32/ftrace: Support static function graph tracer
This patch contains implementation of static function graph tracer.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Wed, 15 Aug 2018 02:45:59 +0000 (10:45 +0800)]
 
nds32/ftrace: Support static function tracer
This patch support the static function tracer. On nds32 ABI, we need to
always push return address to stack for __builtin_return_address can
work correctly, otherwise, it will get the wrong value of $lp at leaf
function.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Mon, 13 Aug 2018 08:02:53 +0000 (16:02 +0800)]
 
nds32: Extract the checking and getting pointer to a macro
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Mon, 13 Aug 2018 07:08:52 +0000 (15:08 +0800)]
 
nds32: Clean up the coding style
1. Adjust indentation.
2. Unify argument name of each macro.
3. Add space after comma in parameters list.
4. Add space after 'if' keyword.
5. Replace space by tab.
6. Change asm volatile to __asm__ __volatile__
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Mon, 13 Aug 2018 06:48:49 +0000 (14:48 +0800)]
 
nds32: Fix get_user/put_user macro expand pointer problem
The pointer argument of macro need to be taken out once first, and then
use the new pointer in the macro body.
In kernel/trace/trace.c, get_user(ch, ubuf++) causes the unexpected
increment after expand the macro.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Zong Li [Mon, 13 Aug 2018 05:28:23 +0000 (13:28 +0800)]
 
nds32: Fix empty call trace
The compiler predefined macro 'NDS32_ABI_2' had been removed, it should
use the '__NDS32_ABI_2' here.
Signed-off-by: Zong Li <zong@andestech.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
YueHaibing [Tue, 7 Aug 2018 04:03:13 +0000 (12:03 +0800)]
 
nds32: add NULL entry to the end of_device_id array
Make sure of_device_id tables are NULL terminated.
Found by coccinelle spatch "misc/of_table.cocci"
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Greentime Hu [Wed, 18 Jul 2018 01:54:55 +0000 (09:54 +0800)]
 
nds32: fix logic for module
This bug is report by Dan Carpenter. We shall use ~loc_mask instead of
!loc_mask because we need to and(&) the bits of ~loc_mask.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: c9a4a8da6baa ("nds32: Loadable modules")
Signed-off-by: Greentime Hu <greentime@andestech.com>
David S. Miller [Tue, 4 Sep 2018 05:12:02 +0000 (22:12 -0700)]
 
Merge tag 'mac80211-for-davem-2018-09-03' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Here are quite a large number of fixes, notably:
 * various A-MSDU building fixes (currently only affects mt76)
 * syzkaller & spectre fixes in hwsim
 * TXQ vs. teardown fix that was causing crashes
 * embed WMM info in reg rule, bad code here had been causing crashes
 * one compilation issue with fix from Arnd (rfkill-gpio includes)
 * fixes for a race and bad data during/after channel switch
 * nl80211: a validation fix, attribute type & unit fixes
along with other small fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhenbo Gao [Mon, 3 Sep 2018 08:36:46 +0000 (16:36 +0800)]
 
tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
tipc_conn_queue_evt -> tipc_topsrv_queue_evt
tipc_send_work -> tipc_conn_send_work
tipc_send_to_sock -> tipc_conn_send_to_sock
Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhenbo Gao [Mon, 3 Sep 2018 08:36:45 +0000 (16:36 +0800)]
 
tipc: correct spelling errors for struct tipc_bc_base's comment
Trivial fix for two spelling mistakes.
Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 4 Sep 2018 04:59:43 +0000 (21:59 -0700)]
 
Merge branch 'bnxt_en-Bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes.
This short series fixes resource related logic in the driver, mostly
affecting the RDMA driver under corner cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 3 Sep 2018 08:23:19 +0000 (04:23 -0400)]
 
bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
Currently, the driver adjusts the bp->hw_resc.max_cp_rings by the number
of MSIX vectors used by RDMA.  There is one code path in open that needs
to check the true max_cp_rings including any used by RDMA.  This code
is now checking for the reduced max_cp_rings which will fail when the
number of cp rings is very small.
To fix this in a clean way, we don't adjust max_cp_rings anymore.
Instead, we add a helper bnxt_get_max_func_cp_rings_for_en() to get the
reduced max_cp_rings when appropriate.
Fixes: ec86f14ea506 ("bnxt_en: Add ULP calls to stop and restart IRQs.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 3 Sep 2018 08:23:18 +0000 (04:23 -0400)]
 
bnxt_en: Clean up unused functions.
Remove unused bnxt_subtract_ulp_resources().  Change
bnxt_get_max_func_irqs() to static since it is only locally used.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 3 Sep 2018 08:23:17 +0000 (04:23 -0400)]
 
bnxt_en: Fix firmware signaled resource change logic in open.
When the driver detects that resources have changed during open, it
should reset the rx and tx rings to 0.  This will properly setup the
init sequence to initialize the default rings again.  We also need
to signal the RDMA driver to stop and clear its interrupts.  We then
call the RoCE driver to restart if a new set of default rings is
successfully reserved.
Fixes: 25e1acd6b92b ("bnxt_en: Notify firmware about IF state changes.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 4 Sep 2018 04:57:55 +0000 (21:57 -0700)]
 
Merge branch 'sctp-two-fixes-for-spp_ipv6_flowlabel-and-spp_dscp-sockopts'
Xin Long says:
====================
sctp: two fixes for spp_ipv6_flowlabel and spp_dscp sockopts
This patchset fixes two problems in sctp_apply_peer_addr_params()
when setting spp_ipv6_flowlabel or spp_dscp.
====================
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 3 Sep 2018 07:47:11 +0000 (15:47 +0800)]
 
sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
When users set params.spp_address and get a trans, ipv6_flowlabel flag
should be applied into this trans. But even if this one is not an ipv6
trans, it should not go to apply it into all other transes of the asoc
but simply ignore it.
Fixes: 0b0dce7a36fb ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 3 Sep 2018 07:47:10 +0000 (15:47 +0800)]
 
sctp: fix invalid reference to the index variable of the iterator
Now in sctp_apply_peer_addr_params(), if SPP_IPV6_FLOWLABEL flag is set
and trans is NULL, it would use trans as the index variable to traverse
transport_addr_list, then trans is set as the last transport of it.
Later, if SPP_DSCP flag is set, it would enter into the wrong branch as
trans is actually an invalid reference.
So fix it by using a new index variable to traverse transport_addr_list
for both SPP_DSCP and SPP_IPV6_FLOWLABEL flags process.
Fixes: 0b0dce7a36fb ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Mikhaylov [Mon, 3 Sep 2018 07:26:28 +0000 (10:26 +0300)]
 
net/ibm/emac: wrong emac_calc_base call was used by typo
__emac_calc_base_mr1 was used instead of __emac4_calc_base_mr1
by copy-paste mistake for emac4syn.
Fixes: 45d6e545505fd32edb812f085be7de45b6a5c0af ("net/ibm/emac: add 8192 rx/tx fifo size")
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>