Commits · 4ccb5c7231295a5666f4df1c08232bdf7975e0e8 · openanolis / cloud-kernel

14 Feb, 2017 · 21 commits

btrfs: Make btrfs_kill_delayed_inode_items take btrfs_inode · 4ccb5c72
Committed by Nikolay Borisov on Jan 10, 2017
```
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
```
btrfs: Make btrfs_delayed_delete_inode_ref take btrfs_inode · e07222c7
Committed by Nikolay Borisov on Jan 10, 2017
```
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
```
btrfs: Make btrfs_delete_delayed_dir_index take btrfs_inode · e67bbbb9
Committed by Nikolay Borisov on Jan 10, 2017
```
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
```
btrfs: Make btrfs_insert_delayed_dir_index take btrfs_inode · 6f45d185
Committed by Nikolay Borisov on Jan 10, 2017
```
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
```
btrfs: Make btrfs_delayed_inode_reserve_metadata take btrfs_inode · fcabdd1c
Committed by Nikolay Borisov on Jan 10, 2017
```
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
```
btrfs: Make btrfs_get_or_create_delayed_node take btrfs_inode · e5517a7b
Committed by Nikolay Borisov on Jan 10, 2017
```
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
```

btrfs: Make btrfs_get_delayed_node take btrfs_inode · 340c6ca9

Committed by Nikolay Borisov on Jan 10, 2017

This function is internal to btrfs and doesn't really deal with any
VFS members, as such it needn't take a struct inode reference but
btrfs_inode.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Make btrfs_ino take a struct btrfs_inode · 4a0cc7ca

Committed by Nikolay Borisov on Jan 10, 2017

Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs, it's first necessary to
eliminate all uses of struct inode for the purpose of obtaining the inode
number. This patch does that by using BTRFS_I to convert an inode to
btrfs_inode. With this problem eliminated, subsequent patches will start
eliminating the passing of struct inode altogether, eventually resulting
in much cleaner code.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
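The conversion this commit message describes can be pictured in userspace C. This is only a minimal sketch: the types `inode_sketch` and `btrfs_inode_sketch` and every name ending in `_sketch` are hypothetical stand-ins, and the kernel's real BTRFS_I() and btrfs_ino() work on the full structures and handle more cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the VFS inode. */
struct inode_sketch {
	uint64_t i_ino;
};

/* Hypothetical stand-in for the filesystem-private inode that embeds it. */
struct btrfs_inode_sketch {
	uint64_t objectid;             /* the filesystem's own inode number */
	struct inode_sketch vfs_inode; /* embedded VFS inode */
};

/* Same idea as container_of(): step back from a member to its container. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Analogue of BTRFS_I(): VFS inode -> filesystem-private inode. */
static inline struct btrfs_inode_sketch *BTRFS_I_sketch(struct inode_sketch *inode)
{
	return container_of_sketch(inode, struct btrfs_inode_sketch, vfs_inode);
}

/* Analogue of the reworked helper: it takes the private inode directly. */
static inline uint64_t btrfs_ino_sketch(const struct btrfs_inode_sketch *inode)
{
	return inode->objectid;
}
```

A caller that only holds a VFS inode would write `btrfs_ino_sketch(BTRFS_I_sketch(inode))`, so the conversion happens once at the boundary and internal helpers never see the VFS type.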

btrfs: add wrapper for counting BTRFS_MAX_EXTENT_SIZE · 823bb20a

Committed by David Sterba on Jan 04, 2017

The expression is open-coded in several places, this asks for a wrapper.
As we know the MAX_EXTENT fits in a u32, we can use the appropriate
division helper. This cascades to the result type updates.

The compiler is clever enough to use a shift instead of integer division,
so there's no change in the generated assembly.

Signed-off-by: David Sterba <dsterba@suse.com>
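The round-up division that such a wrapper performs might look like the following. This is a hedged userspace sketch: the function name, the `_SKETCH` constant and its 128 MiB value are assumptions made for illustration, and plain `/` stands in for the kernel's division helper.

```c
#include <stdint.h>

/* Assumed maximum extent size for this sketch: 128 MiB, which fits in 32 bits. */
#define MAX_EXTENT_SIZE_SKETCH (128u * 1024 * 1024)

/* Number of maximum-sized extents needed to cover `size` bytes (rounds up). */
static inline uint32_t count_max_extents_sketch(uint64_t size)
{
	return (uint32_t)((size + MAX_EXTENT_SIZE_SKETCH - 1) / MAX_EXTENT_SIZE_SKETCH);
}
```

Because the divisor is a power of two, a compiler is free to emit a shift here, which matches the commit message's remark about the generated assembly.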

btrfs: remove unused logic of limiting async delalloc pages · 95995dbb

Committed by David Sterba on Jan 06, 2017

A proposed patch in https://marc.info/?l=linux-btrfs&m=147859791003837
pointed out a bad limit threshold in cow_file_range_async, but it turned
out that the whole logic is not necessary and is done by writeback. We
agreed to remove it.

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: consolidate auto defrag kick off policies · 26d30f85

Committed by Anand Jain on Dec 19, 2016

As of now, writes smaller than 64k for non-compressed extents and 16k
for compressed extents inside EOF are considered candidates for auto
defrag. Put these checks together in one place.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
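A consolidated policy check of this kind could be sketched as below. The function name, its parameters and the exact "inside EOF" test are assumptions made for illustration; only the 64k and 16k thresholds come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMALL_WRITE_PLAIN_SKETCH      (64u * 1024) /* non-compressed extents */
#define SMALL_WRITE_COMPRESSED_SKETCH (16u * 1024) /* compressed extents */

/*
 * Decide whether a write of `len` bytes at `start` is a candidate for
 * auto defrag. `i_size` is the current file size in bytes.
 */
static inline bool autodefrag_candidate_sketch(uint64_t start, uint64_t len,
					       bool compressed, uint64_t i_size)
{
	uint64_t threshold = compressed ? SMALL_WRITE_COMPRESSED_SKETCH
					: SMALL_WRITE_PLAIN_SKETCH;

	/* Only small writes that end before the end of file qualify here. */
	return len < threshold && start + len < i_size;
}
```

Keeping both thresholds behind one helper means each write path passes its own limit rather than repeating the comparison.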

btrfs: btrfs_defrag_root() doesn't defrag extent root tree · 8c3e6b1f

Committed by Anand Jain on Dec 21, 2016

Since btrfs_defrag_leaves() does not support extent_root, remove its
corresponding call. The user can use the file based defrag to defrag
extents as of now.

No change in behaviour as extent_root is explicitly skipped in
btrfs_defrag_leaves and this has never worked as expected.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref · fef394f7

Committed by Jeff Mahoney on Dec 13, 2016

btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove redundant inode null check · 694a0dee

Committed by Colin Ian King on Dec 20, 2016

The check for a null inode is redundant since the function
is a callback for exportfs, which will itself crash if
dentry->d_inode or parent->d_inode is NULL.  Removing the
null check makes this consistent with other file systems.

Also remove the redundant null dir check too.

Found with static analysis by CoverityScan, CID 1389472

Kudos to Jeff Mahoney for reviewing and explaining the error in
my original patch (most of this explanation went into the above
commit message) and David Sterba for pointing out that the dir
check is also redundant.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: ACCESS_ONCE cleanup · 20c7bcec

Committed by Seraphime Kirkovski on Dec 15, 2016

This replaces the ACCESS_ONCE macro with the corresponding
READ_ONCE/WRITE_ONCE macros.

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
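The shape of such a conversion can be illustrated outside the kernel. The macros below are simplified userspace analogues for scalar types only, written for this sketch with the GNU `__typeof__` extension; they are not the kernel's definitions, and `shared_flag` and the two helper functions are hypothetical.

```c
/* One macro for both directions, in the style being replaced. */
#define ACCESS_ONCE_SKETCH(x)     (*(volatile __typeof__(x) *)&(x))

/* Separate read and write forms, in the style being adopted. */
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;

/* Before: ACCESS_ONCE_SKETCH(shared_flag) = v; */
static inline void publish_flag(int v)
{
	WRITE_ONCE_SKETCH(shared_flag, v);
}

/* Before: return ACCESS_ONCE_SKETCH(shared_flag); */
static inline int observe_flag(void)
{
	return READ_ONCE_SKETCH(shared_flag);
}
```

Splitting the macro makes each access state its direction at the call site, which is the readability gain a cleanup like this is after.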

Btrfs: code cleanup min/max -> min_t/max_t · 50d0446e

Committed by Seraphime Kirkovski on Dec 15, 2016

This cleans up the cases where the min/max macros were used with a cast
rather than using min_t/max_t directly.

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
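As an illustration of the pattern being cleaned up, here is a small userspace sketch. The `_sketch` macros are simplified analogues built on GNU statement expressions, not the kernel's macros, and `PAGE_SIZE_SKETCH` and `chunk_len_sketch()` are hypothetical names.

```c
#include <stdint.h>

/* Simplified analogues: both operands are converted to `type` first. */
#define min_t_sketch(type, a, b) ({ type _a = (a); type _b = (b); _a < _b ? _a : _b; })
#define max_t_sketch(type, a, b) ({ type _a = (a); type _b = (b); _a > _b ? _a : _b; })

#define PAGE_SIZE_SKETCH 4096u

/* Clamp a 64-bit remaining length to one page-sized chunk. */
static inline uint64_t chunk_len_sketch(uint64_t remaining)
{
	/* Cast style being replaced: min((uint64_t)PAGE_SIZE_SKETCH, remaining) */
	return min_t_sketch(uint64_t, PAGE_SIZE_SKETCH, remaining);
}
```

Naming the comparison type once, in the macro, keeps both operands visibly in the same type instead of relying on a cast applied to only one of them.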

btrfs: use rb_entry() instead of container_of · 6b4df8b6

Committed by Geliang Tang on Dec 19, 2016

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use BTRFS_COMPRESS_NONE to specify no compression · f74670f7
Committed by Anand Jain on Dec 06, 2016
```
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
```

btrfs: drop gfp mask tweaking in try_release_extent_state · 1aceabf3

Committed by Michal Hocko on Jan 09, 2017

try_release_extent_state reduces the gfp mask to GFP_NOFS if it is
compatible. This is true for GFP_KERNEL as well. There is no real
reason to do that though. There is no new lock taken down the only
consumer of the gfp mask, which is:
try_release_extent_state
  clear_extent_bit
    __clear_extent_bit
      alloc_extent_state

So this seems just unnecessary and confusing.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix up misleading GFP_NOFS usage in btrfs_releasepage · 3ba7ab22

Committed by Michal Hocko on Jan 09, 2017

b335b003 ("Btrfs: Avoid using __GFP_HIGHMEM with slab allocator")
has reduced the allocation mask in btrfs_releasepage to GFP_NOFS just
to prevent giving an inappropriate gfp mask to the slab allocator
deeper down the callchain (in alloc_extent_state). This is wrong for
two reasons: a) GFP_NOFS might be just too restrictive for the calling
context; b) it is better to tweak the gfp mask down at the point that
needs it.

So just remove the mask tweaking from btrfs_releasepage and move it
down to alloc_extent_state where it is needed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Add WARN_ON for qgroup reserved underflow · 18dc22c1

Committed by Qu Wenruo on Oct 20, 2016

Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs
qgroup reserved space, and leads to a non-writable fs.

This reminds us that we don't have enough underflow checks for qgroup
reserved space.

In the underflow case, we should not actually underflow the numbers but
warn and keep qgroup working.

So add more checks on qgroup reserved space and add WARN_ON() and
btrfs_warn() for any underflow case.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
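The "warn and clamp instead of wrapping" idea can be sketched in a few lines of userspace C. The struct, the function name and the message text are hypothetical, and `fprintf()` stands in for the kernel's WARN_ON() and btrfs_warn().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in holding only the reserved byte count. */
struct qgroup_sketch {
	uint64_t reserved;
};

/*
 * Release `num_bytes` of reservation.
 * Returns 1 if an underflow was detected (and clamped), 0 otherwise.
 */
static inline int qgroup_release_sketch(struct qgroup_sketch *qg, uint64_t num_bytes)
{
	if (qg->reserved < num_bytes) {
		/* Report the problem, then clamp to zero rather than wrap around. */
		fprintf(stderr, "qgroup reserved underflow: have %llu, to free %llu\n",
			(unsigned long long)qg->reserved,
			(unsigned long long)num_bytes);
		qg->reserved = 0;
		return 1;
	}
	qg->reserved -= num_bytes;
	return 0;
}
```

Clamping keeps the counter from turning into a huge unsigned value, which is the kind of state that would make later reservations fail.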

11 Feb, 2017 · 1 commit

Btrfs: fix btrfs_decompress_buf2page() · 6e78b3f7

Committed by Omar Sandoval on Feb 10, 2017

If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.

Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.

A case I saw when testing with zlib was:

    buf_start = 42769
    total_out = 46865
    working_bytes = total_out - buf_start = 4096
    start_byte = 45056

The condition (total_out > start_byte && buf_start < start_byte) is
true, so we adjust the offset:

    buf_offset = start_byte - buf_start = 2287
    working_bytes -= buf_offset = 1809
    current_buf_start = buf_start = 42769

Then, we copy

    bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
    buf_offset += bytes = 4096
    working_bytes -= bytes = 0
    current_buf_start += bytes = 44578

After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
which is true! So, we adjust the values again:

    buf_offset = start_byte - buf_start = 2287
    working_bytes = total_out - start_byte = 1809
    current_buf_start = buf_start + buf_offset = 45056

But note that working_bytes was already zero before this, so we should
have stopped copying.

Fixes: 974b1adc ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley <pat-lkml@erley.org>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>
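The worked example in this commit message can be replayed with two small predicates. This is a sketch only: the function names and the `prev_start_byte` parameter are hypothetical, and the code models just the condition check, not the copy loop itself.

```c
#include <stdbool.h>

/* The re-adjustment condition quoted in the commit message. */
static inline bool needs_readjust_sketch(unsigned long total_out,
					 unsigned long start_byte,
					 unsigned long current_buf_start)
{
	return total_out > start_byte && current_buf_start < start_byte;
}

/* Guarded variant: only consider re-adjusting after moving to a new page. */
static inline bool needs_readjust_guarded_sketch(unsigned long total_out,
						 unsigned long start_byte,
						 unsigned long prev_start_byte,
						 unsigned long current_buf_start)
{
	if (start_byte == prev_start_byte)
		return false; /* still in the same page: nothing to re-adjust */
	return needs_readjust_sketch(total_out, start_byte, current_buf_start);
}
```

With the numbers from the message (total_out = 46865, start_byte = 45056, current_buf_start = 44578 after the copy), the unguarded check is true even though the bio is still in the same page, while the guarded check is false when the previous and current start_byte are both 45056.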

10 Feb, 2017 · 2 commits

nfsd: Revert "nfsd: special case truncates some more" · 0839ffb8

Committed by J. Bruce Fields on Feb 09, 2017

This patch incorrectly attempted nested mnt_want_write, and incorrectly
disabled nfsd's owner override for truncate.  We'll fix those problems
and make another attempt soon; for the moment I think the safest course
is to revert.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

pstore: don't OOPS when there are no ftrace zones · 8672aed7

Committed by Brian Norris on Feb 08, 2017

We'll OOPS in ramoops_get_next_prz() if the platform didn't ask for any
ftrace zones (i.e., cxt->fprzs will be NULL). Let's just skip this
entire FTRACE section if there's no 'fprzs'.

Regression seen on a coreboot/depthcharge-based Chromebook.

Fixes: 2fbea82b ("pstore: Merge per-CPU ftrace records into one")
Cc: Joel Fernandes <joelaf@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>

09 Feb, 2017 · 1 commit

btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls · 2a362249

Committed by Jeff Mahoney on Feb 06, 2017

Commit 4c63c245 incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called.  The ->compat_ioctl callback is
expected to handle all ioctls, not just compat variants.  As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.

Fixes: 4c63c245 ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
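The behaviour this message calls for, a compat handler that deals with every command rather than only the compat-specific ones, can be sketched as follows. The command numbers and function names are hypothetical; the real FS_IOC* values and handler signatures come from the kernel headers.

```c
#include <errno.h>

/* Hypothetical command numbers for the sketch. */
enum ioctl_cmd_sketch {
	CMD_GETFLAGS = 1,
	CMD_SETFLAGS,
	CMD_GETVERSION,
	CMD_OTHER,
	CMD32_GETFLAGS = 101,
	CMD32_SETFLAGS,
	CMD32_GETVERSION,
};

/* Stand-in for the native handler: succeeds for every command it knows. */
static inline long native_ioctl_sketch(unsigned int cmd)
{
	switch (cmd) {
	case CMD_GETFLAGS:
	case CMD_SETFLAGS:
	case CMD_GETVERSION:
	case CMD_OTHER:
		return 0;
	default:
		return -ENOTTY;
	}
}

/* Translate the three compat-only numbers, then forward everything. */
static inline long compat_ioctl_sketch(unsigned int cmd)
{
	switch (cmd) {
	case CMD32_GETFLAGS:
		cmd = CMD_GETFLAGS;
		break;
	case CMD32_SETFLAGS:
		cmd = CMD_SETFLAGS;
		break;
	case CMD32_GETVERSION:
		cmd = CMD_GETVERSION;
		break;
	}
	return native_ioctl_sketch(cmd);
}
```

The key design point is the missing `default:` bail-out: unknown-to-compat commands fall through to the native handler instead of being rejected early.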

08 Feb, 2017 · 1 commit

mm: fix KPF_SWAPCACHE in /proc/kpageflags · b6789123

Committed by Hugh Dickins on Feb 07, 2017

Commit 6326fec1 ("mm: Use owner_priv bit for PageSwapCache, valid
when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (valid
only when PageSwapBacked is true).

As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
synthesized, instead of being shown on unrelated pages which just happen
to have PG_owner_priv_1 set.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
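Synthesizing the reported bit from two underlying bits can be sketched like this. All bit positions and names below are hypothetical values chosen for the sketch, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define PG_SWAPBACKED_SKETCH   (1u << 0)
#define PG_OWNER_PRIV_1_SKETCH (1u << 1) /* doubles as the swapcache bit */
#define KPF_SWAPCACHE_SKETCH   (1ull << 13)

/* Return the exported swapcache flag for a page's internal flags. */
static inline uint64_t kpf_swapcache_sketch(uint32_t page_flags)
{
	/* Report swapcache only when both bits are set, never from the alias alone. */
	if ((page_flags & PG_SWAPBACKED_SKETCH) &&
	    (page_flags & PG_OWNER_PRIV_1_SKETCH))
		return KPF_SWAPCACHE_SKETCH;
	return 0;
}
```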

04 Feb, 2017 · 1 commit

fs: break out of iomap_file_buffered_write on fatal signals · d1908f52

Committed by Michal Hocko on Feb 03, 2017

Tetsuo has noticed that an OOM stress test which performs large write
requests can deplete the memory reserves completely.  He has tracked
this down to the following path:

	__alloc_pages_nodemask+0x436/0x4d0
	alloc_pages_current+0x97/0x1b0
	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
	? iomap_write_end+0x80/0x80             fs/iomap.c:150
	iomap_apply+0xb3/0x130                  fs/iomap.c:79
	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
	? iomap_write_end+0x80/0x80
	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
	? remove_wait_queue+0x59/0x60
	xfs_file_write_iter+0x90/0x130 [xfs]
	__vfs_write+0xe5/0x140
	vfs_write+0xc7/0x1f0
	? syscall_trace_enter+0x1d0/0x380
	SyS_write+0x58/0xc0
	do_syscall_64+0x6c/0x200
	entry_SYSCALL64_slow_path+0x25/0x25

The OOM victim has access to all memory reserves so that it can make
forward progress and exit more easily.  But iomap_file_buffered_write
and other callers of iomap_apply loop to complete the full request.  We
need to check for fatal signals and back off with a short write instead.

As iomap_apply delegates all the work down to the actor, we have to
hook into those.  All callers that work with the page cache are calling
iomap_write_begin so we will check for signals there.  dax_iomap_actor
has to handle the situation explicitly because it copies data to the
userspace directly.  Other callers either work on a single page
(iomap_page_mkwrite) or do not allocate memory based on the given len
(iomap_fiemap_actor).

Fixes: 68a9f5e7 ("xfs: implement iomap based buffered write path")
Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
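The "back off with a short write" behaviour can be modelled in userspace. Everything here is a hypothetical stand-in: a flag replaces the kernel's fatal-signal check, a fixed 4096-byte step replaces the per-page write, and `signal_after` simulates the moment the kill arrives.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for "a fatal signal is pending for this task". */
static bool fatal_signal_pending_sketch;

/* One write step: refuse to start if a fatal signal is pending. */
static inline long write_chunk_sketch(size_t len)
{
	if (fatal_signal_pending_sketch)
		return -EINTR; /* back off before allocating another page */
	return (long)(len < 4096 ? len : 4096);
}

/*
 * Loop over the whole request. On a fatal signal, return the bytes
 * written so far (a short write), or -EINTR if nothing was written.
 */
static inline long buffered_write_sketch(size_t len, size_t signal_after)
{
	size_t written = 0;

	fatal_signal_pending_sketch = false;
	while (written < len) {
		long ret;

		if (written >= signal_after)
			fatal_signal_pending_sketch = true; /* simulated kill */
		ret = write_chunk_sketch(len - written);
		if (ret < 0)
			return written ? (long)written : ret;
		written += (size_t)ret;
	}
	return (long)written;
}
```

Checking inside the per-chunk step, rather than only once before the loop, is what bounds how much more memory a dying task can consume.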

01 Feb, 2017 · 5 commits

fscache: Fix dead object requeue · e26bfebd

Committed by David Howells on Jan 31, 2017

Under some circumstances, an fscache object can become queued such that
fscache_object_work_func() can be called once the object is in the
OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
invoke the handler for the state (which is hard coded to 0x2).

The way this comes about is something like the following:

 (1) The object dispatcher is processing a work state for an object.  This
     is done in workqueue context.

 (2) An out-of-band event comes in that isn't masked, causing the object to
     be queued, say EV_KILL.

 (3) The object dispatcher finishes processing the current work state on
     that object and then sees there's another event to process, so,
     without returning to the workqueue core, it processes that event too.
     It then follows the chain of events that initiates until we reach
     OBJECT_DEAD without going through a wait state (such as
     WAIT_FOR_CLEARANCE).

     At this point, object->events may be 0, object->event_mask will be 0
     and oob_event_mask will be 0.

 (4) The object dispatcher returns to the workqueue processor, and in due
     course, this sees that the object's work item is still queued and
     invokes it again.

 (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
     jumps to it - resulting in an OOPS.

When I'm seeing this, the work state in (1) appears to have been either
LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
fscache_osm_lookup_oob).

The window for (2) is very small:

 (A) object->event_mask is cleared whilst the event dispatch process is
     underway - though there's no memory barrier to force this to the top
     of the function.

     The window, therefore is from the time the object was selected by the
     workqueue processor and made requeueable to the time the mask was
     cleared.

 (B) fscache_raise_event() will only queue the object if it manages to set
     the event bit and the corresponding event_mask bit was set.

     The enqueuement is then deferred slightly whilst we get a ref on the
     object and get the per-CPU variable for workqueue congestion.  This
     slight deferral slightly increases the probability by allowing extra
     time for the workqueue to make the item requeueable.

Handle this by giving the dead state a processor function and checking
for the dead state address rather than seeing if the processor function is
address 0x2.  The dead state processor function can then set a flag to
indicate that it's occurred and give a warning if it occurs more than once
per object.

If this race occurs, an oops similar to the following is seen (note the RIP
value):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<0000000000000002>] 0x1
PGD 0
Oops: 0010 [#1] SMP
Modules linked in: ...
CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
RSP: 0018:ffff880717547df8  EFLAGS: 00010202
RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
Call Trace:
 [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff8109d5db>] process_one_work+0x17b/0x470
 [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
 [<ffffffff810a5acf>] kthread+0xcf/0xe0
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff816460d8>] ret_from_fork+0x58/0x90
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeremy McNicoll <jeremymc@redhat.com>
Tested-by: Frank Sorenson <sorenson@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fscache: Clear outstanding writes when disabling a cookie · 6bdded59

Committed by David Howells on Jan 18, 2017

fscache_disable_cookie() needs to clear the outstanding writes on the
cookie it's disabling because they cannot be completed after.

Without this, fscache_nfs_open_file() gets stuck because it disables the
cookie when the file is opened for writing but can't uncache the pages till
afterwards - otherwise there's a race between the open routine and anyone
who already has it open R/O and is still reading from it.

Looking in /proc/pid/stack of the offending process shows:

[<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache]
[<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
[<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs]
[<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4]
[<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7
[<ffffffff811743ac>] vfs_open+0x5c/0x65
[<ffffffff81184185>] path_openat+0x785/0x8fb
[<ffffffff81184343>] do_filp_open+0x48/0x9e
[<ffffffff81174710>] do_sys_open+0x13b/0x1cb
[<ffffffff811747b9>] SyS_open+0x19/0x1b
[<ffffffff81001c44>] do_syscall_64+0x80/0x17a
[<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff

Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

FS-Cache: Initialise stores_lock in netfs cookie · 62deb818

Committed by David Howells on Jan 18, 2017

Initialise the stores_lock in fscache netfs cookies.  Technically, it
shouldn't be necessary, since the netfs cookie is an index and stores no
data, but initialising it anyway adds insignificant overhead.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nfsd: special case truncates some more · 41f53350

Committed by Christoph Hellwig on Jan 24, 2017

Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributes to set various file attributes, including the
file size and the uid/gid.

The Linux syscalls never mix size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates do not update random other attributes, and many other
file systems handle the case but do not update the different attributes
in the same transaction. NFSD, on the other hand, passes the attributes
it gets on the wire more or less directly through to the VFS, leading to
updates the file systems don't expect. XFS at least has an assert on
the allowed attributes, which caught an unusual NFS client setting the
size and group at the same time.

To handle this issue properly this switches nfsd to call vfs_truncate
for size changes, and then handle all other attributes through
notify_change. As a side effect this also means less boilerplate code
around the size change as we can now reuse the VFS code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD: Fix a null reference case in find_or_create_lock_stateid() · d19fb70d

Committed by Kinglong Mee on Jan 18, 2017

nfsd assigns nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

If nfsd doesn't go through init_lock_stateid() and puts the stateid at
the end, there is a NULL dereference of .sc_free when calling
nfs4_put_stid(ns).

This patch moves the nfs4_stid.sc_free assignment to nfs4_alloc_stid().

Cc: stable@vger.kernel.org
Fixes: 356a95ec "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

28 Jan, 2017 · 1 commit

xfs: prevent quotacheck from overloading inode lru · e0d76fa4

Committed by Brian Foster on Jan 26, 2017

Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() led to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.

Commit 17c12bcd ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release.  The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.

Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.

CC: stable@vger.kernel.org # v4.9
Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

27 Jan, 2017 · 6 commits

Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations · 57b59ed2

Committed by Omar Sandoval on Jan 25, 2017

Subvolume directory inodes can't have ACLs.

Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: disable xattr operations on subvolume directories · 1fdf4194

Committed by Omar Sandoval on Jan 25, 2017

When you snapshot a subvolume containing a subvolume, you get a
placeholder directory where the subvolume would be. These directory
inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously,
these i_ops didn't include the xattr operation callbacks. The conversion
to xattr_handlers missed this case, leading to bogus attempts to set
xattrs on these inodes. This manifested itself as failures when running
delayed inodes.

To fix this, clear IOP_XATTR in ->i_opflags on these inodes.

Fixes: 6c6ef9f2 ("xattr: Stop calling {get,set,remove}xattr inode operations")
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
Tested-by: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: remove old tree_root case in btrfs_read_locked_inode() · 67ade058

Committed by Omar Sandoval on Jan 25, 2017

As Jeff explained in c2951f32 ("btrfs: remove old tree_root dirent
processing in btrfs_real_readdir()"), supporting this old format is no
longer necessary since the Btrfs magic number has been updated since we
changed to the current format. There are other places where we still
handle this old format, but since this is part of a fix that is going to
stable, I'm only removing this one for now.

Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

pNFS: Fix a reference leak in _pnfs_return_layout · ee6625a9

Committed by Trond Myklebust on Jan 26, 2017

If NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit
without freeing the list of invalidated layout segments, leading
to a reference leak.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 24408f52 ("pNFS: Fix bugs in _pnfs_return_layout")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED" · 406dab84

Committed by Chuck Lever on Jan 26, 2017

Lock sequence IDs are bumped in decode_lock by calling
nfs_increment_seqid(). nfs_increment_seqid() does not use the
seqid_mutating_err() function fixed in commit 059aa734 ("Don't
increment lock sequence ID after NFS4ERR_MOVED").

Fixes: 059aa734 ("Don't increment lock sequence ID after ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Cc: stable@vger.kernel.org # v3.7+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

xfs: fix bmv_count confusion w/ shared extents · c364b6d0

Committed by Darrick J. Wong on Jan 26, 2017

In a bmapx call, bmv_count is the total size of the array, including the
zeroth element that userspace uses to supply the search key. The output
array starts at offset 1 so that we can set up the user for the next
invocation. Since we now can split an extent into multiple bmap records
due to shared/unshared status, we have to be careful that we don't
overflow the output array.

In the original patch f86f4037 ("xfs: teach get_bmapx about shared
extents and the CoW fork") I used cur_ext (the output index) to check
for overflows, albeit with an off-by-one error. Since nexleft no longer
describes the number of unfilled slots in the output, we can rip all
that out and use cur_ext for the overflow check directly.

Failure to do this causes heap corruption in bmapx callers such as
xfs_io and xfs_scrub. xfs/328 can reproduce this problem.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
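The array layout described in this message, with slot 0 holding the search key and results starting at slot 1, lends itself to a small bounds-check sketch. The record type and function name are hypothetical; only the layout and the idea of checking the output index directly come from the message.

```c
#include <stddef.h>

/* Hypothetical record type for the sketch. */
struct bmap_rec_sketch {
	long long offset;
	long long length;
};

/*
 * out[0] is the caller's search key; results go in out[1..bmv_count-1].
 * Returns the number of records written, never more than bmv_count - 1.
 */
static inline size_t fill_bmap_sketch(struct bmap_rec_sketch *out, size_t bmv_count,
				      const struct bmap_rec_sketch *found, size_t nfound)
{
	size_t cur_ext = 0; /* output index, counted from the first result slot */

	for (size_t i = 0; i < nfound; i++) {
		/* Check the output index itself before every store. */
		if (bmv_count == 0 || cur_ext >= bmv_count - 1)
			break;
		out[1 + cur_ext] = found[i];
		cur_ext++;
	}
	return cur_ext;
}
```

Because one input extent may expand into several output records, counting inputs is not enough; the guard has to be on the slot about to be written.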

26 Jan, 2017 · 1 commit

xfs: clear _XBF_PAGES from buffers when readahead page · 2aa6ba7b

Committed by Darrick J. Wong on Jan 25, 2017

If we try to allocate memory pages to back an xfs_buf that we're trying
to read, it's possible that we'll be so short on memory that the page
allocation fails.  For a blocking read we'll just wait, but for
readahead we simply dump all the pages we've collected so far.

Unfortunately, after dumping the pages we neglect to clear the
_XBF_PAGES state, which means that the subsequent call to xfs_buf_free
thinks that b_pages still points to pages we own.  It then double-frees
the b_pages pages.

This results in screaming about negative page refcounts from the memory
manager, which xfs oughtn't be triggering.  To reproduce this case,
mount a filesystem where the size of the inodes far outweighs the
available memory (a ~500M inode filesystem on a VM with 300MB memory
did the trick here) and run bulkstat in parallel with other memory
eating processes to put a huge load on the system.  The "check summary"
phase of xfs_scrub also works for this purpose.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
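The double free described here follows a common ownership-flag pattern, sketched below in userspace C. The struct, the flag value and both function names are hypothetical; `free()` stands in for releasing pages.

```c
#include <stdlib.h>

/* Hypothetical flag value meaning "pages[] holds pages we own". */
#define XBF_PAGES_SKETCH 0x1u

struct buf_sketch {
	unsigned int flags;
	void **pages;
	unsigned int page_count;
};

/* Drop the pages collected so far (the readahead failure path). */
static inline void buf_drop_pages_sketch(struct buf_sketch *bp)
{
	for (unsigned int i = 0; i < bp->page_count; i++)
		free(bp->pages[i]);
	bp->page_count = 0;
	/* The step that was missing: stop claiming ownership of the pages. */
	bp->flags &= ~XBF_PAGES_SKETCH;
}

/* Final teardown: only touch the pages if the flag says we still own them. */
static inline void buf_free_sketch(struct buf_sketch *bp)
{
	if (bp->flags & XBF_PAGES_SKETCH)
		buf_drop_pages_sketch(bp);
	free(bp->pages);
	bp->pages = NULL;
}
```

If the flag were left set after the early drop, the teardown path would walk the same page pointers again, which is the double free the message describes.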

openanolis / cloud-kernel · synced successfully more than 1 year ago
