- 14 2月, 2017 21 次提交
-
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This function is internal to btrfs and doesn't really deal with any VFS members, as such it needn't take a struct inode refrence but btrfs_inode. Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Currently btrfs_ino takes a struct inode and this causes a lot of internal btrfs functions which consume this ino to take a VFS inode, rather than btrfs' own struct btrfs_inode. In order to fix this "leak" of VFS structs into the internals of btrfs first it's necessary to eliminate all uses of struct inode for the purpose of inode. This patch does that by using BTRFS_I to convert an inode to btrfs_inode. With this problem eliminated subsequent patches will start eliminating the passing of struct inode altogether, eventually resulting in a lot cleaner code. Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> [ fix btrfs_get_extent tracepoint prototype ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The expression is open-coded in several places, this asks for a wrapper. As we know the MAX_EXTENT fits to u32, we can use the appropirate division helper. This cascades to the result type updates. Compiler is clever enough to use shift instead of integer division, so there's no change in the generated assembly. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
A proposed patch in https://marc.info/?l=linux-btrfs&m=147859791003837 pointed out bad limit threshold in cow_file_range_async, but it turned out that the whole logic is not necessary and is done by writeback. We agreed to remove it. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
As of now writes smaller than 64k for non compressed extents and 16k for compressed extents inside eof are considered as candidate for auto defrag, put them together at a place. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Since btrfs_defrag_leaves() does not support extent_root, remove its corresponding call. The user can use the file based defrag to defrag extents as of now. No change in behaviour as extent_root is explicitly skipped in btrfs_defrag_leaves and this has never worked as expected. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ ehnance changelong ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
btrfs_add_delayed_data_ref is always called with a NULL extent_op, so let's drop the argument. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Colin Ian King 提交于
The check for a null inode is redundant since the function is a callback for exportfs, which will itself crash if dentry->d_inode or parent->d_inode is NULL. Removing the null check makes this consistent with other file systems. Also remove the redundant null dir check too. Found with static analysis by CoverityScan, CID 1389472 Kudos to Jeff Mahoney for reviewing and explaining the error in my original patch (most of this explanation went into the above commit message) and David Sterba for pointing out that the dir check is also redundant. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Seraphime Kirkovski 提交于
This replaces ACCESS_ONCE macro with the corresponding READ|WRITE macros Signed-off-by: NSeraphime Kirkovski <kirkseraph@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Seraphime Kirkovski 提交于
This cleans up the cases where the min/max macros were used with a cast rather than using directly min_t/max_t. Signed-off-by: NSeraphime Kirkovski <kirkseraph@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Geliang Tang 提交于
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Michal Hocko 提交于
try_release_extent_state reduces the gfp mask to GFP_NOFS if it is compatible. This is true for GFP_KERNEL as well. There is no real reason to do that though. There is no new lock taken down the the only consumer of the gfp mask which is try_release_extent_state clear_extent_bit __clear_extent_bit alloc_extent_state So this seems just unnecessary and confusing. Signed-off-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Michal Hocko 提交于
b335b003 ("Btrfs: Avoid using __GFP_HIGHMEM with slab allocator") has reduced the allocation mask in btrfs_releasepage to GFP_NOFS just to prevent from giving an unappropriate gfp mask to the slab allocator deeper down the callchain (in alloc_extent_state). This is wrong for two reasons a) GFP_NOFS might be just too restrictive for the calling context b) it is better to tweak the gfp mask down when it needs that. So just remove the mask tweaking from btrfs_releasepage and move it down to alloc_extent_state where it is needed. Signed-off-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs qgroup reserved space, and leads to non-writable fs. This reminds us that we don't have enough underflow check for qgroup reserved space. For underflow case, we should not really underflow the numbers but warn and keeps qgroup still work. So add more check on qgroup reserved space and add WARN_ON() and btrfs_warn() for any underflow case. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Reviewed-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 11 2月, 2017 1 次提交
-
-
由 Omar Sandoval 提交于
If btrfs_decompress_buf2page() is handed a bio with its page in the middle of the working buffer, then we adjust the offset into the working buffer. After we copy into the bio, we advance the iterator by the number of bytes we copied. Then, we have some logic to handle the case of discontiguous pages and adjust the offset into the working buffer again. However, if we didn't advance the bio to a new page, we may enter this case in error, essentially repeating the adjustment that we already made when we entered the function. The end result is bogus data in the bio. Previously, we only checked for this case when we advanced to a new page, but the conversion to bio iterators changed that. This restores the old, correct behavior. A case I saw when testing with zlib was: buf_start = 42769 total_out = 46865 working_bytes = total_out - buf_start = 4096 start_byte = 45056 The condition (total_out > start_byte && buf_start < start_byte) is true, so we adjust the offset: buf_offset = start_byte - buf_start = 2287 working_bytes -= buf_offset = 1809 current_buf_start = buf_start = 42769 Then, we copy bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809 buf_offset += bytes = 4096 working_bytes -= bytes = 0 current_buf_start += bytes = 44578 After bio_advance(), we are still in the same page, so start_byte is the same. Then, we check (total_out > start_byte && current_buf_start < start_byte), which is true! So, we adjust the values again: buf_offset = start_byte - buf_start = 2287 working_bytes = total_out - start_byte = 1809 current_buf_start = buf_start + buf_offset = 45056 But note that working_bytes was already zero before this, so we should have stopped copying. Fixes: 974b1adc ("btrfs: use bio iterators for the decompression handlers") Reported-by: NPat Erley <pat-lkml@erley.org> Reviewed-by: NChris Mason <clm@fb.com> Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NChris Mason <clm@fb.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Tested-by: NLiu Bo <bo.li.liu@oracle.com>
-
- 10 2月, 2017 2 次提交
-
-
由 J. Bruce Fields 提交于
This patch incorrectly attempted nested mnt_want_write, and incorrectly disabled nfsd's owner override for truncate. We'll fix those problems and make another attempt soon, for the moment I think the safest is to revert. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Brian Norris 提交于
We'll OOPS in ramoops_get_next_prz() if the platform didn't ask for any ftrace zones (i.e., cxt->fprzs will be NULL). Let's just skip this entire FTRACE section if there's no 'fprzs'. Regression seen on a coreboot/depthcharge-based Chromebook. Fixes: 2fbea82b ("pstore: Merge per-CPU ftrace records into one") Cc: Joel Fernandes <joelaf@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: NBrian Norris <briannorris@chromium.org> Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 09 2月, 2017 1 次提交
-
-
由 Jeff Mahoney 提交于
Commit 4c63c245 incorrectly assumed that returning -ENOIOCTLCMD would cause the native ioctl to be called. The ->compat_ioctl callback is expected to handle all ioctls, not just compat variants. As a result, when using 32-bit userspace on 64-bit kernels, everything except those three ioctls would return -ENOTTY. Fixes: 4c63c245 ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl") Cc: stable@vger.kernel.org Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 08 2月, 2017 1 次提交
-
-
由 Hugh Dickins 提交于
Commit 6326fec1 ("mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and depending on PageSwapBacked being true). As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be synthesized, instead of being shown on unrelated pages which just happen to have PG_owner_priv_1 set. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 2月, 2017 1 次提交
-
-
由 Michal Hocko 提交于
Tetsuo has noticed that an OOM stress test which performs large write requests can cause the full memory reserves depletion. He has tracked this down to the following path __alloc_pages_nodemask+0x436/0x4d0 alloc_pages_current+0x97/0x1b0 __page_cache_alloc+0x15d/0x1a0 mm/filemap.c:728 pagecache_get_page+0x5a/0x2b0 mm/filemap.c:1331 grab_cache_page_write_begin+0x23/0x40 mm/filemap.c:2773 iomap_write_begin+0x50/0xd0 fs/iomap.c:118 iomap_write_actor+0xb5/0x1a0 fs/iomap.c:190 ? iomap_write_end+0x80/0x80 fs/iomap.c:150 iomap_apply+0xb3/0x130 fs/iomap.c:79 iomap_file_buffered_write+0x68/0xa0 fs/iomap.c:243 ? iomap_write_end+0x80/0x80 xfs_file_buffered_aio_write+0x132/0x390 [xfs] ? remove_wait_queue+0x59/0x60 xfs_file_write_iter+0x90/0x130 [xfs] __vfs_write+0xe5/0x140 vfs_write+0xc7/0x1f0 ? syscall_trace_enter+0x1d0/0x380 SyS_write+0x58/0xc0 do_syscall_64+0x6c/0x200 entry_SYSCALL64_slow_path+0x25/0x25 the oom victim has access to all memory reserves to make a forward progress to exit easier. But iomap_file_buffered_write and other callers of iomap_apply loop to complete the full request. We need to check for fatal signals and back off with a short write instead. As the iomap_apply delegates all the work down to the actor we have to hook into those. All callers that work with the page cache are calling iomap_write_begin so we will check for signals there. dax_iomap_actor has to handle the situation explicitly because it copies data to the userspace directly. Other callers like iomap_page_mkwrite work on a single page or iomap_fiemap_actor do not allocate memory based on the given len. Fixes: 68a9f5e7 ("xfs: implement iomap based buffered write path") Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 2月, 2017 5 次提交
-
-
由 David Howells 提交于
Under some circumstances, an fscache object can become queued such that it fscache_object_work_func() can be called once the object is in the OBJECT_DEAD state. This results in the kernel oopsing when it tries to invoke the handler for the state (which is hard coded to 0x2). The way this comes about is something like the following: (1) The object dispatcher is processing a work state for an object. This is done in workqueue context. (2) An out-of-band event comes in that isn't masked, causing the object to be queued, say EV_KILL. (3) The object dispatcher finishes processing the current work state on that object and then sees there's another event to process, so, without returning to the workqueue core, it processes that event too. It then follows the chain of events that initiates until we reach OBJECT_DEAD without going through a wait state (such as WAIT_FOR_CLEARANCE). At this point, object->events may be 0, object->event_mask will be 0 and oob_event_mask will be 0. (4) The object dispatcher returns to the workqueue processor, and in due course, this sees that the object's work item is still queued and invokes it again. (5) The current state is a work state (OBJECT_DEAD), so the dispatcher jumps to it - resulting in an OOPS. When I'm seeing this, the work state in (1) appears to have been either LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is fscache_osm_lookup_oob). The window for (2) is very small: (A) object->event_mask is cleared whilst the event dispatch process is underway - though there's no memory barrier to force this to the top of the function. The window, therefore is from the time the object was selected by the workqueue processor and made requeueable to the time the mask was cleared. (B) fscache_raise_event() will only queue the object if it manages to set the event bit and the corresponding event_mask bit was set. The enqueuement is then deferred slightly whilst we get a ref on the object and get the per-CPU variable for workqueue congestion. This slight deferral slightly increases the probability by allowing extra time for the workqueue to make the item requeueable. Handle this by giving the dead state a processor function and checking the for the dead state address rather than seeing if the processor function is address 0x2. The dead state processor function can then set a flag to indicate that it's occurred and give a warning if it occurs more than once per object. If this race occurs, an oops similar to the following is seen (note the RIP value): BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 IP: [<0000000000000002>] 0x1 PGD 0 Oops: 0010 [#1] SMP Modules linked in: ... CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 Workqueue: fscache_object fscache_object_work_func [fscache] task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000 RIP: 0010:[<0000000000000002>] [<0000000000000002>] 0x1 RSP: 0018:ffff880717547df8 EFLAGS: 00010202 RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200 RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480 RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510 R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000 R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600 FS: 0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900 Call Trace: [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache] [<ffffffff8109d5db>] process_one_work+0x17b/0x470 [<ffffffff8109e4ac>] worker_thread+0x21c/0x400 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5acf>] kthread+0xcf/0xe0 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [<ffffffff816460d8>] ret_from_fork+0x58/0x90 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJeremy McNicoll <jeremymc@redhat.com> Tested-by: NFrank Sorenson <sorenson@redhat.com> Tested-by: NBenjamin Coddington <bcodding@redhat.com> Reviewed-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 David Howells 提交于
fscache_disable_cookie() needs to clear the outstanding writes on the cookie it's disabling because they cannot be completed after. Without this, fscache_nfs_open_file() gets stuck because it disables the cookie when the file is opened for writing but can't uncache the pages till afterwards - otherwise there's a race between the open routine and anyone who already has it open R/O and is still reading from it. Looking in /proc/pid/stack of the offending process shows: [<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache] [<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache] [<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs] [<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4] [<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7 [<ffffffff811743ac>] vfs_open+0x5c/0x65 [<ffffffff81184185>] path_openat+0x785/0x8fb [<ffffffff81184343>] do_filp_open+0x48/0x9e [<ffffffff81174710>] do_sys_open+0x13b/0x1cb [<ffffffff811747b9>] SyS_open+0x19/0x1b [<ffffffff81001c44>] do_syscall_64+0x80/0x17a [<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a [<ffffffffffffffff>] 0xffffffffffffffff Reported-by: NJianhong Yin <jiyin@redhat.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJeff Layton <jlayton@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 David Howells 提交于
Initialise the stores_lock in fscache netfs cookies. Technically, it shouldn't be necessary, since the netfs cookie is an index and stores no data, but initialising it anyway adds insignificant overhead. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Both the NFS protocols and the Linux VFS use a setattr operation with a bitmap of attributs to set to set various file attributes including the file size and the uid/gid. The Linux syscalls never mixes size updates with unrelated updates like the uid/gid, and some file systems like XFS and GFS2 rely on the fact that truncates might not update random other attributes, and many other file systems handle the case but do not update the different attributes in the same transaction. NFSD on the other hand passes the attributes it gets on the wire more or less directly through to the VFS, leading to updates the file systems don't expect. XFS at least has an assert on the allowed attributes, which caught an unusual NFS client setting the size and group at the same time. To handle this issue properly this switches nfsd to call vfs_truncate for size changes, and then handle all other attributes through notify_change. As a side effect this also means less boilerplace code around the size change as we can now reuse the VFS code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid(). If nfsd doesn't go through init_lock_stateid() and put stateid at end, there is a NULL reference to .sc_free when calling nfs4_put_stid(ns). This patch let the nfs4_stid.sc_free assignment to nfs4_alloc_stid(). Cc: stable@vger.kernel.org Fixes: 356a95ec "nfsd: clean up races in lock stateid searching..." Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 28 1月, 2017 1 次提交
-
-
由 Brian Foster 提交于
Quotacheck runs at mount time in situations where quota accounting must be recalculated. In doing so, it uses bulkstat to visit every inode in the filesystem. Historically, every inode processed during quotacheck was released and immediately tagged for reclaim because quotacheck runs before the superblock is marked active by the VFS. In other words, the final iput() lead to an immediate ->destroy_inode() call, which allowed the XFS background reclaim worker to start reclaiming inodes. Commit 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") marks the XFS superblock active sooner as part of the mount process to support caching inodes processed during log recovery. This occurs before quotacheck and thus means all inodes processed by quotacheck are inserted to the LRU on release. The s_umount lock is held until the mount has completed and thus prevents the shrinkers from operating on the sb. This means that quotacheck can excessively populate the inode LRU and lead to OOM conditions on systems without sufficient RAM. Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on inodes processed by quotacheck. This causes ->drop_inode() to return 1 and in turn causes iput_final() to evict the inode. This preserves the original quotacheck behavior and prevents it from overloading the LRU and running out of memory. CC: stable@vger.kernel.org # v4.9 Reported-by: NMartin Svec <martin.svec@zoner.cz> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 27 1月, 2017 6 次提交
-
-
由 Omar Sandoval 提交于
Subvolume directory inodes can't have ACLs. Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
When you snapshot a subvolume containing a subvolume, you get a placeholder directory where the subvolume would be. These directory inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously, these i_ops didn't include the xattr operation callbacks. The conversion to xattr_handlers missed this case, leading to bogus attempts to set xattrs on these inodes. This manifested itself as failures when running delayed inodes. To fix this, clear IOP_XATTR in ->i_opflags on these inodes. Fixes: 6c6ef9f2 ("xattr: Stop calling {get,set,remove}xattr inode operations") Cc: Andreas Gruenbacher <agruenba@redhat.com> Reported-by: NChris Murphy <lists@colorremedies.com> Tested-by: NChris Murphy <lists@colorremedies.com> Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
As Jeff explained in c2951f32 ("btrfs: remove old tree_root dirent processing in btrfs_real_readdir()"), supporting this old format is no longer necessary since the Btrfs magic number has been updated since we changed to the current format. There are other places where we still handle this old format, but since this is part of a fix that is going to stable, I'm only removing this one for now. Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Trond Myklebust 提交于
IF NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit without freeing the list of invalidated layout segments, leading to a reference leak. Reported-by: NOlga Kornievskaia <aglo@umich.edu> Fixes: 24408f52 ("pNFS: Fix bugs in _pnfs_return_layout") Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Chuck Lever 提交于
Lock sequence IDs are bumped in decode_lock by calling nfs_increment_seqid(). nfs_increment_sequid() does not use the seqid_mutating_err() function fixed in commit 059aa734 ("Don't increment lock sequence ID after NFS4ERR_MOVED"). Fixes: 059aa734 ("Don't increment lock sequence ID after ...") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NXuan Qi <xuan.qi@oracle.com> Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Darrick J. Wong 提交于
In a bmapx call, bmv_count is the total size of the array, including the zeroth element that userspace uses to supply the search key. The output array starts at offset 1 so that we can set up the user for the next invocation. Since we now can split an extent into multiple bmap records due to shared/unshared status, we have to be careful that we don't overflow the output array. In the original patch f86f4037 ("xfs: teach get_bmapx about shared extents and the CoW fork") I used cur_ext (the output index) to check for overflows, albeit with an off-by-one error. Since nexleft no longer describes the number of unfilled slots in the output, we can rip all that out and use cur_ext for the overflow check directly. Failure to do this causes heap corruption in bmapx callers such as xfs_io and xfs_scrub. xfs/328 can reproduce this problem. Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 26 1月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
If we try to allocate memory pages to back an xfs_buf that we're trying to read, it's possible that we'll be so short on memory that the page allocation fails. For a blocking read we'll just wait, but for readahead we simply dump all the pages we've collected so far. Unfortunately, after dumping the pages we neglect to clear the _XBF_PAGES state, which means that the subsequent call to xfs_buf_free thinks that b_pages still points to pages we own. It then double-frees the b_pages pages. This results in screaming about negative page refcounts from the memory manager, which xfs oughtn't be triggering. To reproduce this case, mount a filesystem where the size of the inodes far outweighs the availalble memory (a ~500M inode filesystem on a VM with 300MB memory did the trick here) and run bulkstat in parallel with other memory eating processes to put a huge load on the system. The "check summary" phase of xfs_scrub also works for this purpose. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com>
-