- 27 7月, 2016 9 次提交
-
-
由 Dave Chinner 提交于
wait_sb_inodes() currently does a walk of all inodes in the filesystem to find dirty one to wait on during sync. This is highly inefficient and wastes a lot of CPU when there are lots of clean cached inodes that we don't need to wait on. To avoid this "all inode" walk, we need to track inodes that are currently under writeback that we need to wait for. We do this by adding inodes to a writeback list on the sb when the mapping is first tagged as having pages under writeback. wait_sb_inodes() can then walk this list of "inodes under IO" and wait specifically just for the inodes that the current sync(2) needs to wait for. Define a couple helpers to add/remove an inode from the writeback list and call them when the overall mapping is tagged for or cleared from writeback. Update wait_sb_inodes() to walk only the inodes under writeback due to the sync. With this change, filesystem sync times are significantly reduced for fs' with largely populated inode caches and otherwise no other work to do. For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem with a ~10m entry inode cache, sync times are reduced from ~7.3s to less than 0.1s when the filesystem is fully clean. Link: http://lkml.kernel.org/r/1466594593-6757-2-git-send-email-bfoster@redhat.comSigned-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Tested-by: NHolger Hoffstätte <holger.hoffstaette@applied-asynchrony.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 piaojun 提交于
Clean up unnecessary assignment for 'ret'. Link: http://lkml.kernel.org/r/578C61F6.4080403@huawei.comSigned-off-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
These BUG_ON(!inode) are obscure because we have already used inode to get osb. And actually we can guarantee here inode is valid in the context. So we can safely remove them. Link: http://lkml.kernel.org/r/5776336A.6030104@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NEric Ren <zren@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
Several prototypes in inode.h are just defined but not actually implemented and used, so remove them. Link: http://lkml.kernel.org/r/57763787.4020706@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
dlm_debug_ctxt->debug_refcnt is initialized to 1 and then increased to 2 by dlm_debug_get in dlm_debug_init. But dlm_debug_put is called only once in dlm_debug_shutdown during unregister dlm, which leads to dlm_debug_ctxt leaked. Link: http://lkml.kernel.org/r/577BB755.4030900@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NJiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
The last goto is unneeded, so remove it. Link: http://lkml.kernel.org/r/576213D3.6080002@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Junxiao Bi 提交于
Journal replay will be run when performing recovery for a dead node. To avoid the stale cache impact, all blocks of dead node's journal inode were reloaded from disk. This hurts the performance. Check whether one block is cached before reloading it can improve performance a lot. In my test env, the time doing recovery was improved from 120s to 1s. [akpm@linux-foundation.org: clean up the for loop p_blkno handling] Link: http://lkml.kernel.org/r/1466155682-24656-1-git-send-email-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Cc: "Gang He" <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Ren 提交于
Obviously, memset() has zeroed the whole struct locking_max_version. So, it's no need to zero its two fields individually. Link: http://lkml.kernel.org/r/1463970605-18354-1-git-send-email-zren@suse.comSigned-off-by: NEric Ren <zren@suse.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NGang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ross Zwisler 提交于
Remove the unused wrappers dax_fault() and dax_pmd_fault(). After this removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and dax_pmd_fault() respectively, and update all callers. The dax_fault() and dax_pmd_fault() wrappers were initially intended to capture some filesystem independent functionality around page faults (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime and ctime). However, the following commits: 5726b27b ("ext2: Add locking for DAX faults") ea3d7209 ("ext4: fix races between page faults and hole punching") added locking to the ext2 and ext4 filesystems after these common operations but before __dax_fault() and __dax_pmd_fault() were called. This means that these wrappers are no longer used, and are unlikely to be used in the future. XFS has had locking analogous to what was recently added to ext2 and ext4 since DAX support was initially introduced by: 6b698ede ("xfs: add DAX file operations support") Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 7月, 2016 2 次提交
-
-
由 Maxim Patlasov 提交于
The upper dentry may become stale before we call ovl_lock_rename_workdir. For example, someone could (mistakenly or maliciously) manually unlink(2) it directly from upperdir. To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir and and check if it matches the upper dentry. Essentially, it is the same problem and similar solution as in commit 11f37104 ("ovl: verify upper dentry before unlink and rename"). Signed-off-by: NMaxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
-
由 Bob Peterson 提交于
Before this patch, if you used gfs2_jadd to add new journals of a size smaller than the existing journals, replaying those new journals would withdraw. That's because function gfs2_replay_incr_blk was using the number of journal blocks (jd_block) from the superblock's journal pointer. In other words, "My journal's max size" rather than "the journal we're replaying's size." This patch changes the function to use the size of the pertinent journal rather than always using the journal we happen to be using. Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
- 16 7月, 2016 1 次提交
-
-
由 Jann Horn 提交于
Without this check, the following XFS_I invocations would return bad pointers when used on non-XFS inodes (perhaps pointers into preceding allocator chunks). This could be used by an attacker to trick xfs_swap_extents into performing locking operations on attacker-chosen structures in kernel memory, potentially leading to code execution in the kernel. (I have not investigated how likely this is to be usable for an attack in practice.) Signed-off-by: NJann Horn <jann@thejh.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 7月, 2016 1 次提交
-
-
由 H.J. Lu 提交于
Don't use the same syscall numbers for 2 different syscalls: 534 x32 preadv compat_sys_preadv64 535 x32 pwritev compat_sys_pwritev64 534 x32 preadv2 compat_sys_preadv2 535 x32 pwritev2 compat_sys_pwritev2 Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset is passed in one 64-bit register on x32, similar to compat_sys_preadv64() and compat_sys_pwritev64(). Signed-off-by: NH.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 7月, 2016 1 次提交
-
-
由 Fengguang Wu 提交于
To fix super long dmesg error lines like CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation rangeCHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation rangeswapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK) After fix, it should look like CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation range CHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation range swapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK) Reported-by: NPhilip Li <philip.li@intel.com> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 7月, 2016 1 次提交
-
-
由 Bob Peterson 提交于
For the last process to close a file opened for write, function gfs2_rsqa_delete was deleting the file's inode's block reservation out of the rgrp reservations tree. Then it was checking to make sure rs_free was 0, but it was performing the check outside the protection of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed to prevent a race between the process freeing the reservation and another who is allocating a new set of blocks inside the same rgrp for the same inode, thus changing its value. Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
- 08 7月, 2016 2 次提交
-
-
由 Jeff Mahoney 提交于
There are legitimate reasons to disallow mmap on certain files, notably in sysfs or procfs. We shouldn't emulate mmap support on file systems that don't offer support natively. CVE-2016-1583 Signed-off-by: NJeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
-
由 Jeff Mahoney 提交于
This reverts commit 2f36db71. It fixed a local root exploit but also introduced a dependency on the lower file system implementing an mmap operation just to open a file, which is a bit of a heavy hammer. The right fix is to have mmap depend on the existence of the mmap handler instead. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
-
- 06 7月, 2016 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
->atomic_open() can be given an in-lookup dentry *or* a negative one found in dcache. Use d_in_lookup() to tell one from another, rather than d_unhashed(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 7月, 2016 2 次提交
-
-
由 Vivek Goyal 提交于
Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
-
由 Miklos Szeredi 提交于
Before 4bacc9c9 ("overlayfs: Make f_path...") file->f_path pointed to the underlying file, hence suid/sgid removal on write worked fine. After that patch file->f_path pointed to the overlay file, and the file mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid removal simply stopped working. The fix is to copy the mode bits, but then ovl_setattr() needs to clear ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in the next patch copy the mode. Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
-
- 03 7月, 2016 1 次提交
-
-
由 Vivek Goyal 提交于
overlay needs underlying fs to support d_type. Recently I put in a patch in to detect this condition and started failing mount if underlying fs did not support d_type. But this breaks existing configurations over kernel upgrade. Those who are running docker (partially broken configuration) with xfs not supporting d_type, are surprised that after kernel upgrade docker does not run anymore. https://github.com/docker/docker/issues/22937#issuecomment-229881315 So instead of erroring out, detect broken configuration and warn about it. This should allow existing docker setups to continue working after kernel upgrade. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 45aebeaf ("ovl: Ensure upper filesystem supports d_type") Cc: <stable@vger.kernel.org> 4.6
-
- 01 7月, 2016 5 次提交
-
-
由 Miklos Szeredi 提交于
(Another one for the f_path debacle.) ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask. The reason is that generic_add_lease() used filp->f_path.dentry->inode while all the others use file_inode(). This makes a difference for files opened on overlayfs since the former will point to the overlay inode the latter to the underlying inode. So generic_add_lease() added the lease to the overlay inode and generic_delete_lease() removed it from the underlying inode. When the file was released the lease remained on the overlay inode's lock list, resulting in use after free. Reported-by: NEryu Guan <eguan@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Andrey Ulanov 提交于
- m_start() in fs/namespace.c expects that ns->event is incremented each time a mount added or removed from ns->list. - umount_tree() removes items from the list but does not increment event counter, expecting that it's done before the function is called. - There are some codepaths that call umount_tree() without updating "event" counter. e.g. from __detach_mounts(). - When this happens m_start may reuse a cached mount structure that no longer belongs to ns->list (i.e. use after free which usually leads to infinite loop). This change fixes the above problem by incrementing global event counter before invoking umount_tree(). Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589 Cc: stable@vger.kernel.org Signed-off-by: NAndrey Ulanov <andreyu@google.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
v9fs may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. In this case it's a NULL pointer dereference in p9_fid_create(). Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Reported-by: NAlessio Igor Bogani <alessioigorbogani@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Tested-by: NAlessio Igor Bogani <alessioigorbogani@gmail.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Scott Mayhew 提交于
If the lockd service fails to start up then we need to be sure that the notifier blocks are not registered, otherwise a subsequent start of the service could cause the same notifier to be registered twice, leading to soft lockups. Signed-off-by: NScott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Fixes: 0751ddf7 "lockd: Register callbacks on the inetaddr_chain..." Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tahsin Erdogan 提交于
Asynchronous wb switching of inodes takes an additional ref count on an inode to make sure inode remains valid until switchover is completed. However, anyone calling ihold() must already have a ref count on inode, but in this case inode->i_count may already be zero: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30 CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-8:16) 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003 Call Trace: [<ffffffff805990af>] dump_stack+0x4d/0x6e [<ffffffff80268702>] __warn+0xc2/0xe0 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20 [<ffffffff8035b4ab>] ihold+0x2b/0x30 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0 [<ffffffff8036a147>] wb_workfn+0xd7/0x280 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0 [<ffffffff8027bb09>] process_one_work+0x129/0x300 [<ffffffff8027be06>] worker_thread+0x126/0x480 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300 [<ffffffff80280ff4>] kthread+0xc4/0xe0 [<ffffffff80335578>] ? kfree+0xc8/0x100 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70 ---[ end trace aaefd2fd9f306bc4 ]--- Signed-off-by: NTahsin Erdogan <tahsin@google.com> Acked-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 30 6月, 2016 2 次提交
-
-
由 Miklos Szeredi 提交于
Negotiate with userspace filesystems whether they support parallel readdir and lookup. Disable parallelism by default for fear of breaking fuse filesystems. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 9902af79 ("parallel lookups: actual switch to rwsem") Fixes: d9b3dbdc ("fuse: switch to ->iterate_shared()")
-
由 Marek Vasut 提交于
The simple_write_to_buffer() already increments the @ppos on success, see fs/libfs.c simple_write_to_buffer() comment: " On success, the number of bytes written is returned and the offset @ppos advanced by this number, or negative value is returned on error. " If the configfs_write_bin_file() is invoked with @count smaller than the total length of the written binary file, it will be invoked multiple times. Since configfs_write_bin_file() increments @ppos on success, after calling simple_write_to_buffer(), the @ppos is incremented twice. Subsequent invocation of configfs_write_bin_file() will result in the next piece of data being written to the offset twice as long as the length of the previous write, thus creating buffer with "holes" in it. The simple testcase using DTO follows: $ mkdir /sys/kernel/config/device-tree/overlays/1 $ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo Without this patch, the testcase will result in twice as big buffer in the kernel, which is then passed to the cfs_overlay_item_dtbo_write() . Signed-off-by: NMarek Vasut <marex@denx.de> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Christoph Hellwig <hch@lst.de> Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
-
- 29 6月, 2016 3 次提交
-
-
由 Miklos Szeredi 提交于
When truncating a file we should check write access on the underlying inode. And we should do so on the lower file as well (before copy-up) for consistency. Original patch and test case by Aihua Zhang. - - >o >o - - test.c - - >o >o - - #include <stdio.h> #include <errno.h> #include <unistd.h> int main(int argc, char *argv[]) { int ret; ret = truncate(argv[0], 4096); if (ret != -1) { fprintf(stderr, "truncate(argv[0]) should have failed\n"); return 1; } if (errno != ETXTBSY) { perror("truncate(argv[0])"); return 1; } return 0; } - - >o >o - - >o >o - - >o >o - - Reported-by: NAihua Zhang <zhangaihua1@huawei.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
-
由 Miklos Szeredi 提交于
When using the 'default_permissions' mount option, ovl_permission() on non-directories was missing a dput(alias), resulting in "BUG Dentry still in use". Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 8d3095f4 ("ovl: default permissions") Cc: <stable@vger.kernel.org> # v4.5+
-
由 Trond Myklebust 提交于
Olga Kornievskaia reports that the following test fails to trigger an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE. fd0 = open(foo, RDRW) -- should be open on the wire for "both" fd1 = open(foo, RDONLY) -- should be open on the wire for "read" close(fd0) -- should trigger an open_downgrade read(fd1) close(fd1) The issue is that we're missing a check for whether or not the current state transitioned from an O_RDWR state as opposed to having transitioned from a combination of O_RDONLY and O_WRONLY. Reported-by: NOlga Kornievskaia <aglo@umich.edu> Fixes: cd9288ff ("NFSv4: Fix another bug in the close/open_downgrade code") Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 28 6月, 2016 1 次提交
-
-
由 Eric Sandeen 提交于
This isn't functionally apparent for some reason, but when we test io at extreme offsets at the end of the loff_t rang, such as in fstests xfs/071, the calculation of "max" in dax_io() can be wrong due to pos + size overflowing. For example, # xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file enters dax_io with: start 0x7ffffffffffff000 end 0x7ffffffffffff200 and the rounded up "size" variable is 0x1000. This yields: pos + size 0x8000000000000000 (overflows loff_t) end 0x7ffffffffffff200 Due to the overflow, the min() function picks the wrong value for the "max" variable, and when we send (max - pos) into i.e. copy_from_iter_pmem() it is also the wrong value. This somehow(tm) gets magically absorbed without incident, probably because iter->count is correct. But it seems best to fix it up properly by comparing the two values as unsigned. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 27 6月, 2016 7 次提交
-
-
由 Benjamin Marzinski 提交于
When gfs2 attempts to write a page to a file that is being truncated, and notices that the page is completely outside of the file size, it tries to invalidate it. However, this may require a transaction for journaled data files to revoke any buffers from the page on the active items list. Unfortunately, this can happen inside a log flush, where a transaction cannot be started. Also, gfs2 may need to be able to remove the buffer from the ail1 list before it can finish the log flush. To deal with this, when writing a page of a file with data journalling enabled gfs2 now skips the check to see if the write is outside the file size, and simply writes it anyway. This situation can only occur when the truncate code still has the file locked exclusively, and hasn't marked this block as free in the metadata (which happens later in truc_dealloc). After gfs2 writes this page out, the truncation code will shortly invalidate it and write out any revokes if necessary. To do this, gfs2 now implements its own version of block_write_full_page without the check, and calls the newly exported __block_write_full_page. It also no longer calls gfs2_writepage_common from gfs2_jdata_writepage. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Benjamin Marzinski 提交于
gfs2 needs to be able to skip the check to see if a page is outside of the file size when writing it out. gfs2 can get into a situation where it needs to flush its in-memory log to disk while a truncate is in progress. If the file being trucated has data journaling enabled, it is possible that there are data blocks in the log that are past the end of the file. gfs can't finish the log flush without either writing these blocks out or revoking them. Otherwise, if the node crashed, it could overwrite subsequent changes made by other nodes in the cluster when it's journal was replayed. Unfortunately, there is no way to add log entries to the log during a flush. So gfs2 simply writes out the page instead. This situation can only occur when the truncate code still has the file locked exclusively, and hasn't marked this block as free in the metadata (which happens later in truc_dealloc). After gfs2 writes this page out, the truncation code will shortly invalidate it and write out any revokes if necessary. In order to make this work, gfs2 needs to be able to skip the check for writes outside the file size. Since the check exists in block_write_full_page, this patch exports __block_write_full_page, which doesn't have the check. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Andreas Gruenbacher 提交于
Make the code more readable by cleaning up the different ways of initializing lock holders and checking for initialized lock holders: mark lock holders as uninitialized by setting the holder's glock to NULL (gfs2_holder_mark_uninitialized) instead of zeroing out the entire object or using a separate flag. Recognize initialized holders by their non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects which are immeditiately initialized via gfs2_holder_init or gfs2_glock_nq_init. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Andreas Gruenbacher 提交于
Commit ff34245d switched from iget5_locked to iget_locked among other things, but iget_locked doesn't work for filesystems larger than 2^32 blocks on 32-bit systems. Switch back to iget5_locked. Filesystems larger than 2^32 blocks are unrealistic to work well on 32-bit systems, so this is mostly a code cleanliness fix. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Andreas Gruenbacher 提交于
Now that gfs2_lookup_by_inum only takes the inode glock for new inodes (and not for cached inodes anymore), there no longer is a need to optimize the cached-inode case in gfs2_get_dentry or delete_work_func, and gfs2_ilookup can be removed. In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this inconsistency goes away as well. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Andreas Gruenbacher 提交于
The current gfs2_lookup_by_inum takes the glock of a presumed inode identified by block number, verifies that the block is indeed an inode, and then instantiates and reads the new inode via gfs2_inode_lookup. However, instantiating a new inode may block on freeing a previous instance of that inode (__wait_on_freeing_inode), and freeing an inode requires to take the glock already held, leading to lock inversion and deadlock. Fix this by first instantiating the new inode, then verifying that the block is an inode (if required), and then reading in the new inode, all in gfs2_inode_lookup. If the block we are looking for is not an inode, we discard the new inode via iget_failed, which marks inodes as bad and unhashes them. Other tasks waiting on that inode will get back a bad inode back from ilookup or iget_locked; in that case, retry the lookup. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Al Viro 提交于
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code" unconditional d_drop() after the ->open_context() had been removed. It had been correct for success cases (there ->open_context() itself had been doing dcache manipulations), but not for error ones. Only one of those (ENOENT) got a compensatory d_drop() added in that commit, but in fact it should've been done for all errors. As it is, the case of O_CREAT non-exclusive open on a hashed negative dentry racing with e.g. symlink creation from another client ended up with ->open_context() getting an error and proceeding to call nfs_lookup(). On a hashed dentry, which would've instantly triggered BUG_ON() in d_materialise_unique() (or, these days, its equivalent in d_splice_alias()). Cc: stable@vger.kernel.org # v3.10+ Tested-by: NOleg Drokin <green@linuxhacker.ru> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-