- 27 7月, 2016 14 次提交
-
-
由 Kirill A. Shutemov 提交于
Let's add ShmemHugePages and ShmemPmdMapped fields into meminfo and smaps. It indicates how many times we allocate and map shmem THP. NR_ANON_TRANSPARENT_HUGEPAGES is renamed to NR_ANON_THPS. Link: http://lkml.kernel.org/r/1466021202-61880-27-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
The idea borrowed from Peter's patch from patchset on speculative page faults[1]: Instead of passing around the endless list of function arguments, replace the lot with a single structure so we can change context without endless function signature changes. The changes are mostly mechanical with exception of faultaround code: filemap_map_pages() got reworked a bit. This patch is preparation for the next one. [1] http://lkml.kernel.org/r/20141020222841.302891540@infradead.org Link: http://lkml.kernel.org/r/1466021202-61880-9-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Vladimir has noticed that we might declare memcg oom even during readahead because read_pages only uses GFP_KERNEL (with mapping_gfp restriction) while __do_page_cache_readahead uses page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from OOMs. This gfp mask discrepancy is really unfortunate and easily fixable. Drop page_cache_alloc_readahead() which only has one user and outsource the gfp_mask logic into readahead_gfp_mask and propagate this mask from __do_page_cache_readahead down to read_pages. This alone would have only very limited impact as most filesystems are implementing ->readpages and the common implementation mpage_readpages does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to use readahead_gfp_mask instead as this function is called only during readahead as well. The same applies to read_cache_pages. ext4 has its own ext4_mpage_readpages but the path which has pages != NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are doing a very similar pattern to mpage_readpages so the same can be applied to them as well. [akpm@linux-foundation.org: coding-style fixes] [mhocko@suse.com: restrict gfp mask in mpage_alloc] Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Chris Mason <clm@fb.com> Cc: Steve French <sfrench@samba.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Changman Lee <cm224.lee@samsung.com> Cc: Chao Yu <yuchao0@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
Pipes can consume a significant amount of system memory, hence they should be accounted to kmemcg. This patch marks pipe_inode_info and anonymous pipe buffer page allocations as __GFP_ACCOUNT so that they would be charged to kmemcg. Note, since a pipe buffer page can be "stolen" and get reused for other purposes, including mapping to userspace, we clear PageKmemcg thus resetting page->_mapcount and uncharge it in anon_pipe_buf_steal, which is introduced by this patch. A note regarding anon_pipe_buf_steal implementation. We allow to steal the page if its ref count equals 1. It looks racy, but it is correct for anonymous pipe buffer pages, because: - We lock out all other pipe users, because ->steal is called with pipe_lock held, so the page can't be spliced to another pipe from under us. - The page is not on LRU and it never was. - Thus a parallel thread can access it only by PFN. Although this is quite possible (e.g. see page_idle_get_page and balloon_page_isolate) this is not dangerous, because all such functions do is increase page ref count, check if the page is the one they are looking for, and decrease ref count if it isn't. Since our page is clean except for PageKmemcg mark, which doesn't conflict with other _mapcount users, the worst that can happen is we see page_count > 2 due to a transient ref, in which case we false-positively abort ->steal, which is still fine, because ->steal is not guaranteed to succeed. Link: http://lkml.kernel.org/r/20160527150313.GD26059@esperanzaSigned-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Brian Foster 提交于
The per-sb inode writeback list tracks inodes currently under writeback to facilitate efficient sync processing. In particular, it ensures that sync only needs to walk through a list of inodes that were cleaned by the sync. Add a couple tracepoints to help identify when inodes are added/removed to and from the writeback lists. Piggyback off of the writeback lazytime tracepoint template as it already tracks the relevant inode information. Link: http://lkml.kernel.org/r/1466594593-6757-3-git-send-email-bfoster@redhat.comSigned-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> cc: Josef Bacik <jbacik@fb.com> Cc: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Chinner 提交于
wait_sb_inodes() currently does a walk of all inodes in the filesystem to find dirty one to wait on during sync. This is highly inefficient and wastes a lot of CPU when there are lots of clean cached inodes that we don't need to wait on. To avoid this "all inode" walk, we need to track inodes that are currently under writeback that we need to wait for. We do this by adding inodes to a writeback list on the sb when the mapping is first tagged as having pages under writeback. wait_sb_inodes() can then walk this list of "inodes under IO" and wait specifically just for the inodes that the current sync(2) needs to wait for. Define a couple helpers to add/remove an inode from the writeback list and call them when the overall mapping is tagged for or cleared from writeback. Update wait_sb_inodes() to walk only the inodes under writeback due to the sync. With this change, filesystem sync times are significantly reduced for fs' with largely populated inode caches and otherwise no other work to do. For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem with a ~10m entry inode cache, sync times are reduced from ~7.3s to less than 0.1s when the filesystem is fully clean. Link: http://lkml.kernel.org/r/1466594593-6757-2-git-send-email-bfoster@redhat.comSigned-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Tested-by: NHolger Hoffstätte <holger.hoffstaette@applied-asynchrony.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 piaojun 提交于
Clean up unnecessary assignment for 'ret'. Link: http://lkml.kernel.org/r/578C61F6.4080403@huawei.comSigned-off-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
These BUG_ON(!inode) are obscure because we have already used inode to get osb. And actually we can guarantee here inode is valid in the context. So we can safely remove them. Link: http://lkml.kernel.org/r/5776336A.6030104@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NEric Ren <zren@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
Several prototypes in inode.h are just defined but not actually implemented and used, so remove them. Link: http://lkml.kernel.org/r/57763787.4020706@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
dlm_debug_ctxt->debug_refcnt is initialized to 1 and then increased to 2 by dlm_debug_get in dlm_debug_init. But dlm_debug_put is called only once in dlm_debug_shutdown during unregister dlm, which leads to dlm_debug_ctxt leaked. Link: http://lkml.kernel.org/r/577BB755.4030900@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NJiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
The last goto is unneeded, so remove it. Link: http://lkml.kernel.org/r/576213D3.6080002@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Junxiao Bi 提交于
Journal replay will be run when performing recovery for a dead node. To avoid the stale cache impact, all blocks of dead node's journal inode were reloaded from disk. This hurts the performance. Check whether one block is cached before reloading it can improve performance a lot. In my test env, the time doing recovery was improved from 120s to 1s. [akpm@linux-foundation.org: clean up the for loop p_blkno handling] Link: http://lkml.kernel.org/r/1466155682-24656-1-git-send-email-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Cc: "Gang He" <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Ren 提交于
Obviously, memset() has zeroed the whole struct locking_max_version. So, it's no need to zero its two fields individually. Link: http://lkml.kernel.org/r/1463970605-18354-1-git-send-email-zren@suse.comSigned-off-by: NEric Ren <zren@suse.com> Reviewed-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NGang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ross Zwisler 提交于
Remove the unused wrappers dax_fault() and dax_pmd_fault(). After this removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and dax_pmd_fault() respectively, and update all callers. The dax_fault() and dax_pmd_fault() wrappers were initially intended to capture some filesystem independent functionality around page faults (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime and ctime). However, the following commits: 5726b27b ("ext2: Add locking for DAX faults") ea3d7209 ("ext4: fix races between page faults and hole punching") added locking to the ext2 and ext4 filesystems after these common operations but before __dax_fault() and __dax_pmd_fault() were called. This means that these wrappers are no longer used, and are unlikely to be used in the future. XFS has had locking analogous to what was recently added to ext2 and ext4 since DAX support was initially introduced by: 6b698ede ("xfs: add DAX file operations support") Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 7月, 2016 2 次提交
-
-
由 Maxim Patlasov 提交于
The upper dentry may become stale before we call ovl_lock_rename_workdir. For example, someone could (mistakenly or maliciously) manually unlink(2) it directly from upperdir. To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir and and check if it matches the upper dentry. Essentially, it is the same problem and similar solution as in commit 11f37104 ("ovl: verify upper dentry before unlink and rename"). Signed-off-by: NMaxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
-
由 Bob Peterson 提交于
Before this patch, if you used gfs2_jadd to add new journals of a size smaller than the existing journals, replaying those new journals would withdraw. That's because function gfs2_replay_incr_blk was using the number of journal blocks (jd_block) from the superblock's journal pointer. In other words, "My journal's max size" rather than "the journal we're replaying's size." This patch changes the function to use the size of the pertinent journal rather than always using the journal we happen to be using. Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
- 16 7月, 2016 1 次提交
-
-
由 Jann Horn 提交于
Without this check, the following XFS_I invocations would return bad pointers when used on non-XFS inodes (perhaps pointers into preceding allocator chunks). This could be used by an attacker to trick xfs_swap_extents into performing locking operations on attacker-chosen structures in kernel memory, potentially leading to code execution in the kernel. (I have not investigated how likely this is to be usable for an attack in practice.) Signed-off-by: NJann Horn <jann@thejh.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 7月, 2016 1 次提交
-
-
由 H.J. Lu 提交于
Don't use the same syscall numbers for 2 different syscalls: 534 x32 preadv compat_sys_preadv64 535 x32 pwritev compat_sys_pwritev64 534 x32 preadv2 compat_sys_preadv2 535 x32 pwritev2 compat_sys_pwritev2 Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset is passed in one 64-bit register on x32, similar to compat_sys_preadv64() and compat_sys_pwritev64(). Signed-off-by: NH.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 7月, 2016 1 次提交
-
-
由 Fengguang Wu 提交于
To fix super long dmesg error lines like CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation rangeCHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation rangeswapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK) After fix, it should look like CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation range CHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation range swapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK) Reported-by: NPhilip Li <philip.li@intel.com> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 7月, 2016 1 次提交
-
-
由 Bob Peterson 提交于
For the last process to close a file opened for write, function gfs2_rsqa_delete was deleting the file's inode's block reservation out of the rgrp reservations tree. Then it was checking to make sure rs_free was 0, but it was performing the check outside the protection of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed to prevent a race between the process freeing the reservation and another who is allocating a new set of blocks inside the same rgrp for the same inode, thus changing its value. Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
- 08 7月, 2016 2 次提交
-
-
由 Jeff Mahoney 提交于
There are legitimate reasons to disallow mmap on certain files, notably in sysfs or procfs. We shouldn't emulate mmap support on file systems that don't offer support natively. CVE-2016-1583 Signed-off-by: NJeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
-
由 Jeff Mahoney 提交于
This reverts commit 2f36db71. It fixed a local root exploit but also introduced a dependency on the lower file system implementing an mmap operation just to open a file, which is a bit of a heavy hammer. The right fix is to have mmap depend on the existence of the mmap handler instead. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
-
- 06 7月, 2016 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
->atomic_open() can be given an in-lookup dentry *or* a negative one found in dcache. Use d_in_lookup() to tell one from another, rather than d_unhashed(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 7月, 2016 2 次提交
-
-
由 Vivek Goyal 提交于
Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
-
由 Miklos Szeredi 提交于
Before 4bacc9c9 ("overlayfs: Make f_path...") file->f_path pointed to the underlying file, hence suid/sgid removal on write worked fine. After that patch file->f_path pointed to the overlay file, and the file mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid removal simply stopped working. The fix is to copy the mode bits, but then ovl_setattr() needs to clear ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in the next patch copy the mode. Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
-
- 03 7月, 2016 1 次提交
-
-
由 Vivek Goyal 提交于
overlay needs underlying fs to support d_type. Recently I put in a patch in to detect this condition and started failing mount if underlying fs did not support d_type. But this breaks existing configurations over kernel upgrade. Those who are running docker (partially broken configuration) with xfs not supporting d_type, are surprised that after kernel upgrade docker does not run anymore. https://github.com/docker/docker/issues/22937#issuecomment-229881315 So instead of erroring out, detect broken configuration and warn about it. This should allow existing docker setups to continue working after kernel upgrade. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 45aebeaf ("ovl: Ensure upper filesystem supports d_type") Cc: <stable@vger.kernel.org> 4.6
-
- 01 7月, 2016 5 次提交
-
-
由 Miklos Szeredi 提交于
(Another one for the f_path debacle.) ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask. The reason is that generic_add_lease() used filp->f_path.dentry->inode while all the others use file_inode(). This makes a difference for files opened on overlayfs since the former will point to the overlay inode the latter to the underlying inode. So generic_add_lease() added the lease to the overlay inode and generic_delete_lease() removed it from the underlying inode. When the file was released the lease remained on the overlay inode's lock list, resulting in use after free. Reported-by: NEryu Guan <eguan@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Andrey Ulanov 提交于
- m_start() in fs/namespace.c expects that ns->event is incremented each time a mount added or removed from ns->list. - umount_tree() removes items from the list but does not increment event counter, expecting that it's done before the function is called. - There are some codepaths that call umount_tree() without updating "event" counter. e.g. from __detach_mounts(). - When this happens m_start may reuse a cached mount structure that no longer belongs to ns->list (i.e. use after free which usually leads to infinite loop). This change fixes the above problem by incrementing global event counter before invoking umount_tree(). Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589 Cc: stable@vger.kernel.org Signed-off-by: NAndrey Ulanov <andreyu@google.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
v9fs may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. In this case it's a NULL pointer dereference in p9_fid_create(). Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Reported-by: NAlessio Igor Bogani <alessioigorbogani@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Tested-by: NAlessio Igor Bogani <alessioigorbogani@gmail.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Scott Mayhew 提交于
If the lockd service fails to start up then we need to be sure that the notifier blocks are not registered, otherwise a subsequent start of the service could cause the same notifier to be registered twice, leading to soft lockups. Signed-off-by: NScott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Fixes: 0751ddf7 "lockd: Register callbacks on the inetaddr_chain..." Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tahsin Erdogan 提交于
Asynchronous wb switching of inodes takes an additional ref count on an inode to make sure inode remains valid until switchover is completed. However, anyone calling ihold() must already have a ref count on inode, but in this case inode->i_count may already be zero: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30 CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-8:16) 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003 Call Trace: [<ffffffff805990af>] dump_stack+0x4d/0x6e [<ffffffff80268702>] __warn+0xc2/0xe0 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20 [<ffffffff8035b4ab>] ihold+0x2b/0x30 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0 [<ffffffff8036a147>] wb_workfn+0xd7/0x280 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0 [<ffffffff8027bb09>] process_one_work+0x129/0x300 [<ffffffff8027be06>] worker_thread+0x126/0x480 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300 [<ffffffff80280ff4>] kthread+0xc4/0xe0 [<ffffffff80335578>] ? kfree+0xc8/0x100 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70 ---[ end trace aaefd2fd9f306bc4 ]--- Signed-off-by: NTahsin Erdogan <tahsin@google.com> Acked-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 30 6月, 2016 2 次提交
-
-
由 Miklos Szeredi 提交于
Negotiate with userspace filesystems whether they support parallel readdir and lookup. Disable parallelism by default for fear of breaking fuse filesystems. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 9902af79 ("parallel lookups: actual switch to rwsem") Fixes: d9b3dbdc ("fuse: switch to ->iterate_shared()")
-
由 Marek Vasut 提交于
The simple_write_to_buffer() already increments the @ppos on success, see fs/libfs.c simple_write_to_buffer() comment: " On success, the number of bytes written is returned and the offset @ppos advanced by this number, or negative value is returned on error. " If the configfs_write_bin_file() is invoked with @count smaller than the total length of the written binary file, it will be invoked multiple times. Since configfs_write_bin_file() increments @ppos on success, after calling simple_write_to_buffer(), the @ppos is incremented twice. Subsequent invocation of configfs_write_bin_file() will result in the next piece of data being written to the offset twice as long as the length of the previous write, thus creating buffer with "holes" in it. The simple testcase using DTO follows: $ mkdir /sys/kernel/config/device-tree/overlays/1 $ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo Without this patch, the testcase will result in twice as big buffer in the kernel, which is then passed to the cfs_overlay_item_dtbo_write() . Signed-off-by: NMarek Vasut <marex@denx.de> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Christoph Hellwig <hch@lst.de> Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
-
- 29 6月, 2016 3 次提交
-
-
由 Miklos Szeredi 提交于
When truncating a file we should check write access on the underlying inode. And we should do so on the lower file as well (before copy-up) for consistency. Original patch and test case by Aihua Zhang. - - >o >o - - test.c - - >o >o - - #include <stdio.h> #include <errno.h> #include <unistd.h> int main(int argc, char *argv[]) { int ret; ret = truncate(argv[0], 4096); if (ret != -1) { fprintf(stderr, "truncate(argv[0]) should have failed\n"); return 1; } if (errno != ETXTBSY) { perror("truncate(argv[0])"); return 1; } return 0; } - - >o >o - - >o >o - - >o >o - - Reported-by: NAihua Zhang <zhangaihua1@huawei.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
-
由 Miklos Szeredi 提交于
When using the 'default_permissions' mount option, ovl_permission() on non-directories was missing a dput(alias), resulting in "BUG Dentry still in use". Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 8d3095f4 ("ovl: default permissions") Cc: <stable@vger.kernel.org> # v4.5+
-
由 Trond Myklebust 提交于
Olga Kornievskaia reports that the following test fails to trigger an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE. fd0 = open(foo, RDRW) -- should be open on the wire for "both" fd1 = open(foo, RDONLY) -- should be open on the wire for "read" close(fd0) -- should trigger an open_downgrade read(fd1) close(fd1) The issue is that we're missing a check for whether or not the current state transitioned from an O_RDWR state as opposed to having transitioned from a combination of O_RDONLY and O_WRONLY. Reported-by: NOlga Kornievskaia <aglo@umich.edu> Fixes: cd9288ff ("NFSv4: Fix another bug in the close/open_downgrade code") Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 28 6月, 2016 1 次提交
-
-
由 Eric Sandeen 提交于
This isn't functionally apparent for some reason, but when we test io at extreme offsets at the end of the loff_t rang, such as in fstests xfs/071, the calculation of "max" in dax_io() can be wrong due to pos + size overflowing. For example, # xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file enters dax_io with: start 0x7ffffffffffff000 end 0x7ffffffffffff200 and the rounded up "size" variable is 0x1000. This yields: pos + size 0x8000000000000000 (overflows loff_t) end 0x7ffffffffffff200 Due to the overflow, the min() function picks the wrong value for the "max" variable, and when we send (max - pos) into i.e. copy_from_iter_pmem() it is also the wrong value. This somehow(tm) gets magically absorbed without incident, probably because iter->count is correct. But it seems best to fix it up properly by comparing the two values as unsigned. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 27 6月, 2016 2 次提交
-
-
由 Benjamin Marzinski 提交于
When gfs2 attempts to write a page to a file that is being truncated, and notices that the page is completely outside of the file size, it tries to invalidate it. However, this may require a transaction for journaled data files to revoke any buffers from the page on the active items list. Unfortunately, this can happen inside a log flush, where a transaction cannot be started. Also, gfs2 may need to be able to remove the buffer from the ail1 list before it can finish the log flush. To deal with this, when writing a page of a file with data journalling enabled gfs2 now skips the check to see if the write is outside the file size, and simply writes it anyway. This situation can only occur when the truncate code still has the file locked exclusively, and hasn't marked this block as free in the metadata (which happens later in truc_dealloc). After gfs2 writes this page out, the truncation code will shortly invalidate it and write out any revokes if necessary. To do this, gfs2 now implements its own version of block_write_full_page without the check, and calls the newly exported __block_write_full_page. It also no longer calls gfs2_writepage_common from gfs2_jdata_writepage. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Benjamin Marzinski 提交于
gfs2 needs to be able to skip the check to see if a page is outside of the file size when writing it out. gfs2 can get into a situation where it needs to flush its in-memory log to disk while a truncate is in progress. If the file being trucated has data journaling enabled, it is possible that there are data blocks in the log that are past the end of the file. gfs can't finish the log flush without either writing these blocks out or revoking them. Otherwise, if the node crashed, it could overwrite subsequent changes made by other nodes in the cluster when it's journal was replayed. Unfortunately, there is no way to add log entries to the log during a flush. So gfs2 simply writes out the page instead. This situation can only occur when the truncate code still has the file locked exclusively, and hasn't marked this block as free in the metadata (which happens later in truc_dealloc). After gfs2 writes this page out, the truncation code will shortly invalidate it and write out any revokes if necessary. In order to make this work, gfs2 needs to be able to skip the check for writes outside the file size. Since the check exists in block_write_full_page, this patch exports __block_write_full_page, which doesn't have the check. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-