1. 08 4月, 2014 1 次提交
  2. 04 4月, 2014 1 次提交
    • J
      mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner 提交于
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91b0abe3
  3. 26 1月, 2014 2 次提交
  4. 10 1月, 2014 1 次提交
  5. 24 11月, 2013 8 次提交
  6. 16 11月, 2013 1 次提交
  7. 25 10月, 2013 1 次提交
  8. 28 9月, 2013 1 次提交
    • D
      FS-Cache: Provide the ability to enable/disable cookies · 94d30ae9
      David Howells 提交于
      Provide the ability to enable and disable fscache cookies.  A disabled cookie
      will reject or ignore further requests to:
      
      	Acquire a child cookie
      	Invalidate and update backing objects
      	Check the consistency of a backing object
      	Allocate storage for backing page
      	Read backing pages
      	Write to backing pages
      
      but still allows:
      
      	Checks/waits on the completion of already in-progress objects
      	Uncaching of pages
      	Relinquishment of cookies
      
      Two new operations are provided:
      
       (1) Disable a cookie:
      
      	void fscache_disable_cookie(struct fscache_cookie *cookie,
      				    bool invalidate);
      
           If the cookie is not already disabled, this locks the cookie against other
           dis/enablement ops, marks the cookie as being disabled, discards or
           invalidates any backing objects and waits for cessation of activity on any
           associated object.
      
           This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
           but it reinitialises the cookie such that it can be reenabled.
      
           All possible failures are handled internally.  The caller should consider
           calling fscache_uncache_all_inode_pages() afterwards to make sure all page
           markings are cleared up.
      
       (2) Enable a cookie:
      
      	void fscache_enable_cookie(struct fscache_cookie *cookie,
      				   bool (*can_enable)(void *data),
      				   void *data)
      
           If the cookie is not already enabled, this locks the cookie against other
           dis/enablement ops, invokes can_enable() and, if the cookie is not an
           index cookie, will begin the procedure of acquiring backing objects.
      
           The optional can_enable() function is passed the data argument and returns
           a ruling as to whether or not enablement should actually be permitted to
           begin.
      
           All possible failures are handled internally.  The cookie will only be
           marked as enabled if provisional backing objects are allocated.
      
      A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
      is then contingent on i_writecount <= 0.  can_enable() checks for a race
      between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
      handling and allows us to get rid of open(O_RDONLY) accidentally introducing
      caching to an inode that's open for writing already.
      
      One operation has its API modified:
      
       (3) Acquire a cookie.
      
      	struct fscache_cookie *fscache_acquire_cookie(
      		struct fscache_cookie *parent,
      		const struct fscache_cookie_def *def,
      		void *netfs_data,
      		bool enable);
      
           This now has an additional argument that indicates whether the requested
           cookie should be enabled by default.  It doesn't need the can_enable()
           function because the caller must prevent multiple calls for the same netfs
           object and it doesn't need to take the enablement lock because no one else
           can get at the cookie before this returns.
      
      Signed-off-by: David Howells <dhowells@redhat.com
      94d30ae9
  9. 18 9月, 2013 2 次提交
  10. 26 8月, 2013 1 次提交
    • W
      fs/9p: avoid accessing utsname after namespace has been torn down · 50192abe
      Will Deacon 提交于
      During trinity fuzzing in a kvmtool guest, I stumbled across the
      following:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000004
      PC is at v9fs_file_do_lock+0xc8/0x1a0
      LR is at v9fs_file_do_lock+0x48/0x1a0
      [<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124)
      [<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4)
      [<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8)
      [<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8)
      [<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0)
      [<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18)
      
      I believe this is due to an attempt to access utsname()->nodename, after
      exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns
      as NULL and causing the above dereference.
      
      A similar issue was fixed for lockd in 9a1b6bf8 ("LOCKD: Don't call
      utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts
      something similar for 9pfs.
      
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
      50192abe
  11. 30 7月, 2013 1 次提交
  12. 08 7月, 2013 2 次提交
  13. 29 6月, 2013 1 次提交
  14. 22 5月, 2013 1 次提交
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner 提交于
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way .Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
  15. 08 5月, 2013 1 次提交
  16. 04 3月, 2013 1 次提交
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  17. 28 2月, 2013 8 次提交
  18. 26 2月, 2013 6 次提交