1. 31 Mar, 2014 4 commits
    • J
      locks: pass the cmd value to fcntl_getlk/getlk64 · c1e62b8f
      Jeff Layton committed
      Once we introduce file private locks, we'll need to know what cmd value
      was used, as that affects the ownership and whether a conflict would
      arise.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • J
      locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT" · c918d42a
      Jeff Layton committed
      In a later patch, we'll be adding a new type of lock that's owned by
      the struct file instead of the files_struct. Those sorts of locks
      will be flagged with a new FL_FILE_PVT flag.
      
      Report these types of locks as "FLPVT" in /proc/locks to distinguish
      them from "classic" POSIX locks.
      Acked-by: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • J
      locks: rename locks_remove_flock to locks_remove_file · 78ed8a13
      Jeff Layton committed
      This function currently removes leases in addition to flock locks and in
      a later patch we'll have it deal with file-private locks too. Rename it
      to locks_remove_file to indicate that it removes locks that are
      associated with a particular struct file, and not just flock locks.
      Acked-by: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • J
      locks: close potential race between setlease and open · 24cbe784
      Jeff Layton committed
      As Al Viro points out, there is an unlikely, but possible race between
      opening a file and setting a lease on it. generic_add_lease is done with
      the i_lock held, but the inode->i_flock check in break_lease is
      lockless. It's possible for another task doing an open to do the entire
      pathwalk and call break_lease between the point where generic_add_lease
      checks for a conflicting open and adds the lease to the list. If this
      occurs, we can end up with a lease set on the file with a conflicting
      open.
      
      To guard against that, check again for a conflicting open after adding
      the lease to the i_flock list. If the above race occurs, then we can
      simply unwind the lease setting and return -EAGAIN.
      
      Because we take dentry references and acquire write access on the file
      before calling break_lease, we know that if the i_flock list is empty
      when the open caller goes to check it then the necessary refcounts have
      already been incremented. Thus the additional check for a conflicting
      open will see that there is one and the setlease call will fail.
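
The check, publish, re-check sequence described above can be modelled in user space. This is only a sketch under stated assumptions: `inode_model`, `model_setlease`, `race_hook` and `racing_open` are hypothetical names, no real locking is involved, and the callback merely stands in for another task completing its open in the race window.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode state discussed in the commit. */
struct inode_model {
	int open_writers;  /* models refcounts an open takes before break_lease */
	bool lease_set;    /* models a lease present on the i_flock list */
};

/* Callback used to simulate another task running in the race window. */
typedef void (*race_hook)(struct inode_model *);

/* Returns 0 on success, -EAGAIN if a conflicting open is detected. */
static int model_setlease(struct inode_model *ino, race_hook between)
{
	/* First check for a conflicting open. */
	if (ino->open_writers > 0)
		return -EAGAIN;

	/* Race window: an open may complete here. */
	if (between)
		between(ino);

	/* Publish the lease. */
	ino->lease_set = true;

	/* Check again after publishing; unwind if an open slipped in. */
	if (ino->open_writers > 0) {
		ino->lease_set = false;
		return -EAGAIN;
	}
	return 0;
}

/* Simulated conflicting open: it takes its reference before checking. */
static void racing_open(struct inode_model *ino)
{
	ino->open_writers++;
}
```

Calling `model_setlease(&ino, NULL)` on an idle inode leaves the lease set, while passing `racing_open` as the hook makes the second check fire and the lease is unwound.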
      
      Cc: Bruce Fields <bfields@fieldses.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
  2. 26 Jan, 2014 1 commit
  3. 16 Nov, 2013 1 commit
  4. 09 Nov, 2013 12 commits
  5. 25 Oct, 2013 3 commits
  6. 28 Sep, 2013 1 commit
    • D
      NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() · f1fe29b4
      David Howells committed
      Use i_writecount to control whether to get an fscache cookie in nfs_open() as
      NFS does not do write caching yet.  I *think* this is the cause of a problem
      encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
      pointer dereference because cookie->def is NULL:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      PGD 0
      Thread overran stack, or stack corrupted
      Oops: 0000 [#1] SMP
      Modules linked in: ...
      CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
      Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
      task: ffff8804203460c0 ti: ffff880420346640
      RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      RSP: 0018:ffff8801053af878 EFLAGS: 00210286
      RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
      RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
      RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
      R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
      FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
      CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
      Stack:
      ...
      Call Trace:
      [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
      [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
      [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
      [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
      [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
      [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
      [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
      [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
      [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
      [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
      [<ffffffff81357f78>] nfs_setattr+0xc8/0x150
      [<ffffffff8122d95b>] notify_change+0x1cb/0x390
      [<ffffffff8120a55b>] do_truncate+0x7b/0xc0
      [<ffffffff8121f96c>] do_last+0xa4c/0xfd0
      [<ffffffff8121ffbc>] path_openat+0xcc/0x670
      [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
      [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
      [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
      [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24
      
      The code at the instruction pointer was disassembled:
      
      > (gdb) disas __fscache_uncache_page
      > Dump of assembler code for function __fscache_uncache_page:
      > ...
      > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
      > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
      > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>
      
      These instructions make up:
      
      	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
      
      That cmpb is the faulting instruction (%rax is 0).  So cookie->def is NULL -
      which presumably means that the cookie has already been at least partway
      through __fscache_relinquish_cookie().
      
      What I think may be happening is something like a three-way race on the same
      file:
      
      	PROCESS 1	PROCESS 2	PROCESS 3
      	===============	===============	===============
      	open(O_TRUNC|O_WRONLY)
      			open(O_RDONLY)
      					open(O_WRONLY)
      	-->nfs_open()
      	-->nfs_fscache_set_inode_cookie()
      	nfs_fscache_inode_lock()
      	nfs_fscache_disable_inode_cookie()
      	__fscache_relinquish_cookie()
      	nfs_inode->fscache = NULL
      	<--nfs_fscache_set_inode_cookie()
      
      			-->nfs_open()
      			-->nfs_fscache_set_inode_cookie()
      			nfs_fscache_inode_lock()
      			nfs_fscache_enable_inode_cookie()
      			__fscache_acquire_cookie()
      			nfs_inode->fscache = cookie
      			<--nfs_fscache_set_inode_cookie()
      	<--nfs_open()
      	-->nfs_setattr()
      	...
      	...
      	-->nfs_invalidate_page()
      	-->__nfs_fscache_invalidate_page()
      	cookie = nfsi->fscache
      					-->nfs_open()
      					-->nfs_fscache_set_inode_cookie()
      					nfs_fscache_inode_lock()
      					nfs_fscache_disable_inode_cookie()
      					-->__fscache_relinquish_cookie()
      	-->__fscache_uncache_page(cookie)
      	<crash>
      					<--__fscache_relinquish_cookie()
      					nfs_inode->fscache = NULL
      					<--nfs_fscache_set_inode_cookie()
      
      What is needed is something to prevent process #2 from reacquiring the cookie
      - and I think checking i_writecount should do the trick.
      
      It's also possible to have a two-way race on this if the file is opened
      O_TRUNC|O_RDONLY instead.
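
The idea of keying the cookie on i_writecount rather than on the flags of the current open can be sketched as a small model. Everything here is an assumption made for illustration: `nfs_inode_model`, `model_open` and `model_close` are hypothetical, and the model ignores locking and the O_TRUNC|O_RDONLY case mentioned above.

```c
#include <stdbool.h>

/* Hypothetical model of the per-inode state relevant to the race. */
struct nfs_inode_model {
	int i_writecount;     /* number of write opens currently held */
	bool cookie_enabled;  /* whether an fscache cookie is in use */
};

/*
 * Model of open: the cookie is only used while nobody has the file
 * open for writing, regardless of how this particular open was made.
 */
static void model_open(struct nfs_inode_model *ino, bool for_write)
{
	if (for_write)
		ino->i_writecount++;
	ino->cookie_enabled = (ino->i_writecount == 0);
}

/* Model of close: drop the write count taken by a write open. */
static void model_close(struct nfs_inode_model *ino, bool for_write)
{
	if (for_write)
		ino->i_writecount--;
}
```

In this model a read-only open that arrives while a writer still holds the file (process #2 in the diagram) does not re-enable the cookie, which is the behaviour the commit message is aiming for.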
      Reported-by: Mark Moseley <moseleymark@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
  7. 11 Sep, 2013 8 commits
    • D
      fs: convert inode and dentry shrinking to be node aware · 9b17c623
      Dave Chinner committed
      Now that the shrinker is passing a node in the scan control structure, we
      can pass this to the generic LRU list code to isolate reclaim to the
      lists on matching nodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      dcache: convert to use new lru list infrastructure · f6041567
      Dave Chinner committed
      [glommer@openvz.org: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      inode: convert inode lru list to generic lru list code. · bc3b14cb
      Dave Chinner committed
      [glommer@openvz.org: adapted for new LRU return codes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      shrinker: convert superblock shrinkers to new API · 0a234c6d
      Dave Chinner committed
      Convert superblock shrinker to use the new count/scan API, and propagate
      the API changes through to the filesystem callouts.  The filesystem
      callouts already use a count/scan API, so it's just changing counters to
      longs to match the VM API.
      
      This requires the dentry and inode shrinker callouts to be converted to
      the count/scan API.  This is mainly a mechanical change.
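
The split into a count step and a scan step can be illustrated with a user-space model. This is a sketch only: `cache_model`, `model_count_objects` and `model_scan_objects` are hypothetical names and not the kernel's shrinker structures; the one detail taken from the text is that the counters are longs.

```c
/* Hypothetical cache with a number of objects that could be freed. */
struct cache_model {
	long nr_freeable;
};

/* Count step: report how many objects could be freed, without freeing. */
static long model_count_objects(const struct cache_model *c)
{
	return c->nr_freeable;
}

/* Scan step: free at most nr_to_scan objects and report how many went. */
static long model_scan_objects(struct cache_model *c, long nr_to_scan)
{
	long freed = nr_to_scan < c->nr_freeable ? nr_to_scan : c->nr_freeable;

	if (freed < 0)
		freed = 0;
	c->nr_freeable -= freed;
	return freed;
}
```

Keeping the two steps separate lets a caller first ask how much work is available and then decide how much of it to request, instead of overloading one callback with both meanings.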
      
      [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      dentry: move to per-sb LRU locks · 19156840
      Dave Chinner committed
      With the dentry LRUs being per-sb structures, there is no real need for
      a global dentry_lru_lock. The locking can be made more fine-grained by
      moving to a per-sb LRU lock, isolating the LRU operations of different
      filesystems completely from each other. The need for this is independent
      of any performance consideration that may arise: in the interest of
      abstracting the lru operations away, it is mandatory that each lru works
      around its own lock instead of a global lock for all of them.
      
      [glommer@openvz.org: updated changelog ]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • G
      fs: bump inode and dentry counters to long · 3942c07c
      Glauber Costa committed
      This series reworks our current object cache shrinking infrastructure in
      two main ways:
      
       * Noticing that a lot of users copy and paste their own version of LRU
         lists for objects, we put some effort in providing a generic version.
         It is modeled after the filesystem users: dentries, inodes, and xfs
         (for various tasks), but we expect that other users could benefit in
         the near future with little or no modification.  Let us know if you
         have any issues.
      
       * The underlying list_lru being proposed automatically and
         transparently keeps the elements in per-node lists, and is able to
         manipulate the node lists individually.  Given this infrastructure, we
         are able to modify the up-to-now hammer called shrink_slab to proceed
         with node-reclaim instead of always searching memory from all over like
         it has been doing.
      
      Per-node lru lists are also expected to lead to less contention in the lru
      locks on multi-node scans, since we are now no longer fighting for a
      global lock.  The locks usually disappear from the profilers with this
      change.
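
A rough model of the per-node arrangement is sketched below. It is not the list_lru implementation: `lru_model`, `lru_model_add` and `lru_model_reclaim_node` are hypothetical names, the node count is fixed, and the per-node locks are left out so that the structure stays visible.

```c
#include <stddef.h>

#define MODEL_MAX_NODES 4  /* assumption: small fixed node count */

struct lru_item {
	struct lru_item *next;
	int nid;               /* node the object's memory lives on */
};

struct lru_node {
	struct lru_item *head; /* oldest item */
	struct lru_item *tail; /* newest item */
	long nr_items;
};

struct lru_model {
	struct lru_node node[MODEL_MAX_NODES];
};

/* Append an item to the list of the node it belongs to. */
static void lru_model_add(struct lru_model *lru, struct lru_item *item)
{
	struct lru_node *n = &lru->node[item->nid];

	item->next = NULL;
	if (n->tail)
		n->tail->next = item;
	else
		n->head = item;
	n->tail = item;
	n->nr_items++;
}

/* Reclaim up to nr_to_walk of the oldest items from one node only. */
static long lru_model_reclaim_node(struct lru_model *lru, int nid,
				   long nr_to_walk)
{
	struct lru_node *n = &lru->node[nid];
	long freed = 0;

	while (n->head && freed < nr_to_walk) {
		struct lru_item *victim = n->head;

		n->head = victim->next;
		if (!n->head)
			n->tail = NULL;
		victim->next = NULL;
		n->nr_items--;
		freed++;
	}
	return freed;
}
```

Because each node has its own list, reclaiming from node 0 never touches the items queued on node 1. In a locked version each `lru_node` would carry its own lock, which is where the reduced contention described above would come from.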
      
      Although we have no official benchmarks for this version - be our guest to
      independently evaluate this - earlier versions of this series were
      performance tested (details at
      http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
      visible performance regressions while yielding a better qualitative
      behavior in NUMA machines.
      
      With this infrastructure in place, we can use the list_lru entry point to
      provide memcg isolation and per-memcg targeted reclaim.  Historically,
      those two pieces of work have been posted together.  This version presents
      only the infrastructure work, deferring the memcg work for a later time,
      so we can focus on getting this part tested.  You can see more about the
      history of such work at http://lwn.net/Articles/552769/
      
      Dave Chinner (18):
        dcache: convert dentry_stat.nr_unused to per-cpu counters
        dentry: move to per-sb LRU locks
        dcache: remove dentries from LRU before putting on dispose list
        mm: new shrinker API
        shrinker: convert superblock shrinkers to new API
        list: add a new LRU list type
        inode: convert inode lru list to generic lru list code.
        dcache: convert to use new lru list infrastructure
        list_lru: per-node list infrastructure
        shrinker: add node awareness
        fs: convert inode and dentry shrinking to be node aware
        xfs: convert buftarg LRU to generic code
        xfs: rework buffer dispose list tracking
        xfs: convert dquot cache lru to list_lru
        fs: convert fs shrinkers to new scan/count API
        drivers: convert shrinkers to new count/scan API
        shrinker: convert remaining shrinkers to count/scan API
        shrinker: Kill old ->shrink API.
      
      Glauber Costa (7):
        fs: bump inode and dentry counters to long
        super: fix calculation of shrinkable objects for small numbers
        list_lru: per-node API
        vmscan: per-node deferred work
        i915: bail out earlier when shrinker cannot acquire mutex
        hugepage: convert huge zero page shrinker to new shrinker API
        list_lru: dynamically adjust node arrays
      
      This patch:
      
      There are situations in very large machines in which we can have a large
      quantity of dirty inodes, unused dentries, etc.  This is particularly true
      when umounting a filesystem, where every live object will eventually
      be discarded.
      
      Dave Chinner reported a problem with this while experimenting with the
      shrinker revamp patchset.  So we believe it is time for a change.  This
      patch just moves int to longs.  Machines where it matters should have a
      big long anyway.
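
The arithmetic behind the change can be checked with a couple of helpers. This is only an illustration: `count_fits_in_int` and `count_fits_in_long` are hypothetical names, and the figure of three billion objects is an example value, not one taken from the report.

```c
#include <limits.h>
#include <stdbool.h>

/* True if an object count can be stored in an int without overflow. */
static bool count_fits_in_int(long long nr_objects)
{
	return nr_objects >= INT_MIN && nr_objects <= INT_MAX;
}

/* True if an object count can be stored in a long without overflow. */
static bool count_fits_in_long(long long nr_objects)
{
	return nr_objects >= LONG_MIN && nr_objects <= LONG_MAX;
}
```

With a 32-bit int, a count such as 3000000000 does not fit, whereas it fits in a long on platforms where long is 64 bits wide. On platforms with a 32-bit long the change makes no difference, which matches the remark that machines where it matters have a big long.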
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • C
      fs: remove vfs_follow_link · aac34df1
      Christoph Hellwig committed
      For a long time no filesystem has been using vfs_follow_link, and as seen
      by recent filesystem submissions any new use is accidental as well.
      
      Remove vfs_follow_link, document the replacement in
      Documentation/filesystems/porting and also rename __vfs_follow_link
      to match its only caller better.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • C
      fs: remove vfs_follow_link · 4aa32895
      Christoph Hellwig committed
      For a long time no filesystem has been using vfs_follow_link, and as seen
      by recent filesystem submissions any new use is accidental as well.
      
      Remove vfs_follow_link, document the replacement in
      Documentation/filesystems/porting and also rename __vfs_follow_link
      to match its only caller better.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 04 Sep, 2013 2 commits
    • C
      direct-io: Implement generic deferred AIO completions · 7b7a8665
      Christoph Hellwig committed
      Add support to the core direct-io code to defer AIO completions to user
      context using a workqueue.  This replaces opencoded and less efficient
      code in XFS and ext4 (we save a memory allocation for each direct IO)
      and will be needed to properly support O_(D)SYNC for AIO.
      
      The communication between the filesystem and the direct I/O code requires
      a new buffer head flag, which is a bit ugly but not avoidable until the
      direct I/O code stops abusing the buffer_head structure for communicating
      with the filesystems.
      
      Currently this creates a per-superblock unbound workqueue for these
      completions, which is taken from an earlier patch by Jan Kara.  I'm
      not really convinced about this use and would prefer a "normal" global
      workqueue with a high concurrency limit, but this needs further discussion.
      
      JK: Fixed ext4 part, dynamic allocation of the workqueue.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • A
      constify touch_atime() · badcf2b7
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  9. 27 Aug, 2013 1 commit
    • E
      userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Eric W. Biederman committed
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  10. 17 Aug, 2013 1 commit
  11. 13 Jul, 2013 1 commit
  12. 10 Jul, 2013 1 commit
  13. 09 Jul, 2013 1 commit
    • J
      writeback: Do not sort b_io list only because of block device inode · a8855990
      Jan Kara committed
      It is very likely that the block device inode will be part of the BDI dirty
      list as well. However, it doesn't make sense to sort inodes on the b_io list
      just because of this inode (as it contains buffers all over the device
      anyway). So save some CPU cycles, which is valuable since we hold the
      relatively contended wb->list_lock.
      Signed-off-by: Jan Kara <jack@suse.cz>
  14. 08 Jul, 2013 1 commit
  15. 04 Jul, 2013 1 commit
    • M
      mm: vmscan: take page buffers dirty and locked state into account · b4597226
      Mel Gorman committed
      Page reclaim keeps track of dirty and under writeback pages and uses it
      to determine if wait_iff_congested() should stall or if kswapd should
      begin writing back pages.  This fails to account for buffer pages that
      can be under writeback but not PageWriteback which is the case for
      filesystems like ext3 ordered mode.  Furthermore, PageDirty buffer pages
      can have all the buffers clean and writepage does no IO so it should not
      be accounted as congested.
      
      This patch adds an address_space operation that filesystems may
      optionally use to check if a page is really dirty or really under
      writeback.  An implementation for buffer_heads is added and used for
      block operations and ext3 in ordered mode.  By default the
      page flags are obeyed.
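
The optional-callback arrangement described here can be sketched in user space as follows. The types and helpers (`page_model`, `aops_model`, `model_page_check`, `model_buffer_check`) are hypothetical, and the buffer logic is a simplification of what a buffer_head based implementation would need to examine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page: its flags plus a summary of its buffers. */
struct page_model {
	bool flag_dirty;      /* models PageDirty */
	bool flag_writeback;  /* models PageWriteback */
	bool buffers_dirty;   /* at least one buffer is dirty */
	bool buffers_locked;  /* at least one buffer is locked for IO */
};

/* Hypothetical operations table with one optional callback. */
struct aops_model {
	void (*is_dirty_writeback)(const struct page_model *page,
				   bool *dirty, bool *writeback);
};

/* Default to the page flags; let the filesystem refine them if it can. */
static void model_page_check(const struct page_model *page,
			     const struct aops_model *ops,
			     bool *dirty, bool *writeback)
{
	*dirty = page->flag_dirty;
	*writeback = page->flag_writeback;

	if (ops && ops->is_dirty_writeback)
		ops->is_dirty_writeback(page, dirty, writeback);
}

/* Buffer-aware callback: look at the buffers instead of trusting flags. */
static void model_buffer_check(const struct page_model *page,
			       bool *dirty, bool *writeback)
{
	*dirty = page->buffers_dirty;
	*writeback = page->flag_writeback || page->buffers_locked;
}
```

For a page that is flagged dirty but whose buffers are all clean and locked for IO, the default path reports "dirty, not under writeback" while the buffer-aware callback reports the opposite, which is the distinction the patch wants reclaim to see.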
      
      Credit goes to Jan Kara for identifying that the page flags alone are
      not sufficient for ext3 and sanity checking a number of ideas on how the
      problem could be addressed.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Zlatko Calusic <zcalusic@bitsync.net>
      Cc: dormando <dormando@rydia.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 03 Jul, 2013 1 commit
    • J
      vfs: export lseek_execute() to modules · 46a1c2c7
      Jie Liu committed
      For those file systems (btrfs/ext4/ocfs2/tmpfs) that support
      SEEK_DATA/SEEK_HOLE, we end up duplicating the logic of
      lseek_execute() to update the current file offset to the
      desired offset if it is valid. ceph also does similar things
      in ceph_llseek().
      
      To reduce the duplication, this patch makes lseek_execute()
      publicly accessible so that we can call it directly from the
      underlying file systems.
      
      Thanks Dave Chinner for this suggestion.
      
      [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]
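
The shared step that the file systems were duplicating, validating an offset and then storing it, can be modelled as below. This is a user-space sketch, not the kernel helper: `file_model` and `model_setpos` are hypothetical names, and details such as files that allow unsigned offsets are ignored.

```c
#include <errno.h>

/* Hypothetical open file with just a current position. */
struct file_model {
	long long f_pos;
};

/*
 * Validate the requested offset against [0, maxsize] and, if it is
 * acceptable, make it the current position. Returns the new offset
 * or -EINVAL, leaving the position untouched on error.
 */
static long long model_setpos(struct file_model *file, long long offset,
			      long long maxsize)
{
	if (offset < 0)
		return -EINVAL;
	if (offset > maxsize)
		return -EINVAL;

	if (offset != file->f_pos)
		file->f_pos = offset;
	return offset;
}
```

A SEEK_DATA or SEEK_HOLE implementation would compute the target offset in its own way and then hand it to one helper of this shape, rather than repeating the range checks in every file system.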
      
      v1->v2:
      - Add kernel-doc comments for lseek_execute()
      - Call lseek_execute() in ceph->llseek()
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Ted Tso <tytso@mit.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>