1. 14 7月, 2012 1 次提交
  2. 09 6月, 2012 1 次提交
  3. 29 2月, 2012 1 次提交
  4. 14 2月, 2012 2 次提交
  5. 24 1月, 2012 1 次提交
    • D
      mm: cleancache: s/flush/invalidate/ · 3167760f
      Dan Magenheimer 提交于
      Per akpm suggestions alter the use of the term flush to be
      invalidate. The next patch will do this across all MM.
      
      This change is completely cosmetic.
      
      [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]
      Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Reviewed-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Rik Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      [v10: Fixed  fs: move code out of buffer.c conflict change]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3167760f
  6. 18 1月, 2012 1 次提交
    • K
      wake up s_wait_unfrozen when ->freeze_fs fails · e1616300
      Kazuya Mio 提交于
      dd slept infinitely when fsfeeze failed because of EIO.
      To fix this problem, if ->freeze_fs fails, freeze_super() wakes up
      the tasks waiting for the filesystem to become unfrozen.
      
      When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(),
      the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen.
      
      However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then
      freeze_super() returns an error number. In this case, FITHAW ioctl returns
      EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up
      s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely.
      Signed-off-by: NKazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e1616300
  7. 07 1月, 2012 3 次提交
  8. 04 1月, 2012 3 次提交
    • A
      vfs: fix the rest of sget() races · dabe0dc1
      Al Viro 提交于
      unfortunately, just checking MS_BORN after having grabbed ->s_umount in
      sget() is not enough; places that pick superblock from a list and
      grab s_umount shared need the same check in addition to checking for
      ->s_root; otherwise three-way race between failing mount, sget() and
      such list-walker can leave us with list-walker coming *second*, when
      temporary active ref grabbed by sget() (to be dropped when sget()
      notices that original mount has failed by checking MS_BORN) has
      lead to deactivate_locked_super() from failing ->mount() *not* doing
      ->kill_sb() and just releasing ->s_umount.  Once sget() gets through
      and notices that MS_BORN had never been set it will drop the active
      ref and fs will be shut down and kicked out of all lists, but it's
      too late for something like sync_supers().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dabe0dc1
    • A
      vfs: convert fs_supers to hlist · a5166169
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a5166169
    • A
      trim fs/internal.h · f47ec3f2
      Al Viro 提交于
      some stuff in there can actually become static; some belongs to pnode.h
      as it's a private interface between namespace.c and pnode.c...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f47ec3f2
  9. 02 11月, 2011 1 次提交
  10. 01 11月, 2011 1 次提交
  11. 21 7月, 2011 3 次提交
    • D
      vfs: increase shrinker batch size · 8ab47664
      Dave Chinner 提交于
      Now that the per-sb shrinker is responsible for shrinking 2 or more
      caches, increase the batch size to keep econmies of scale for
      shrinking each cache.  Increase the shrinker batch size to 1024
      objects.
      
      To allow for a large increase in batch size, add a conditional
      reschedule to prune_icache_sb() so that we don't hold the LRU spin
      lock for too long. This mirrors the behaviour of the
      __shrink_dcache_sb(), and allows us to increase the batch size
      without needing to worry about problems caused by long lock hold
      times.
      
      To ensure that filesystems using the per-sb shrinker callouts don't
      cause problems, document that the object freeing method must
      reschedule appropriately inside loops.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8ab47664
    • D
      superblock: add filesystem shrinker operations · 0e1fdafd
      Dave Chinner 提交于
      Now we have a per-superblock shrinker implementation, we can add a
      filesystem specific callout to it to allow filesystem internal
      caches to be shrunk by the superblock shrinker.
      
      Rather than perpetuate the multipurpose shrinker callback API (i.e.
      nr_to_scan == 0 meaning "tell me how many objects freeable in the
      cache), two operations will be added. The first will return the
      number of objects that are freeable, the second is the actual
      shrinker call.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0e1fdafd
    • D
      superblock: introduce per-sb cache shrinker infrastructure · b0d40c92
      Dave Chinner 提交于
      With context based shrinkers, we can implement a per-superblock
      shrinker that shrinks the caches attached to the superblock. We
      currently have global shrinkers for the inode and dentry caches that
      split up into per-superblock operations via a coarse proportioning
      method that does not batch very well.  The global shrinkers also
      have a dependency - dentries pin inodes - so we have to be very
      careful about how we register the global shrinkers so that the
      implicit call order is always correct.
      
      With a per-sb shrinker callout, we can encode this dependency
      directly into the per-sb shrinker, hence avoiding the need for
      strictly ordering shrinker registrations. We also have no need for
      any proportioning code for the shrinker subsystem already provides
      this functionality across all shrinkers. Allowing the shrinker to
      operate on a single superblock at a time means that we do less
      superblock list traversals and locking and reclaim should batch more
      effectively. This should result in less CPU overhead for reclaim and
      potentially faster reclaim of items from each filesystem.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b0d40c92
  12. 20 7月, 2011 5 次提交
  13. 12 7月, 2011 1 次提交
    • J
      fixlet: Remove fs_excl from struct task. · 4aede84b
      Justin TerAvest 提交于
      fs_excl is a poor man's priority inheritance for filesystems to hint to
      the block layer that an operation is important. It was never clearly
      specified, not widely adopted, and will not prevent starvation in many
      cases (like across cgroups).
      
      fs_excl was introduced with the time sliced CFQ IO scheduler, to
      indicate when a process held FS exclusive resources and thus needed
      a boost.
      
      It doesn't cover all file systems, and it was never fully complete.
      Lets kill it.
      Signed-off-by: NJustin TerAvest <teravest@google.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      4aede84b
  14. 04 6月, 2011 1 次提交
    • A
      more conservative S_NOSEC handling · 9e1f1de0
      Al Viro 提交于
      Caching "we have already removed suid/caps" was overenthusiastic as merged.
      On network filesystems we might have had suid/caps set on another client,
      silently picked by this client on revalidate, all of that *without* clearing
      the S_NOSEC flag.
      
      AFAICS, the only reasonably sane way to deal with that is
      	* new superblock flag; unless set, S_NOSEC is not going to be set.
      	* local block filesystems set it in their ->mount() (more accurately,
      mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
      local block ones clear it)
      	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
      it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
      inode attribute changes are picked from other clients.
      
      It's not an earth-shattering hole (anybody that can set suid on another client
      will almost certainly be able to write to the file before doing that anyway),
      but it's a bug that needs fixing.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9e1f1de0
  15. 27 5月, 2011 1 次提交
    • D
      mm/fs: add hooks to support cleancache · c515e1fd
      Dan Magenheimer 提交于
      This fourth patch of eight in this cleancache series provides the
      core hooks in VFS for: initializing cleancache per filesystem;
      capturing clean pages reclaimed by page cache; attempting to get
      pages from cleancache before filesystem read; and ensuring coherency
      between pagecache, disk, and cleancache.  Note that the placement
      of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
      change was required due to a patchset in 2.6.39.
      
      All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
      a check of a boolean global if CONFIG_CLEANCACHE is set but no
      cleancache "backend" has claimed cleancache_ops.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      c515e1fd
  16. 19 5月, 2011 1 次提交
  17. 18 3月, 2011 1 次提交
    • A
      vfs: split off vfsmount-related parts of vfs_kern_mount() · 9d412a43
      Al Viro 提交于
      new function: mount_fs().  Does all work done by vfs_kern_mount()
      except the allocation and filling of vfsmount; returns root dentry
      or ERR_PTR().
      
      vfs_kern_mount() switched to using it and taken to fs/namespace.c,
      along with its wrappers.
      
      alloc_vfsmnt()/free_vfsmnt() made static.
      
      functions in namespace.c slightly reordered.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9d412a43
  18. 17 3月, 2011 2 次提交
  19. 12 2月, 2011 1 次提交
  20. 17 1月, 2011 1 次提交
    • A
      sanitize vfsmount refcounting changes · f03c6599
      Al Viro 提交于
      Instead of splitting refcount between (per-cpu) mnt_count
      and (SMP-only) mnt_longrefs, make all references contribute
      to mnt_count again and keep track of how many are longterm
      ones.
      
      Accounting rules for longterm count:
      	* 1 for each fs_struct.root.mnt
      	* 1 for each fs_struct.pwd.mnt
      	* 1 for having non-NULL ->mnt_ns
      	* decrement to 0 happens only under vfsmount lock exclusive
      
      That allows nice common case for mntput() - since we can't drop the
      final reference until after mnt_longterm has reached 0 due to the rules
      above, mntput() can grab vfsmount lock shared and check mnt_longterm.
      If it turns out to be non-zero (which is the common case), we know
      that this is not the final mntput() and can just blindly decrement
      percpu mnt_count.  Otherwise we grab vfsmount lock exclusive and
      do usual decrement-and-check of percpu mnt_count.
      
      For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
      namespace.c uses the latter in places where we don't already hold
      vfsmount lock exclusive and opencodes a few remaining spots where
      we need to manipulate mnt_longterm.
      
      Note that we mostly revert the code outside of fs/namespace.c back
      to what we used to have; in particular, normal code doesn't need
      to care about two kinds of references, etc.  And we get to keep
      the optimization Nick's variant had bought us...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f03c6599
  21. 07 1月, 2011 2 次提交
    • N
      fs: scale mntget/mntput · b3e19d92
      Nick Piggin 提交于
      The problem that this patch aims to fix is vfsmount refcounting scalability.
      We need to take a reference on the vfsmount for every successful path lookup,
      which often go to the same mount point.
      
      The fundamental difficulty is that a "simple" reference count can never be made
      scalable, because any time a reference is dropped, we must check whether that
      was the last reference. To do that requires communication with all other CPUs
      that may have taken a reference count.
      
      We can make refcounts more scalable in a couple of ways, involving keeping
      distributed counters, and checking for the global-zero condition less
      frequently.
      
      - check the global sum once every interval (this will delay zero detection
        for some interval, so it's probably a showstopper for vfsmounts).
      
      - keep a local count and only taking the global sum when local reaches 0 (this
        is difficult for vfsmounts, because we can't hold preempt off for the life of
        a reference, so a counter would need to be per-thread or tied strongly to a
        particular CPU which requires more locking).
      
      - keep a local difference of increments and decrements, which allows us to sum
        the total difference and hence find the refcount when summing all CPUs. Then,
        keep a single integer "long" refcount for slow and long lasting references,
        and only take the global sum of local counters when the long refcount is 0.
      
      This last scheme is what I implemented here. Attached mounts and process root
      and working directory references are "long" references, and everything else is
      a short reference.
      
      This allows scalable vfsmount references during path walking over mounted
      subtrees and unattached (lazy umounted) mounts with processes still running
      in them.
      
      This results in one fewer atomic op in the fastpath: mntget is now just a
      per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
      and non-atomic decrement in the common case. However code is otherwise bigger
      and heavier, so single threaded performance is basically a wash.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b3e19d92
    • N
      fs: dcache per-bucket dcache hash locking · ceb5bdc2
      Nick Piggin 提交于
      We can turn the dcache hash locking from a global dcache_hash_lock into
      per-bucket locking.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      ceb5bdc2
  22. 13 11月, 2010 2 次提交
    • T
      block: clean up blkdev_get() wrappers and their users · d4d77629
      Tejun Heo 提交于
      After recent blkdev_get() modifications, open_by_devnum() and
      open_bdev_exclusive() are simple wrappers around blkdev_get().
      Replace them with blkdev_get_by_dev() and blkdev_get_by_path().
      
      blkdev_get_by_dev() is identical to open_by_devnum().
      blkdev_get_by_path() is slightly different in that it doesn't
      automatically add %FMODE_EXCL to @mode.
      
      All users are converted.  Most conversions are mechanical and don't
      introduce any behavior difference.  There are several exceptions.
      
      * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
        reason to OR it explicitly on blkdev_put().
      
      * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
        sb->s_mode.
      
      * With the above changes, sb->s_mode now always should contain
        FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
        errors.
      
      The new blkdev_get_*() functions are with proper docbook comments.
      While at it, add function description to blkdev_get() too.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Joern Engel <joern@lazybastard.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: xfs-masters@oss.sgi.com
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      d4d77629
    • T
      block: make blkdev_get/put() handle exclusive access · e525fd89
      Tejun Heo 提交于
      Over time, block layer has accumulated a set of APIs dealing with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combination of open and claim and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access as
      in-kernel open + claim sequence can disturb the existing exclusive
      open even before the block layer knows the current open if for another
      exclusive access.  Reorganize the interface such that,
      
      * blkdev_get() is extended to include exclusive access management.
        @holder argument is added and, if is @FMODE_EXCL specified, it will
        gain exclusive access atomically w.r.t. other exclusive accesses.
      
      * blkdev_put() is similarly extended.  It now takes @mode argument and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most of bdev open/close operations are unified into blkdev_get/put()
      and most exclusive accesses are tested atomically at the open time (as
      it should).  This cleans up code and removes some, both valid and
      invalid, but unnecessary all the same, corner cases.
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straight-forward.  drbd conversion is a bit more
      involved as there was some reordering, but the logic should stay the
      same.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Brown <neilb@suse.de>
      Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      e525fd89
  23. 29 10月, 2010 4 次提交