1. 12 Jun, 2009 14 commits
    • Push lock_super() into the ->remount_fs() of filesystems that care about it · bbd6851a
      Committed by Al Viro
      Note that since we can't run into contention between remount_fs and write_super
      (due to exclusion on s_umount), we have to care only about filesystems that
      touch lock_super() on their own.  Out of those ext3, ext4, hpfs, sysv and ufs
      do need it; fat doesn't since its ->remount_fs() only accesses assign-once
      data (basically, it's "we have no atime on directories and only have atime on
      files for vfat; force nodiratime and possibly noatime into *flags").
      
      [folded a build fix from hch]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • push BKL down into ->put_super · 6cfd0148
      Committed by Christoph Hellwig
      Move BKL into ->put_super from the only caller.  A couple of
      filesystems had trivial enough ->put_super (only kfree and NULLing of
      s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
      hugetlbfs, omfs, qnx4, shmem; all others got the full treatment.  Most
      of them probably don't need it, but I'd rather sort that out individually.
      Preferably after all the other BKL pushdowns in that area.
      
      [AV: original used to move lock_super() down as well; these changes are
      removed since we don't do lock_super() at all in generic_shutdown_super()
      now]
      [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • No need to do lock_super() for exclusion in generic_shutdown_super() · a9e220f8
      Committed by Al Viro
      We can't run into contention on it.  All other callers of lock_super()
      either hold s_umount (and we have it exclusive) or hold an active
      reference to superblock in question, which prevents the call of
      generic_shutdown_super() while the reference is held.  So we can
      replace lock_super(s) with get_fs_excl() in generic_shutdown_super()
      (and corresponding change for unlock_super(), of course).
      
      Since ext4 expects s_lock held for its put_super, take lock_super()
      into it.  The rest of filesystems do not care at all.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • 443b94ba
    • cleanup sync_supers · e5004753
      Committed by Christoph Hellwig
      Merge the write_super helper into sync_super and move the check for
      ->write_super earlier so that we can avoid grabbing a reference to
      a superblock that doesn't have it.
      
      While we're at it also add a little comment documenting sync_supers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove ->write_super call in generic_shutdown_super · 8c85e125
      Committed by Christoph Hellwig
      We just did a full fs writeout using sync_filesystem before, and if
      that's not enough for the filesystem it can perform its own writeout
      in ->put_super, which many filesystems already do.
      
      Move a call to foofs_write_super into every foofs_put_super for now to
      guarantee identical behaviour until it's cleaned up by the individual
      filesystem maintainers.
      
      Exceptions:
      
       - affs already has identical copy & pasted code at the beginning of
         affs_put_super so no need to do it twice.
       - xfs does the right thing without it and I have changes pending for
         the xfs tree touching this area so I don't really need conflicts
         here.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Rename fsync_super() to sync_filesystem() (version 4) · 60b0680f
      Committed by Jan Kara
      Rename the function so that it better describes what it really does. Also
      remove the unnecessary include of buffer_head.h.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Move syncing code from super.c to sync.c (version 4) · c15c54f5
      Committed by Jan Kara
      Move sync_filesystems(), __fsync_super(), fsync_super() from
      super.c to sync.c where it fits better.
      
      [build fixes folded]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815
      Committed by Jan Kara
      It is unnecessarily fragile to have two places (fsync_super() and do_sync())
      doing data integrity sync of the filesystem. Alter __fsync_super() to
      accommodate needs of both callers and use it. So after this patch
      __fsync_super() is the only place where we gather all the calls needed to
      properly send all data on a filesystem to disk.
      
      A nice bonus is that we get complete livelock avoidance, and write_supers()
      is now used only for periodic writeback of superblocks.
      
      sync_blockdevs() introduced a couple of patches ago is gone now.
      
      [build fixes folded]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Make __fsync_super() a static function (version 4) · 429479f0
      Committed by Jan Kara
      __fsync_super() does the same thing as fsync_super(). So change the only
      caller to use fsync_super() and make __fsync_super() static. This removes an
      unnecessarily duplicated call to sync_blockdev() and prepares the ground
      for the changes to __fsync_super() in the following patches.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Call ->sync_fs() even if s_dirt is 0 (version 4) · bfe88125
      Committed by Jan Kara
      sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
      ->sync_fs() isn't called. This does not really make much sense since s_dirt is
      generally used by a filesystem to mean that ->write_super() needs to be called.
      But ->sync_fs() does different things. I even suspect that some filesystems
      (btrfs?) set s_dirt just to fool this logic.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Fix sys_sync() and fsync_super() reliability (version 4) · 5a3e5cb8
      Committed by Jan Kara
      So far, do_sync() called:
        sync_inodes(0);
        sync_supers();
        sync_filesystems(0);
        sync_filesystems(1);
        sync_inodes(1);
      
      This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
      submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
      transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
      not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
      and others are hit by this) when racing e.g. with background writeback. A
      similar problem hits also other filesystems (e.g. ext2) because of
      write_supers() being called before the sync_inodes(1).
      
      Change the ordering of calls in do_sync() - this requires a new function
      sync_blockdevs() to preserve the property that block devices are always synced
      after write_super() / sync_fs() call.
      
      The same issue is fixed in __fsync_super() function used on umount /
      remount read-only.
      
      [AV: build fixes]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove s_async_list · 876a9f76
      Committed by Christoph Hellwig
      Remove the unused s_async_list in the superblock, a leftover of the
      broken async inode deletion code that leaked into mainline.  Having this
      in the middle of the sync/unmount path is not helpful for the following
      cleanups.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: move mark_files_ro into file_table.c · 864d7c4c
      Committed by npiggin@suse.de
      This function walks the s_files list, and operates primarily on the
      files in a superblock, so it better belongs here (e.g. see also
      fs_may_remount_ro).
      
      [AV: ... and it shouldn't be static after that move]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 09 May, 2009 2 commits
  3. 07 Apr, 2009 1 commit
  4. 03 Apr, 2009 1 commit
  5. 28 Mar, 2009 1 commit
  6. 26 Mar, 2009 2 commits
  7. 13 Mar, 2009 1 commit
  8. 19 Feb, 2009 1 commit
    • fs/super.c: add lockdep annotation to s_umount · ada723dc
      Committed by Peter Zijlstra
      Li Zefan said:
      
      Thread 1:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            cat /mnt/cpus > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      Thread 2:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      (Note: It is irrelevant which cgroup subsys is used.)
      
      After a while a lockdep warning showed up:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28 #479
      ---------------------------------------------
      mount/13554 is trying to acquire lock:
       (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321
      
      but task is already holding lock:
       (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      other info that might help us debug this:
      1 lock held by mount/13554:
       #0:  (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      stack backtrace:
      Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
      Call Trace:
       [<c044ad2e>] validate_chain+0x4c6/0xbbd
       [<c044ba9b>] __lock_acquire+0x676/0x700
       [<c044bb82>] lock_acquire+0x5d/0x7a
       [<c049d888>] ? sget+0x5e/0x321
       [<c061b9b8>] down_write+0x34/0x50
       [<c049d888>] ? sget+0x5e/0x321
       [<c049d888>] sget+0x5e/0x321
       [<c045a2e7>] ? cgroup_set_super+0x0/0x3e
       [<c045959f>] ? cgroup_test_super+0x0/0x2f
       [<c045bcea>] cgroup_get_sb+0x98/0x2e7
       [<c045cfb6>] cpuset_get_sb+0x4a/0x5f
       [<c049dfa4>] vfs_kern_mount+0x40/0x7b
       [<c049e02d>] do_kern_mount+0x37/0xbf
       [<c04af4a0>] do_mount+0x5c3/0x61a
       [<c04addd2>] ? copy_mount_options+0x2c/0x111
       [<c04af560>] sys_mount+0x69/0xa0
       [<c0403251>] sysenter_do_call+0x12/0x31
      
      The cause is that after alloc_super() and the subsequent retry, an old entry
      is found in the fs_supers list, so grab_super(old) is called; both functions
      take an s_umount lock:
      
      struct super_block *sget(...)
      {
      	...
      retry:
      	spin_lock(&sb_lock);
      	if (test) {
      		list_for_each_entry(old, &type->fs_supers, s_instances) {
      			if (!test(old, data))
      				continue;
      			if (!grab_super(old))  <--- 2nd: down_write(&old->s_umount);
      				goto retry;
      			if (s)
      				destroy_super(s);
      			return old;
      		}
      	}
      	if (!s) {
      		spin_unlock(&sb_lock);
      		s = alloc_super(type);   <--- 1st: down_write(&s->s_umount)
      		if (!s)
      			return ERR_PTR(-ENOMEM);
      		goto retry;
      	}
      	...
      }
      
      It seems like a false positive, and it seems that the VFS, not cgroup, needs
      to be fixed.
      
      Peter said:
      
      We can simply put the new s_umount instance in a different subclass; lockdep
      doesn't particularly care about subclass order.
      
      If there's any issue with the callers of sget() assuming the s_umount lock
      being of subclass 0, then there is another annotation we can use to fix
      that, but let's not bother with that if this is sufficient.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673

      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Menage <menage@google.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 09 Feb, 2009 1 commit
  10. 14 Jan, 2009 1 commit
  11. 09 Jan, 2009 1 commit
  12. 08 Jan, 2009 1 commit
  13. 03 Jan, 2009 1 commit
  14. 20 Dec, 2008 1 commit
  15. 23 Oct, 2008 2 commits
  16. 21 Oct, 2008 1 commit
  17. 25 Jul, 2008 1 commit
    • fix soft lock up at NFS mount via per-SB LRU-list of unused dentries · da3bbdd4
      Committed by Kentaro Makita
      [Summary]
      
       Split LRU-list of unused dentries to one per superblock to avoid soft
       lock up during NFS mounts and remounting of any filesystem.
      
       Previously I posted here:
       http://lkml.org/lkml/2008/3/5/590
      
      [Descriptions]
      
      - background
      
        dentry_unused is a list of dentries which are not referenced.
        dentry_unused grows when references on directories or files are
        released.  This list can be very long if there is a lot of free memory.
      
      - the problem
      
        When shrink_dcache_sb() is called, it scans all of dentry_unused linearly
        under spin_lock(), and if dentry->d_sb is different from the given
        superblock, it moves on to the next dentry.  This scan is very costly if
        there are many entries, and very inefficient if there are many superblocks.
      
        IOW, when we need to shrink the unused dentries of one superblock, we scan
        the unused dentries of all superblocks in the system.  For example, we scan
        500 dentries to unmount a filesystem, but also scan 1,000,000 or more unused
        dentries on other superblocks.
      
        In our case, when mounting NFS*, shrink_dcache_sb() is called to shrink
        unused dentries on NFS, but scans 100,000,000 unused dentries on
        superblocks in the system such as local ext3 filesystems.  I hear NFS
        mounting took 1 min on some systems in use.
      
      * : NFS uses a virtual filesystem in the rpc layer, so NFS is affected by
        this problem.
      
        100,000,000 is a possible number on large systems.
      
        A per-superblock LRU of unused dentries can reduce the cost in a
        reasonable manner.
      
      - How to fix
      
        I found this problem is solved by David Chinner's "Per-superblock
        unused dentry LRU lists V3" (1), so I rebased it and added a fix to
        reclaim with fairness, as suggested in Andrew Morton's comments (2).
      
        1) http://lkml.org/lkml/2006/5/25/318
        2) http://lkml.org/lkml/2006/5/25/320
      
        Split the LRU list of unused dentries into one per superblock.  Then NFS
        mounting will check only the dentries under one superblock instead of all.
        But this splitting breaks the global LRU order of dentry_unused.  So I've
        attempted to reclaim unused dentries fairly by calculating the number of
        dentries to scan on each sb in the following way:
      
        number of dentries to scan on this sb =
        count * (number of dentries on this sb / number of dentries in the machine)
      
      - ToDo
       - I have to measure performance numbers and do stress tests.
      
       - When an unmount occurs during prune_dcache() while it is scanning the same
        superblock, it is unable to reach the next superblock because the current
        one has gone away.  We restart scanning from the first superblock, which
        causes unfair reclaim of unused dentries on the first superblock.  But I
        think this happens very rarely.
      
      - Test Results
      
        Result on 6GB boxes with excessive unused dentries.
      
      Without patch:
      
      $ cat /proc/sys/fs/dentry-state
      10181835        10180203        45      0       0       0
      # mount -t nfs 10.124.60.70:/work/kernel-src nfs
      real    0m1.830s
      user    0m0.001s
      sys     0m1.653s
      
       With this patch:
      $ cat /proc/sys/fs/dentry-state
      10236610        10234751        45      0       0       0
      # mount -t nfs 10.124.60.70:/work/kernel-src nfs
      real    0m0.106s
      user    0m0.002s
      sys     0m0.032s
      
      [akpm@linux-foundation.org: fix comments]
      Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 29 Apr, 2008 1 commit
  19. 28 Apr, 2008 1 commit
  20. 22 Apr, 2008 1 commit
  21. 19 Apr, 2008 2 commits
  22. 20 Mar, 2008 1 commit
    • fs: fix kernel-doc notation warnings · a6b91919
      Committed by Randy Dunlap
      Fix kernel-doc notation warnings in fs/.
      
      Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
       *	mark_files_ro
      Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
       *	lease_get_mtime
      Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
       *	lease_get_mtime
      Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
       * lookup_one_len:  filesystem helper to lookup single pathname component
      Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
       * bh_uptodate_or_lock: Test whether the buffer is uptodate
      Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
       * bh_submit_read: Submit a locked buffer for reading
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
       * writeback_acquire: attempt to get exclusive writeback access to a device
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
       * writeback_in_progress: determine whether there is writeback in progress
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
       * writeback_release: relinquish exclusive writeback access against a device.
      Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
      Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
      Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
       * void journal_invalidatepage()
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 18 Mar, 2008 1 commit