1. 12 6月, 2009 7 次提交
    • J
      vfs: Move syncing code from super.c to sync.c (version 4) · c15c54f5
      Jan Kara 提交于
      Move sync_filesystems(), __fsync_super(), fsync_super() from
      super.c to sync.c where it fits better.
      
      [build fixes folded]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c15c54f5
    • J
      vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815
      Jan Kara 提交于
      It is unnecessarily fragile to have two places (fsync_super() and do_sync())
      doing data integrity sync of the filesystem. Alter __fsync_super() to
      accommodate needs of both callers and use it. So after this patch
      __fsync_super() is the only place where we gather all the calls needed to
      properly send all data on a filesystem to disk.
      
      Nice bonus is that we get a complete livelock avoidance and write_supers()
      is now only used for periodic writeback of superblocks.
      
      sync_blockdevs() introduced a couple of patches ago is gone now.
      
      [build fixes folded]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5cee5815
    • J
      vfs: Make __fsync_super() a static function (version 4) · 429479f0
      Jan Kara 提交于
      __fsync_super() does the same thing as fsync_super(). So change the only
      caller to use fsync_super() and make __fsync_super() static. This removes
      unnecessarily duplicated call to sync_blockdev() and prepares ground
      for the changes to __fsync_super() in the following patches.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      429479f0
    • J
      vfs: Call ->sync_fs() even if s_dirt is 0 (version 4) · bfe88125
      Jan Kara 提交于
      sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
      ->sync_fs() isn't called. This does not really make much sence since s_dirt is
      generally used by a filesystem to mean that ->write_super() needs to be called.
      But ->sync_fs() does different things. I even suspect that some filesystems
      (btrfs?) sets s_dirt just to fool this logic.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bfe88125
    • J
      vfs: Fix sys_sync() and fsync_super() reliability (version 4) · 5a3e5cb8
      Jan Kara 提交于
      So far, do_sync() called:
        sync_inodes(0);
        sync_supers();
        sync_filesystems(0);
        sync_filesystems(1);
        sync_inodes(1);
      
      This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
      submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
      transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
      not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
      and others are hit by this) when racing e.g. with background writeback. A
      similar problem hits also other filesystems (e.g. ext2) because of
      write_supers() being called before the sync_inodes(1).
      
      Change the ordering of calls in do_sync() - this requires a new function
      sync_blockdevs() to preserve the property that block devices are always synced
      after write_super() / sync_fs() call.
      
      The same issue is fixed in __fsync_super() function used on umount /
      remount read-only.
      
      [AV: build fixes]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5a3e5cb8
    • C
      remove s_async_list · 876a9f76
      Christoph Hellwig 提交于
      Remove the unused s_async_list in the superblock, a leftover of the
      broken async inode deletion code that leaked into mainline.  Having this
      in the middle of the sync/unmount path is not helpful for the following
      cleanups.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      876a9f76
    • N
      fs: move mark_files_ro into file_table.c · 864d7c4c
      npiggin@suse.de 提交于
      This function walks the s_files lock, and operates primarily on the
      files in a superblock, so it better belongs here (eg. see also
      fs_may_remount_ro).
      
      [AV: ... and it shouldn't be static after that move]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      864d7c4c
  2. 09 5月, 2009 2 次提交
  3. 07 4月, 2009 1 次提交
  4. 03 4月, 2009 1 次提交
  5. 28 3月, 2009 1 次提交
  6. 26 3月, 2009 2 次提交
  7. 13 3月, 2009 1 次提交
  8. 19 2月, 2009 1 次提交
    • P
      fs/super.c: add lockdep annotation to s_umount · ada723dc
      Peter Zijlstra 提交于
      Li Zefan said:
      
      Thread 1:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            cat /mnt/cpus > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      Thread 2:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      (Note: It is irrelevant which cgroup subsys is used.)
      
      After a while a lockdep warning showed up:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28 #479
      ---------------------------------------------
      mount/13554 is trying to acquire lock:
       (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321
      
      but task is already holding lock:
       (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      other info that might help us debug this:
      1 lock held by mount/13554:
       #0:  (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      stack backtrace:
      Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
      Call Trace:
       [<c044ad2e>] validate_chain+0x4c6/0xbbd
       [<c044ba9b>] __lock_acquire+0x676/0x700
       [<c044bb82>] lock_acquire+0x5d/0x7a
       [<c049d888>] ? sget+0x5e/0x321
       [<c061b9b8>] down_write+0x34/0x50
       [<c049d888>] ? sget+0x5e/0x321
       [<c049d888>] sget+0x5e/0x321
       [<c045a2e7>] ? cgroup_set_super+0x0/0x3e
       [<c045959f>] ? cgroup_test_super+0x0/0x2f
       [<c045bcea>] cgroup_get_sb+0x98/0x2e7
       [<c045cfb6>] cpuset_get_sb+0x4a/0x5f
       [<c049dfa4>] vfs_kern_mount+0x40/0x7b
       [<c049e02d>] do_kern_mount+0x37/0xbf
       [<c04af4a0>] do_mount+0x5c3/0x61a
       [<c04addd2>] ? copy_mount_options+0x2c/0x111
       [<c04af560>] sys_mount+0x69/0xa0
       [<c0403251>] sysenter_do_call+0x12/0x31
      
      The cause is after alloc_super() and then retry, an old entry in list
      fs_supers is found, so grab_super(old) is called, but both functions hold
      s_umount lock:
      
      struct super_block *sget(...)
      {
      	...
      retry:
      	spin_lock(&sb_lock);
      	if (test) {
      		list_for_each_entry(old, &type->fs_supers, s_instances) {
      			if (!test(old, data))
      				continue;
      			if (!grab_super(old))  <--- 2nd: down_write(&old->s_umount);
      				goto retry;
      			if (s)
      				destroy_super(s);
      			return old;
      		}
      	}
      	if (!s) {
      		spin_unlock(&sb_lock);
      		s = alloc_super(type);   <--- 1th: down_write(&s->s_umount)
      		if (!s)
      			return ERR_PTR(-ENOMEM);
      		goto retry;
      	}
      	...
      }
      
      It seems like a false positive, and seems like VFS but not cgroup needs to
      be fixed.
      
      Peter said:
      
      We can simply put the new s_umount instance in a but lockdep doesn't
      particularly cares about subclass order.
      
      If there's any issue with the callers of sget() assuming the s_umount lock
      being of sublcass 0, then there is another annotation we can use to fix
      that, but lets not bother with that if this is sufficient.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NLi Zefan <lizf@cn.fujitsu.com>
      Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Menage <menage@google.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ada723dc
  9. 09 2月, 2009 1 次提交
  10. 14 1月, 2009 1 次提交
  11. 09 1月, 2009 1 次提交
  12. 08 1月, 2009 1 次提交
  13. 03 1月, 2009 1 次提交
  14. 20 12月, 2008 1 次提交
  15. 23 10月, 2008 2 次提交
  16. 21 10月, 2008 1 次提交
  17. 25 7月, 2008 1 次提交
    • K
      fix soft lock up at NFS mount via per-SB LRU-list of unused dentries · da3bbdd4
      Kentaro Makita 提交于
      [Summary]
      
       Split LRU-list of unused dentries to one per superblock to avoid soft
       lock up during NFS mounts and remounting of any filesystem.
      
       Previously I posted here:
       http://lkml.org/lkml/2008/3/5/590
      
      [Descriptions]
      
      - background
      
        dentry_unused is a list of dentries which are not referenced.
        dentry_unused grows up when references on directories or files are
        released.  This list can be very long if there is huge free memory.
      
      - the problem
      
        When shrink_dcache_sb() is called, it scans all dentry_unused linearly
        under spin_lock(), and if dentry->d_sb is differnt from given
        superblock, scan next dentry.  This scan costs very much if there are
        many entries, and very ineffective if there are many superblocks.
      
        IOW, When we need to shrink unused dentries on one dentry, but scans
        unused dentries on all superblocks in the system.  For example, we scan
        500 dentries to unmount a filesystem, but scans 1,000,000 or more unused
        dentries on other superblocks.
      
        In our case , At mounting NFS*, shrink_dcache_sb() is called to shrink
        unused dentries on NFS, but scans 100,000,000 unused dentries on
        superblocks in the system such as local ext3 filesystems.  I hear NFS
        mounting took 1 min on some system in use.
      
      * : NFS uses virtual filesystem in rpc layer, so NFS is affected by
        this problem.
      
        100,000,000 is possible number on large systems.
      
        Per-superblock LRU of unused dentried can reduce the cost in
        reasonable manner.
      
      - How to fix
      
        I found this problem is solved by David Chinner's "Per-superblock
        unused dentry LRU lists V3"(1), so I rebase it and add some fix to
        reclaim with fairness, which is in Andrew Morton's comments(2).
      
        1) http://lkml.org/lkml/2006/5/25/318
        2) http://lkml.org/lkml/2006/5/25/320
      
        Split LRU-list of unused dentries to each superblocks.  Then, NFS
        mounting will check dentries under a superblock instead of all.  But
        this spliting will break LRU of dentry-unused.  So, I've attempted to
        make reclaim unused dentrins with fairness by calculate number of
        dentries to scan on this sb based on following way
      
        number of dentries to scan on this sb =
        count * (number of dentries on this sb / number of dentries in the machine)
      
      - ToDo
       - I have to measuring performance number and do stress tests.
      
       - When unmount occurs during prune_dcache(), scanning on same
        superblock, It is unable to reach next superblock because it is gone
        away.  We restart scannig superblock from first one, it causes
        unfairness of reclaim unused dentries on first superblock.  But I think
        this happens very rarely.
      
      - Test Results
      
        Result on 6GB boxes with excessive unused dentries.
      
      Without patch:
      
      $ cat /proc/sys/fs/dentry-state
      10181835        10180203        45      0       0       0
      # mount -t nfs 10.124.60.70:/work/kernel-src nfs
      real    0m1.830s
      user    0m0.001s
      sys     0m1.653s
      
       With this patch:
      $ cat /proc/sys/fs/dentry-state
      10236610        10234751        45      0       0       0
      # mount -t nfs 10.124.60.70:/work/kernel-src nfs
      real    0m0.106s
      user    0m0.002s
      sys     0m0.032s
      
      [akpm@linux-foundation.org: fix comments]
      Signed-off-by: NKentaro Makita <k-makita@np.css.fujitsu.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da3bbdd4
  18. 29 4月, 2008 1 次提交
  19. 28 4月, 2008 1 次提交
  20. 22 4月, 2008 1 次提交
  21. 19 4月, 2008 2 次提交
  22. 20 3月, 2008 1 次提交
    • R
      fs: fix kernel-doc notation warnings · a6b91919
      Randy Dunlap 提交于
      Fix kernel-doc notation warnings in fs/.
      
      Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
       *	mark_files_ro
      Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
       *	lease_get_mtime
      Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
       *	lease_get_mtime
      Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
       * lookup_one_len:  filesystem helper to lookup single pathname component
      Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
       * bh_uptodate_or_lock: Test whether the buffer is uptodate
      Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
       * bh_submit_read: Submit a locked buffer for reading
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
       * writeback_acquire: attempt to get exclusive writeback access to a device
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
       * writeback_in_progress: determine whether there is writeback in progress
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
       * writeback_release: relinquish exclusive writeback access against a device.
      Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
      Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
      Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
       * void journal_invalidatepage()
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6b91919
  23. 18 3月, 2008 1 次提交
  24. 06 3月, 2008 1 次提交
  25. 09 2月, 2008 2 次提交
  26. 20 10月, 2007 4 次提交