1. 22 May 2010 (9 commits)
  2. 30 April 2010 (1 commit)
  3. 25 April 2010 (1 commit)
    • Catch filesystems lacking s_bdi · 5129a469
      Authored by Jörn Engel
      noop_backing_dev_info is used only as a flag to mark filesystems that
      don't have any backing store, like tmpfs, procfs, spufs, etc.
      Signed-off-by: Joern Engel <joern@logfs.org>
      
      Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
      to the noop_backing_dev_info is not legal and will not result in
      them being flushed, but we already catch this condition in
      __mark_inode_dirty() when checking for a registered bdi.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
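
      A minimal sketch of the idea, assuming a hypothetical helper name and call
      site (the real hunks live in the mount path and in __mark_inode_dirty()):
      filesystems with no backing store point s_bdi at noop_backing_dev_info, and
      a superblock that never set s_bdi now only warns instead of crashing.

      	#include <linux/backing-dev.h>
      	#include <linux/fs.h>

      	/* The bdi itself carries no writeback state; it is only a marker. */
      	struct backing_dev_info noop_backing_dev_info = {
      		.name = "noop",
      	};

      	/* Hypothetical helper, for illustration only. */
      	static void sb_check_bdi(struct super_block *sb)
      	{
      		/* Was a BUG_ON(); a missing bdi is only worth a warning,
      		 * since dirty inodes on a bogus bdi are caught later in
      		 * __mark_inode_dirty() anyway. */
      		WARN_ON(!sb->s_bdi);
      	}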
  4. 04 March 2010 (2 commits)
    • Mirror MS_KERNMOUNT in ->mnt_flags · 8089352a
      Authored by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: improve remount,ro vs buffercache coherency · d208bbdd
      Authored by Nick Piggin
      Invalidate sb->s_bdev on remount,ro.
      
      Fixes a problem reported by Jorge Boncompte who is seeing corruption
      trying to snapshot a minix filesystem image.  Some filesystems modify
      their metadata via a path other than the bdev buffer cache (eg.  they may
      use a private linear mapping for their metadata, or implement directories
      in pagecache, etc).  Also, file data modifications usually go to the bdev
      via their own mappings.
      
      These updates are not coherent with buffercache IO (eg.  via /dev/bdev)
      and never have been.  However there could be a reasonable expectation that
      after a mount -oremount,ro operation the buffercache should subsequently be
      coherent with previous filesystem modifications.
      
      So invalidate the bdev mappings on a remount,ro operation to provide a
      coherency point.
      
      The problem was exposed when we switched the old rd to brd because old rd
      didn't really function like a normal block device and updates to rd via
      mappings other than the buffercache would still end up going into its
      buffercache.  But the same problem has always affected other "normal"
      block devices, including loop.
      
      [akpm@linux-foundation.org: repair comment layout]
      Reported-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Tested-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
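
      A minimal sketch of the coherency point described above, wrapped in a
      hypothetical helper for illustration (the real hunk sits inside
      do_remount_sb() once the remount has succeeded):

      	#include <linux/fs.h>
      	#include <linux/buffer_head.h>	/* invalidate_bdev() */

      	/* Hypothetical wrapper; names other than invalidate_bdev() are made up. */
      	static void remount_ro_coherency(struct super_block *sb, int new_flags)
      	{
      		int remount_ro = (new_flags & MS_RDONLY) &&
      				 !(sb->s_flags & MS_RDONLY);

      		/* Drop the bdev's buffercache so reads via /dev/<bdev> see the
      		 * filesystem's final on-disk state. */
      		if (remount_ro && sb->s_bdev)
      			invalidate_bdev(sb->s_bdev);
      	}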
  5. 24 December 2009 (1 commit)
  6. 24 September 2009 (3 commits)
    • freeze_bdev: grab active reference to frozen superblocks · 4504230a
      Authored by Christoph Hellwig
      Currently we hold s_umount while a filesystem is frozen, even though we
      might return to userspace and unlock it from a different process.  Instead,
      grab an active reference to keep the filesystem busy, add an explicit
      check for frozen filesystems in remount, and reject the remount instead
      of blocking on s_umount.
      
      Add a new get_active_super helper to super.c for use by freeze_bdev that
      grabs an active reference to a superblock from a given block device.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
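
      A simplified sketch of the intended usage, not the upstream freeze_bdev()
      body (the helper name is made up; it assumes get_active_super() returns
      with an active reference and, in this version, with s_umount held):

      	#include <linux/fs.h>
      	#include <linux/blkdev.h>

      	/* Illustration only: pin the superblock for a freeze without parking
      	 * s_umount for the whole time the filesystem stays frozen. */
      	static struct super_block *pin_sb_for_freeze(struct block_device *bdev)
      	{
      		struct super_block *sb = get_active_super(bdev);

      		if (!sb)
      			return NULL;	/* nothing mounted on this device */

      		/* ... quiesce writers and call ->freeze_fs() here ... */

      		up_write(&sb->s_umount);	/* keep only the active reference */
      		return sb;
      	}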
    • freeze_bdev: kill bd_mount_sem · 4fadd7bb
      Authored by Christoph Hellwig
      Now that we have the freeze count there is not much reason for bd_mount_sem
      anymore.  The actual freeze/thaw operations are serialized using the
      bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is
      get_sb_bdev which tries to prevent mounting a filesystem while the block
      device is frozen.  Instead, add a check for bd_fsfreeze_count there and
      return -EBUSY if the filesystem is frozen.  While that is a change in
      user-visible behaviour, a failing mount is much better in this case than
      having the mount process stuck uninterruptibly for a long time.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
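
      Roughly the shape of the new check (a simplified fragment of get_sb_bdev();
      the surrounding sget()/fill_super logic is unchanged and error handling is
      abbreviated):

      	mutex_lock(&bdev->bd_fsfreeze_mutex);
      	if (bdev->bd_fsfreeze_count > 0) {
      		mutex_unlock(&bdev->bd_fsfreeze_mutex);
      		error = -EBUSY;		/* device frozen: refuse the mount */
      		goto error_bdev;
      	}
      	s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
      	mutex_unlock(&bdev->bd_fsfreeze_mutex);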
    • vfs: change sb->s_maxbytes to a loff_t · 42cb56ae
      Authored by Jeff Layton
      sb->s_maxbytes is supposed to indicate the maximum size of a file that can
      exist on the filesystem.  It's declared as an unsigned long long.
      
      Even if a filesystem has no inherent limit that prevents it from using
      every bit in that unsigned long long, it's still problematic to set it to
      anything larger than MAX_LFS_FILESIZE.  There are places in the kernel
      that cast s_maxbytes to a signed value.  If it's set too large then this
      cast makes it a negative number and generally breaks the comparison.
      
      Change s_maxbytes to be loff_t instead.  That should help eliminate the
      temptation to set it too large by making it a signed value.
      
      Also, add a warning for a couple of releases to help catch filesystems that
      set s_maxbytes too large.  Eventually we can either convert this to a
      BUG() or just remove it, in the hope that no one will get it wrong now
      that it's a signed value.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Cc: Mandeep Singh Baines <msb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
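
      Roughly what the transitional warning amounts to (fragment for illustration;
      in the real patch it sits in the mount path and the exact message may
      differ):

      	/* With s_maxbytes signed, an over-large value shows up as negative. */
      	WARN(sb->s_maxbytes < 0,
      	     "%s set sb->s_maxbytes to negative value (%lld)\n",
      	     type->name, sb->s_maxbytes);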
  7. 22 September 2009 (1 commit)
  8. 16 September 2009 (1 commit)
  9. 11 September 2009 (2 commits)
    • writeback: switch to per-bdi threads for flushing data · 03ba3782
      Authored by Jens Axboe
      This gets rid of pdflush for bdi writeout and kupdated style cleaning.
      pdflush writeout suffers from lack of locality and also requires more
      threads to handle the same workload, since it has to work in a
      non-blocking fashion against each queue. This also introduces lumpy
      behaviour and potential request starvation, since pdflush can be starved
      for queue access if others are accessing it. A sample ffsb workload that
      does random writes to files is about 8% faster here on a simple SATA drive
      during the benchmark phase. File layout also seems a LOT more smooth in
      vmstat:
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
       0  1      0 608848   2652 375372    0    0     0 71024  604    24  1 10 48 42
       0  1      0 549644   2712 433736    0    0     0 60692  505    27  1  8 48 44
       1  0      0 476928   2784 505192    0    0     4 29540  553    24  0  9 53 37
       0  1      0 457972   2808 524008    0    0     0 54876  331    16  0  4 38 58
       0  1      0 366128   2928 614284    0    0     4 92168  710    58  0 13 53 34
       0  1      0 295092   3000 684140    0    0     0 62924  572    23  0  9 53 37
       0  1      0 236592   3064 741704    0    0     4 58256  523    17  0  8 48 44
       0  1      0 165608   3132 811464    0    0     0 57460  560    21  0  8 54 38
       0  1      0 102952   3200 873164    0    0     4 74748  540    29  1 10 48 41
       0  1      0  48604   3252 926472    0    0     0 53248  469    29  0  7 47 45
      
      where vanilla tends to fluctuate a lot in the creation phase:
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
       1  1      0 678716   5792 303380    0    0     0 74064  565    50  1 11 52 36
       1  0      0 662488   5864 319396    0    0     4   352  302   329  0  2 47 51
       0  1      0 599312   5924 381468    0    0     0 78164  516    55  0  9 51 40
       0  1      0 519952   6008 459516    0    0     4 78156  622    56  1 11 52 37
       1  1      0 436640   6092 541632    0    0     0 82244  622    54  0 11 48 41
       0  1      0 436640   6092 541660    0    0     0     8  152    39  0  0 51 49
       0  1      0 332224   6200 644252    0    0     4 102800  728    46  1 13 49 36
       1  0      0 274492   6260 701056    0    0     4 12328  459    49  0  7 50 43
       0  1      0 211220   6324 763356    0    0     0 106940  515    37  1 10 51 39
       1  0      0 160412   6376 813468    0    0     0  8224  415    43  0  6 49 45
       1  1      0  85980   6452 886556    0    0     4 113516  575    39  1 11 54 34
       0  2      0  85968   6452 886620    0    0     0  1640  158   211  0  0 46 54
      
      A 10 disk test with btrfs performs 26% faster with per-bdi flushing. An
      SSD based writeback test on XFS performs over 20% better as well, with
      the throughput being very stable around 1GB/sec, where pdflush only
      manages 750MB/sec and fluctuates wildly while doing so. Random buffered
      writes to many files behave a lot better as well, as do random mmap'ed
      writes.
      
      A separate thread is added to sync the super blocks. In the long term,
      adding sync_supers_bdi() functionality could get rid of this thread again.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • writeback: move dirty inodes from super_block to backing_dev_info · 66f3b8e2
      Authored by Jens Axboe
      This is a first step at introducing per-bdi flusher threads. We should
      have no change in behaviour, although sb_has_dirty_inodes() is now
      ridiculously expensive, as there's no easy way to answer that question.
      Not a huge problem, since it'll be deleted in subsequent patches.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  10. 24 June 2009 (2 commits)
    • ... and the same for vfsmount id/mount group id · f21f6220
      Authored by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Make allocation of anon devices cheaper · c63e09ec
      Authored by Al Viro
      Standard trick - add a new variable (start) such that
      every n < start is known to be busy.  Allocation can
      skip checking everything in [0..start) and if it returns
      n, we can set start to n + 1.  Freeing below start sets
      start to what we'd just freed.
      
      Of course, it still sucks if we do something like
      	free 0
      	allocate
      	allocate
      in a loop - still O(n^2) time.  However, on saner loads it
      improves things a lot and the entire thing is not worth
      the trouble of switching to something with better worst-case
      behaviour.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
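
      The trick reads even more clearly as a tiny standalone allocator (plain C
      with made-up names; a bitmap stands in for the IDA the kernel uses):

      	#include <stdbool.h>

      	#define NR_IDS 1024

      	static bool busy[NR_IDS];
      	static int start;		/* every id below this is known busy */

      	static int alloc_id(void)
      	{
      		for (int id = start; id < NR_IDS; id++) {
      			if (!busy[id]) {
      				busy[id] = true;
      				start = id + 1;	/* [0..start) stays all-busy */
      				return id;
      			}
      		}
      		return -1;		/* namespace exhausted */
      	}

      	static void free_id(int id)
      	{
      		busy[id] = false;
      		if (id < start)
      			start = id;	/* next alloc_id() may reuse it */
      	}

      The "free 0; allocate; allocate" pattern above is exactly what defeats the
      hint: the second allocation has to rescan the busy range, hence the O(n^2)
      worst case.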
  11. 17 June 2009 (1 commit)
  12. 12 June 2009 (16 commits)
    • Push BKL down into ->remount_fs() · 337eb00a
      Authored by Alessio Igor Bogani
      [xfs, btrfs, capifs, shmem don't need BKL, exempt]
      Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ->write_super lock_super pushdown · ebc1ac16
      Authored by Christoph Hellwig
      Push down lock_super into ->write_super instances and remove it from the
      caller.
      
      The following filesystems don't need ->s_lock in ->write_super and are skipped:
      
       * bfs, nilfs2 - no other uses of s_lock and have internal locks in
      	->write_super
       * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
       * reiserfs - no other uses of s_lock as it has reiserfs_write_lock (BKL) in
      	->write_super
       * xfs - no other uses of s_lock and uses an internal lock (buffer lock on
      	superblock buffer) to serialize ->write_super.  Also xfs_fs_write_super
      	is superfluous and will go away in the next merge window
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
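
      The mechanical shape of the pushdown, using a hypothetical foofs as an
      example (made-up names, not one of the filesystems listed above):

      	/* Before this change the VFS wrapped the ->write_super call in
      	 * lock_super(); now each instance that still needs s_lock takes it
      	 * itself. */
      	static void foofs_write_super(struct super_block *sb)
      	{
      		lock_super(sb);
      		if (sb->s_dirt) {
      			foofs_commit_super(sb);	/* made-up helper */
      			sb->s_dirt = 0;
      		}
      		unlock_super(sb);
      	}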
    • Push BKL down into do_remount_sb() · 4aa98cf7
      Authored by Al Viro
      [folded fix from Jiri Slaby]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Push lock_super() into the ->remount_fs() of filesystems that care about it · bbd6851a
      Authored by Al Viro
      Note that since we can't run into contention between remount_fs and write_super
      (due to exclusion on s_umount), we have to care only about filesystems that
      touch lock_super() on their own.  Out of those ext3, ext4, hpfs, sysv and ufs
      do need it; fat doesn't since its ->remount_fs() only accesses assign-once
      data (basically, it's "we have no atime on directories and only have atime on
      files for vfat; force nodiratime and possibly noatime into *flags").
      
      [folded a build fix from hch]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • push BKL down into ->put_super · 6cfd0148
      Authored by Christoph Hellwig
      Move BKL into ->put_super from the only caller.  A couple of
      filesystems had trivial enough ->put_super (only kfree and NULLing of
      s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
      hugetlbfs, omfs, qnx4, shmem; all others got the full treatment.  Most
      of them probably don't need it, but I'd rather sort that out individually.
      Preferably after all the other BKL pushdowns in that area.
      
      [AV: original used to move lock_super() down as well; these changes are
      removed since we don't do lock_super() at all in generic_shutdown_super()
      now]
      [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • No need to do lock_super() for exclusion in generic_shutdown_super() · a9e220f8
      Authored by Al Viro
      We can't run into contention on it.  All other callers of lock_super()
      either hold s_umount (and we have it exclusive) or hold an active
      reference to the superblock in question, which prevents the call of
      generic_shutdown_super() while the reference is held.  So we can
      replace lock_super(s) with get_fs_excl() in generic_shutdown_super()
      (and corresponding change for unlock_super(), of course).
      
      Since ext4 expects s_lock held for its put_super, take lock_super()
      into it.  The rest of filesystems do not care at all.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • 443b94ba
    • cleanup sync_supers · e5004753
      Authored by Christoph Hellwig
      Merge the write_super helper into sync_super and move the check for
      ->write_super earlier so that we can avoid grabbing a reference to
      a superblock that doesn't have it.
      
      While we're at it also add a little comment documenting sync_supers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
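
      Approximately what the merged loop looks like afterwards (a simplified
      sketch based on the description above; the restart-after-put handling for
      superblocks that disappear while unlocked is elided):

      	/* Periodic superblock writeback: test ->write_super and s_dirt
      	 * before taking a reference, as described above. */
      	void sync_supers(void)
      	{
      		struct super_block *sb;

      		spin_lock(&sb_lock);
      		list_for_each_entry(sb, &super_blocks, s_list) {
      			if (!sb->s_op->write_super || !sb->s_dirt)
      				continue;
      			sb->s_count++;
      			spin_unlock(&sb_lock);

      			down_read(&sb->s_umount);
      			if (sb->s_root && sb->s_dirt)
      				sb->s_op->write_super(sb);
      			up_read(&sb->s_umount);

      			spin_lock(&sb_lock);
      			__put_super(sb);	/* drop the temporary count */
      		}
      		spin_unlock(&sb_lock);
      	}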
    • remove ->write_super call in generic_shutdown_super · 8c85e125
      Authored by Christoph Hellwig
      We just did a full fs writeout using sync_filesystem before, and if
      that's not enough for the filesystem it can perform its own writeout
      in ->put_super, which many filesystems already do.
      
      Move a call to foofs_write_super into every foofs_put_super for now to
      guarantee identical behaviour until it's cleaned up by the individual
      filesystem maintainers.
      
      Exceptions:
      
       - affs already has identical copy & pasted code at the beginning of
         affs_put_super so no need to do it twice.
       - xfs does the right thing without it and I have changes pending for
         the xfs tree touching this area so I don't really need conflicts
         here.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
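
      The per-filesystem transformation is mechanical; with a hypothetical foofs
      (made-up names) it looks roughly like this:

      	/* The ->write_super call that generic_shutdown_super() used to make
      	 * now happens in the filesystem's own ->put_super. */
      	static void foofs_put_super(struct super_block *sb)
      	{
      		if (sb->s_dirt)
      			foofs_write_super(sb);	/* was: done by the VFS */

      		/* ...release foofs-private state as before... */
      		kfree(sb->s_fs_info);
      		sb->s_fs_info = NULL;
      	}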
    • vfs: Rename fsync_super() to sync_filesystem() (version 4) · 60b0680f
      Authored by Jan Kara
      Rename the function so that it better describes what it really does. Also
      remove the unnecessary include of buffer_head.h.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Move syncing code from super.c to sync.c (version 4) · c15c54f5
      Authored by Jan Kara
      Move sync_filesystems(), __fsync_super(), fsync_super() from
      super.c to sync.c where they fit better.
      
      [build fixes folded]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815
      Authored by Jan Kara
      It is unnecessarily fragile to have two places (fsync_super() and do_sync())
      doing data integrity sync of the filesystem. Alter __fsync_super() to
      accommodate the needs of both callers and use it. So after this patch
      __fsync_super() is the only place where we gather all the calls needed to
      properly send all data on a filesystem to disk.

      A nice bonus is that we get complete livelock avoidance and write_supers()
      is now only used for periodic writeback of superblocks.

      sync_blockdevs(), introduced a couple of patches ago, is gone now.
      
      [build fixes folded]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
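
      The rough shape of the consolidated helper after this series (a simplified
      sketch, not the verbatim upstream function; quota syncing and most error
      propagation are elided):

      	/* The one place that pushes everything a filesystem owns to disk. */
      	static int __fsync_super(struct super_block *sb, int wait)
      	{
      		sync_inodes_sb(sb, wait);		/* dirty data and inodes */

      		lock_super(sb);
      		if (sb->s_dirt && sb->s_op->write_super)
      			sb->s_op->write_super(sb);	/* the superblock itself */
      		unlock_super(sb);

      		if (sb->s_op->sync_fs)
      			sb->s_op->sync_fs(sb, wait);	/* fs-private metadata */

      		return sync_blockdev(sb->s_bdev);	/* buffercache last */
      	}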
    • vfs: Make __fsync_super() a static function (version 4) · 429479f0
      Authored by Jan Kara
      __fsync_super() does the same thing as fsync_super(). So change the only
      caller to use fsync_super() and make __fsync_super() static. This removes
      an unnecessarily duplicated call to sync_blockdev() and prepares the ground
      for the changes to __fsync_super() in the following patches.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Call ->sync_fs() even if s_dirt is 0 (version 4) · bfe88125
      Authored by Jan Kara
      sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
      ->sync_fs() isn't called. This does not really make much sense since s_dirt is
      generally used by a filesystem to mean that ->write_super() needs to be called.
      But ->sync_fs() does different things. I even suspect that some filesystems
      (btrfs?) set s_dirt just to fool this logic.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Fix sys_sync() and fsync_super() reliability (version 4) · 5a3e5cb8
      Authored by Jan Kara
      So far, do_sync() called:
        sync_inodes(0);
        sync_supers();
        sync_filesystems(0);
        sync_filesystems(1);
        sync_inodes(1);
      
      This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
      submit all the IO (for example it skips inodes with I_SYNC set), so e.g. forcing
      a transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync()
      has not been completely reliable on some filesystems (ext3, ext4, reiserfs,
      ocfs2 and others are hit by this) when racing e.g. with background writeback. A
      similar problem also hits other filesystems (e.g. ext2) because
      write_supers() is called before the sync_inodes(1).

      Change the ordering of calls in do_sync() - this requires a new function,
      sync_blockdevs(), to preserve the property that block devices are always synced
      after the write_super() / sync_fs() calls.
      
      The same issue is fixed in __fsync_super() function used on umount /
      remount read-only.
      
      [AV: build fixes]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove s_async_list · 876a9f76
      Authored by Christoph Hellwig
      Remove the unused s_async_list in the superblock, a leftover of the
      broken async inode deletion code that leaked into mainline.  Having this
      in the middle of the sync/unmount path is not helpful for the following
      cleanups.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>