1. 20 11月, 2014 1 次提交
  2. 05 9月, 2014 1 次提交
  3. 22 2月, 2014 1 次提交
    • J
      Revert "writeback: do not sync data dirtied after sync start" · 0dc83bd3
      Jan Kara 提交于
      This reverts commit c4a391b5. Dave
      Chinner <david@fromorbit.com> has reported the commit may cause some
      inodes to be left out from sync(2). This is because we can call
      redirty_tail() for some inode (which sets i_dirtied_when to current time)
      after sync(2) has started or similarly requeue_inode() can set
      i_dirtied_when to current time if writeback had to skip some pages. The
      real problem is in the functions clobbering i_dirtied_when but fixing
      that isn't trivial so revert is a safer choice for now.
      
      CC: stable@vger.kernel.org # >= 3.13
      Signed-off-by: NJan Kara <jack@suse.cz>
      0dc83bd3
  4. 10 2月, 2014 1 次提交
    • A
      fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d
      Al Viro 提交于
      It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
      when sync_page_range() had been introduced; generic_file_write{,v}() correctly
      synced
      	pos_after_write - written .. pos_after_write - 1
      but generic_file_aio_write() synced
      	pos_before_write .. pos_before_write + written - 1
      instead.  Which is not the same thing with O_APPEND, obviously.
      A couple of years later correct variant had been killed off when
      everything switched to use of generic_file_aio_write().
      
      All users of generic_file_aio_write() are affected, and the same bug
      has been copied into other instances of ->aio_write().
      
      The fix is trivial; the only subtle point is that generic_write_sync()
      ought to be inlined to avoid calculations useless for the majority of
      calls.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d311d79d
  5. 13 11月, 2013 1 次提交
    • J
      writeback: do not sync data dirtied after sync start · c4a391b5
      Jan Kara 提交于
      When there are processes heavily creating small files while sync(2) is
      running, it can easily happen that quite some new files are created
      between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2).  That can happen
      especially if there are several busy filesystems (remember that sync
      traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one
      fs before starting it on another fs).  Because WB_SYNC_ALL pass is slow
      (e.g.  causes a transaction commit and cache flush for each inode in
      ext3), resulting sync(2) times are rather large.
      
      The following script reproduces the problem:
      
        function run_writers
        {
          for (( i = 0; i < 10; i++ )); do
            mkdir $1/dir$i
            for (( j = 0; j < 40000; j++ )); do
              dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
            done &
          done
        }
      
        for dir in "$@"; do
          run_writers $dir
        done
      
        sleep 40
        time sync
      
      Fix the problem by disregarding inodes dirtied after sync(2) was called
      in the WB_SYNC_ALL pass.  To allow for this, sync_inodes_sb() now takes
      a time stamp when sync has started which is used for setting up work for
      flusher threads.
      
      To give some numbers, when above script is run on two ext4 filesystems
      on simple SATA drive, the average sync time from 10 runs is 267.549
      seconds with standard deviation 104.799426.  With the patched kernel,
      the average sync time from 10 runs is 2.995 seconds with standard
      deviation 0.096.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4a391b5
  6. 25 10月, 2013 1 次提交
  7. 04 3月, 2013 1 次提交
  8. 23 2月, 2013 1 次提交
  9. 27 9月, 2012 1 次提交
  10. 23 7月, 2012 7 次提交
    • J
      vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes · 4ea425b6
      Jan Kara 提交于
      wakeup_flusher_threads(0) will queue work doing complete writeback for each
      flusher thread. Thus there is not much point in submitting another work doing
      full inode WB_SYNC_NONE writeback by writeback_inodes_sb().
      
      After this change it does not make sense to call nonblocking ->sync_fs and
      block device flush before calling sync_inodes_sb() because
      wakeup_flusher_threads() is completely asynchronous and thus these functions
      would be called in parallel with inode writeback running which will effectively
      void any work they do. So we move sync_inodes_sb() call before these two
      functions.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4ea425b6
    • J
      vfs: Remove unnecessary flushing of block devices · d0e91b13
      Jan Kara 提交于
      It is not necessary to write block devices twice. The reason why we first did
      flush and then proper sync is that
        for_each_bdev() {
          write_bdev()
          wait_for_completion()
        }
      is much slower than
        for_each_bdev()
          write_bdev()
        for_each_bdev()
          wait_for_completion()
      when there is bigger amount of data. But as is seen in the above, there's no real
      need to scan pages and submit them twice. We just need to separate the submission
      and waiting part. This patch does that.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d0e91b13
    • J
      vfs: Make sys_sync writeout also block device inodes · a8c7176b
      Jan Kara 提交于
      In case block device does not have filesystem mounted on it, sys_sync will just
      ignore it and doesn't writeout its dirty pages. This is because writeback code
      avoids writing inodes from superblock without backing device and
      blockdev_superblock is such a superblock.  Since it's unexpected that sync
      doesn't writeout dirty data for block devices be nice to users and change the
      behavior to do so. So now we iterate over all block devices on blockdev_super
      instead of iterating over all superblocks when syncing block devices.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a8c7176b
    • J
      vfs: Reorder operations during sys_sync · b3de6531
      Jan Kara 提交于
      Change the order of operations during sync from
      
      for_each_sb {
              writeback_inodes_sb();
              sync_fs(nowait);
              __sync_blockdev(nowait);
      }
      for_each_sb {
              sync_inodes_sb();
              sync_fs(wait);
              __sync_blockdev(wait);
      }
      
      to
      
      for_each_sb
              writeback_inodes_sb();
      for_each_sb
              sync_fs(nowait);
      for_each_sb
              __sync_blockdev(nowait);
      for_each_sb
              sync_inodes_sb();
      for_each_sb
              sync_fs(wait);
      for_each_sb
              __sync_blockdev(wait);
      
      This is a preparation for the following patches in this series.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b3de6531
    • J
      quota: Move quota syncing to ->sync_fs method · a1177825
      Jan Kara 提交于
      Since the moment writes to quota files are using block device page cache and
      space for quota structures is reserved at the moment they are first accessed we
      have no reason to sync quota before inode writeback. In fact this order is now
      only harmful since quota information can easily change during inode writeback
      (either because conversion of delayed-allocated extents or simply because of
      allocation of new blocks for simple filesystems not using page_mkwrite).
      
      So move syncing of quota information after writeback of inodes into ->sync_fs
      method. This way we do not have to use ->quota_sync callback which is primarily
      intended for use by quotactl syscall anyway and we get rid of calling
      ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
      proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
      also support legacy non-journalled quotas) and thus there are no dirty quota
      structures.
      
      CC: "Theodore Ts'o" <tytso@mit.edu>
      CC: Joel Becker <jlbec@evilplan.org>
      CC: reiserfs-devel@vger.kernel.org
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Acked-by: NDave Kleikamp <shaggy@kernel.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a1177825
    • J
      quota: Split dquot_quota_sync() to writeback and cache flushing part · ceed1723
      Jan Kara 提交于
      Split off part of dquot_quota_sync() which writes dquots into a quota file
      to a separate function. In the next patch we will use the function from
      filesystems and we do not want to abuse ->quota_sync quotactl callback more
      than necessary.
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ceed1723
    • J
      vfs: Move noop_backing_dev_info check from sync into writeback · 6eedc701
      Jan Kara 提交于
      In principle, a filesystem may want to have ->sync_fs() called during sync(1)
      although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
      Only writeback code really needs bdi set to something reasonable. So move the
      checks where they are more logical.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6eedc701
  11. 09 6月, 2012 1 次提交
  12. 30 5月, 2012 1 次提交
  13. 29 2月, 2012 1 次提交
  14. 04 1月, 2012 1 次提交
  15. 31 10月, 2011 1 次提交
    • C
      writeback: Add a 'reason' to wb_writeback_work · 0e175a18
      Curt Wohlgemuth 提交于
      This creates a new 'reason' field in a wb_writeback_work
      structure, which unambiguously identifies who initiates
      writeback activity.  A 'wb_reason' enumeration has been
      added to writeback.h, to enumerate the possible reasons.
      
      The 'writeback_work_class' and tracepoint event class and
      'writeback_queue_io' tracepoints are updated to include the
      symbolic 'reason' in all trace events.
      
      And the 'writeback_inodes_sbXXX' family of routines has had
      a wb_stats parameter added to them, so callers can specify
      why writeback is being started.
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      0e175a18
  16. 21 7月, 2011 1 次提交
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik 提交于
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
  17. 21 3月, 2011 1 次提交
    • S
      introduce sys_syncfs to sync a single file system · b7ed78f5
      Sage Weil 提交于
      It is frequently useful to sync a single file system, instead of all
      mounted file systems via sync(2):
      
       - On machines with many mounts, it is not at all uncommon for some of
         them to hang (e.g. unresponsive NFS server).  sync(2) will get stuck on
         those and may never get to the one you do care about (e.g., /).
       - Some applications write lots of data to the file system and then
         want to make sure it is flushed to disk.  Calling fsync(2) on each
         file introduces unnecessary ordering constraints that result in a large
         amount of sub-optimal writeback/flush/commit behavior by the file
         system.
      
      There are currently two ways (that I know of) to sync a single super_block:
      
       - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
         mapping, which isn't usually desirable, and doesn't work for non-block
         file systems.
       - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
         current implemention.  Relying on this little-known side effect for
         something like data safety sounds foolish.
      
      Both of these approaches require root privileges, which some applications
      do not have (nor should they need?) given that sync(2) is an unprivileged
      operation.
      
      This patch introduces a new system call syncfs(2) that takes an fd and
      syncs only the file system it references.  Maybe someday we can
      
       $ sync /some/path
      
      and not get
      
       sync: ignoring all arguments
      
      The syscall is motivated by comments by Al and Christoph at the last LSF.
      syncfs(2) seems like an appropriate name given statfs(2).
      
      A similar ioctl was also proposed a while back, see
      	http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b7ed78f5
  18. 17 3月, 2011 1 次提交
  19. 10 8月, 2010 1 次提交
  20. 01 6月, 2010 1 次提交
  21. 28 5月, 2010 1 次提交
  22. 22 5月, 2010 4 次提交
  23. 17 5月, 2010 1 次提交
    • J
      writeback: fix WB_SYNC_NONE writeback from umount · e913fc82
      Jens Axboe 提交于
      When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
      writeback to kick off writeback of pending dirty inodes, then follow
      that up with a WB_SYNC_ALL to wait for it. Since umount already holds
      the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all
      writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
      since WB_SYNC_ALL writeback is a data integrity operation and thus
      a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems
      it's a lot slower.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      e913fc82
  24. 25 4月, 2010 1 次提交
    • J
      Catch filesystems lacking s_bdi · 5129a469
      Jörn Engel 提交于
      noop_backing_dev_info is used only as a flag to mark filesystems that
      don't have any backing store, like tmpfs, procfs, spufs, etc.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      
      Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
      to the noop_backing_dev_info is not legal and will not result in
      them being flushed, but we already catch this condition in
      __mark_inode_dirty() when checking for a registered bdi.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      5129a469
  25. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  26. 05 3月, 2010 1 次提交
  27. 18 12月, 2009 1 次提交
  28. 10 12月, 2009 2 次提交
    • C
      kill wait_on_page_writeback_range · 94004ed7
      Christoph Hellwig 提交于
      All callers really want the more logical filemap_fdatawait_range interface,
      so convert them to use it and merge wait_on_page_writeback_range into
      filemap_fdatawait_range.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      94004ed7
    • C
      vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Christoph Hellwig 提交于
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatiblity it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actuall care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define a
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefinining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NUlrich Drepper <drepper@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      6b2f3d1f
  29. 23 9月, 2009 1 次提交
  30. 16 9月, 2009 1 次提交
新手
引导
客服 返回
顶部