1. 09 Oct 2009, 2 commits
    • xfs: mark inodes dirty before issuing I/O · 932640e8
      Dave Chinner authored
      This makes sure they get properly waited on in sync when I/O is in flight and
      we later need to update the inode size.  It requires a new helper to check whether an
      ioend structure extends beyond the current EOF (a sketch follows this entry).
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      932640e8
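      For illustration, here is a minimal sketch of such an EOF check, assuming the ioend
      records the file offset and length of the I/O and the current on-disk inode size is
      at hand; the names below are hypothetical, not the actual XFS helper or fields.

      #include <linux/types.h>

      /*
       * Sketch: return true if completing an I/O covering
       * [io_offset, io_offset + io_size) would extend the file past the
       * on-disk EOF, i.e. the inode size will need updating at completion
       * and the inode should be marked dirty before the I/O is issued.
       */
      static inline bool ioend_is_beyond_eof(loff_t io_offset, size_t io_size,
                                             loff_t disk_isize)
      {
              return io_offset + (loff_t)io_size > disk_isize;
      }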
    • xfs: implement ->dirty_inode to fix timestamp handling · f9581b14
      Christoph Hellwig authored
      This is picking up on Felix's repost of Dave's patch to implement a
      .dirty_inode method.  We really need this notification because
      the VFS keeps writing directly into the inode structure instead
      of going through methods to update this state.  In addition to
      the long-known atime issue we now also have a caller in VM code
      that updates c/mtime that way for shared writeable mmaps.  And
      I found another one that no one has noticed in practice in the FIFO
      code.
      
      So implement ->dirty_inode to set i_update_core whenever the
      inode gets externally dirtied, and switch the c/mtime handling to
      the same scheme we already use for atime (always picking up
      the value from the Linux inode).
      
      Note that this patch also removes the xfs_synchronize_atime call
      in xfs_reclaim; it was superfluous as we already synchronize the time
      when writing the inode via the log (xfs_inode_item_format) or the
      normal buffers (xfs_iflush_int).
      
      In addition, remove the I_CLEAR check before copying the Linux
      timestamps - now that we always have the Linux inode available
      we can always use the timestamps in it.
      
      Also switch to just using file_update_time for regular reads/writes -
      that will get us all optimizations done to it for free and make
      sure we notice early when it breaks.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      f9581b14
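      A minimal sketch of the ->dirty_inode hook described above, assuming a per-filesystem
      inode that embeds the VFS inode and carries an i_update_core flag (as named in the
      message); the example_* names are hypothetical, locking is omitted, and the hook
      signature is the one kernels of that era used.

      #include <linux/fs.h>

      /* hypothetical per-fs inode with the VFS inode embedded in it */
      struct example_inode {
              int             i_update_core;  /* core fields need flushing to disk */
              struct inode    vfs_inode;
      };

      /*
       * Called by the VFS whenever it dirties the generic inode (e.g. writes
       * timestamps directly into struct inode), so the filesystem can note
       * that its on-disk inode core is stale and must pick up the Linux
       * timestamps at the next inode write.
       */
      static void example_dirty_inode(struct inode *inode)
      {
              struct example_inode *ei =
                      container_of(inode, struct example_inode, vfs_inode);

              ei->i_update_core = 1;
      }

      static const struct super_operations example_super_ops = {
              /* other methods elided */
              .dirty_inode    = example_dirty_inode,
      };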
  2. 16 Sep 2009, 1 commit
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen authored
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs.
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      aa261f54
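      If memory serves, opting a filesystem into this amounts to pointing its
      address_space_operations at the generic truncation-based handler; a sketch follows,
      assuming the field is error_remove_page and the helper is generic_error_remove_page
      as introduced by the HWPOISON series (treat both names as assumptions).

      #include <linux/fs.h>
      #include <linux/mm.h>

      /*
       * Sketch: wire the generic truncation-based poison handler into the
       * filesystem's aops so corrupted pagecache pages can be dropped.
       * The fs-specific readpage/writepage/etc. methods are elided.
       */
      static const struct address_space_operations example_aops = {
              .error_remove_page      = generic_error_remove_page,
      };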
  3. 02 Sep 2009, 1 commit
    • xfs: merge fsync and O_SYNC handling · 13e6d5cd
      Christoph Hellwig authored
      The guarantees for O_SYNC are exactly the same as the ones we need to
      make for an fsync call (and given that Linux O_SYNC is O_DSYNC the
      equivalent is fdatasync, but we treat both the same in XFS), except
      with a range data writeout.  Jan Kara has started unifying these two
      paths for filesystems using the generic helpers, and I've started to
      look at XFS.
      
      The actual transaction committed by xfs_fsync and xfs_write_sync_logforce
      has a different transaction number, but is actually exactly the same.
      We'll only use the fsync transaction going forward.  One major difference
      is that xfs_write_sync_logforce never issues a cache flush unless we
      commit a transaction causing that as a side-effect, which is an obvious
      bug in the O_SYNC handling.  Second, all the locking and i_update_size
      vs i_update_core changes from 978b7237
      never made it to xfs_write_sync_logforce, so we add them back.
      
      To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait
      call is moved up to xfs_file_fsync, so that we don't wait on the whole
      file after we already waited for our portion in xfs_write.
      
      We'll also use a plain call to filemap_write_and_wait_range instead
      of the previous sync_page_range, which did it in two steps including
      a half-hearted inode writeout that doesn't help us.
      
      Once we're done with this, also remove the now-useless i_update_size
      tracking.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      13e6d5cd
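      A minimal sketch of the unified O_SYNC path this describes: write back and wait on
      just the range that was written, then reuse the same metadata-sync routine the fsync
      path uses. example_fsync_transaction() is a placeholder, not a real XFS function.

      #include <linux/fs.h>

      int example_fsync_transaction(struct inode *inode);    /* placeholder for the fs metadata sync */

      /*
       * After an O_SYNC write of 'count' bytes at 'pos': flush and wait on
       * only that range, then force the same transaction/log flush that
       * fsync would, instead of a separate O_SYNC-only mechanism.
       */
      static int example_sync_after_write(struct file *file, loff_t pos, size_t count)
      {
              int error;

              error = filemap_write_and_wait_range(file->f_mapping, pos,
                                                   pos + count - 1);
              if (error)
                      return error;

              return example_fsync_transaction(file->f_mapping->host);
      }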
  4. 31 Jul 2009, 2 commits
  5. 07 Apr 2009, 1 commit
  6. 29 Mar 2009, 1 commit
    • xfs: pagecache usage optimization · bddaafa1
      Hisashi Hifumi authored
      Hi.
      
      I introduced "is_partially_uptodate" aops for XFS.
      
      A page can have multiple buffers, and even if a page is not uptodate,
      some buffers can be uptodate in a pagesize != blocksize environment.
      
      This aops method checks that all buffers which correspond to the part of the
      file we want to read are uptodate. If so, we do not have to issue an actual
      read I/O to the disk even if the page is not uptodate, because the portion we
      want to read is uptodate.
      
      "block_is_partially_uptodate" function is already used by ext2/3/4.
      With the following patch random read/write mixed workloads or random read
      after random write workloads can be optimized and we can get performance
      improvement.
      
      I did a performance test using sysbench.
      
      #sysbench --num-threads=4 --max-requests=100000 --test=fileio --file-num=1 \
      --file-block-size=8K --file-total-size=1G --file-test-mode=rndrw \
      --file-fsync-freq=0 --file-rw-ratio=0.5 run
      
      -2.6.29-rc6
      Test execution summary:
          total time:                          123.8645s
          total number of events:              100000
          total time taken by event execution: 442.4994
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0044s
               max:                            0.3387s
               approx.  95 percentile:         0.0118s
      
      -2.6.29-rc6-patched
      Test execution summary:
          total time:                          108.0757s
          total number of events:              100000
          total time taken by event execution: 417.7505
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0042s
               max:                            0.3217s
               approx.  95 percentile:         0.0118s
      
      arch: ia64
      pagesize: 16k
      blocksize: 4k
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      bddaafa1
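      For reference, wiring this up is essentially one line in the filesystem's
      address_space_operations; a sketch, assuming the generic buffer-head helper
      block_is_partially_uptodate mentioned above (other methods elided).

      #include <linux/fs.h>
      #include <linux/buffer_head.h>

      /*
       * Sketch: expose the generic helper so the pagecache read path can
       * serve a sub-page read from already-uptodate buffers instead of
       * reading the whole not-uptodate page from disk.
       */
      static const struct address_space_operations example_aops = {
              .is_partially_uptodate  = block_is_partially_uptodate,
      };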
  7. 04 Dec 2008, 3 commits
  8. 30 Oct 2008, 1 commit
  9. 17 Sep 2008, 1 commit
    • [XFS] Prevent direct I/O from mapping extents beyond eof · 364f358a
      Lachlan McIlroy authored
      With the help of some tracing I found that we try to map extents beyond
      eof when doing a direct I/O read. It appears that the way to inform the
      generic direct I/O path (ie do_direct_IO()) that we have breached eof is
      to return an unmapped buffer from xfs_get_blocks_direct(). This will cause
      do_direct_IO() to jump to the hole handling code where it will check for
      eof and then abort.
      
      This problem was found because a direct I/O read was trying to map beyond
      eof and was encountering delayed allocations. The delayed allocations
      beyond eof are speculative allocations and they didn't get converted when
      the direct I/O flushed the file because there was only enough space in the
      current AG to convert and write out the dirty pages within eof. Note that
      xfs_iomap_write_allocate() won't necessarily convert all the delayed
      allocations passed to it - it will return after allocating the first extent
      - so if the delayed allocation extends beyond eof then it will stay that
      way.
      
      SGI-PV: 983683
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31929a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      364f358a
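      A minimal sketch of the get_block(s) convention described above: past EOF the
      callback simply leaves the buffer unmapped, so the generic direct-I/O code treats the
      range as a hole, checks it against i_size, and stops the read there. This is a
      simplified stand-in, not the real xfs_get_blocks_direct().

      #include <linux/fs.h>
      #include <linux/buffer_head.h>

      static int example_get_blocks_direct(struct inode *inode, sector_t iblock,
                                           struct buffer_head *bh_result, int create)
      {
              loff_t offset = (loff_t)iblock << inode->i_blkbits;

              /*
               * Reads past EOF: leave bh_result unmapped so do_direct_IO()
               * takes its hole-handling path and aborts at EOF.
               */
              if (!create && offset >= i_size_read(inode))
                      return 0;

              /* ... normal extent lookup and map_bh(bh_result, ...) goes here ... */
              return 0;
      }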
  10. 13 Aug 2008, 1 commit
  11. 05 Aug 2008, 2 commits
  12. 28 Jul 2008, 1 commit
  13. 18 Apr 2008, 2 commits
  14. 07 Feb 2008, 6 commits
  15. 17 Oct 2007, 2 commits
    • writeback: remove pages_skipped accounting in __block_write_full_page() · 1f7decf6
      Fengguang Wu authored
      Miklos Szeredi <miklos@szeredi.hu> and I identified a writeback bug:
      
      > The following strange behavior can be observed:
      >
      > 1. large file is written
      > 2. after 30 seconds, nr_dirty goes down by 1024
      > 3. then for some time (< 30 sec) nothing happens (disk idle)
      > 4. then nr_dirty again goes down by 1024
      > 5. repeat from 3. until whole file is written
      >
      > So basically a 4Mbyte chunk of the file is written every 30 seconds.
      > I'm quite sure this is not the intended behavior.
      
      It can be produced by the following test scheme:
      
      # cat bin/test-writeback.sh
      grep nr_dirty /proc/vmstat
      echo 1 > /proc/sys/fs/inode_debug
      dd if=/dev/zero of=/var/x bs=1K count=204800&
      while true; do grep nr_dirty /proc/vmstat; sleep 1; done
      
      # bin/test-writeback.sh
      nr_dirty 19207
      nr_dirty 19207
      nr_dirty 30924
      204800+0 records in
      204800+0 records out
      209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
      nr_dirty 47150
      nr_dirty 47141
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47205
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47215
      nr_dirty 47216
      nr_dirty 47216
      nr_dirty 47216
      nr_dirty 47154
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47134
      nr_dirty 47134
      nr_dirty 47135
      nr_dirty 47135
      nr_dirty 47135
      nr_dirty 46097 <== -1038
      nr_dirty 46098
      nr_dirty 46098
      nr_dirty 46098
      [...]
      nr_dirty 46091
      nr_dirty 46092
      nr_dirty 46092
      nr_dirty 45069 <== -1023
      nr_dirty 45056
      nr_dirty 45056
      nr_dirty 45056
      [...]
      nr_dirty 37822
      nr_dirty 36799 <== -1023
      [...]
      nr_dirty 36781
      nr_dirty 35758 <== -1023
      [...]
      nr_dirty 34708
      nr_dirty 33672 <== -1024
      [...]
      nr_dirty 33692
      nr_dirty 32669 <== -1023
      
      % ls -li /var/x
      847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x
      
      % dmesg|grep 847824  # generated by a debug printk
      [  529.263184] redirtied inode 847824 line 548
      [  564.250872] redirtied inode 847824 line 548
      [  594.272797] redirtied inode 847824 line 548
      [  629.231330] redirtied inode 847824 line 548
      [  659.224674] redirtied inode 847824 line 548
      [  689.219890] redirtied inode 847824 line 548
      [  724.226655] redirtied inode 847824 line 548
      [  759.198568] redirtied inode 847824 line 548
      
      # line 548 in fs/fs-writeback.c:
      543                 if (wbc->pages_skipped != pages_skipped) {
      544                         /*
      545                          * writeback is not making progress due to locked
      546                          * buffers.  Skip this inode for now.
      547                          */
      548                         redirty_tail(inode);
      549                 }
      
      More debug efforts show that __block_write_full_page()
      never has the chance to call submit_bh() for that big dirty file:
      the buffer head is *clean*. So basically no page I/O is issued by
      __block_write_full_page(), hence pages_skipped goes up.
      
      Also the comment in generic_sync_sb_inodes():
      
      544                         /*
      545                          * writeback is not making progress due to locked
      546                          * buffers.  Skip this inode for now.
      547                          */
      
      and the comment in __block_write_full_page():
      
      1713                 /*
      1714                  * The page was marked dirty, but the buffers were
      1715                  * clean.  Someone wrote them back by hand with
      1716                  * ll_rw_block/submit_bh.  A rare case.
      1717                  */
      
      do not quite agree with each other. The page writeback should be skipped for
      'locked buffer', but here it is 'clean buffer'!
      
      This patch fixes this bug. Though I'm not sure why __block_write_full_page()
      is called only to do nothing and who actually issued the writeback for us.
      
      These are the two possible new behaviors after the patch:
      
      1) pretty nice: wait 30s and write ALL:)
      2) not so good:
      	- during the dd: ~16M
      	- after 30s:      ~4M
      	- after 5s:       ~4M
      	- after 5s:     ~176M
      
      The next patch will fix case (2).
      
      Cc: David Chinner <dgc@sgi.com>
      Cc: Ken Chen <kenchen@google.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f7decf6
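      To make the quoted fs-writeback.c logic concrete, here is a sketch of the decision it
      implements: an inode is only deferred (redirty_tail) when writeback reports newly
      skipped pages, which is why a clean-buffer page wrongly counted as "skipped" keeps the
      file on the slow 30-second retry path. The helper name is illustrative.

      #include <linux/types.h>
      #include <linux/writeback.h>

      /*
       * Sketch: writeback made no progress on this inode only if the
       * writepages pass bumped wbc->pages_skipped (e.g. locked buffers).
       * After the fix, pages whose buffers are simply clean are no longer
       * counted here, so such inodes stay on the normal writeback schedule.
       */
      static bool example_should_defer_inode(struct writeback_control *wbc,
                                             long pages_skipped_before)
      {
              return wbc->pages_skipped != pages_skipped_before;
      }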
    • xfs: convert to new aops · d79689c7
      Nick Piggin authored
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Timothy Shimmin <tes@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d79689c7
  16. 16 Oct 2007, 5 commits
  17. 12 Oct 2007, 1 commit
  18. 10 Oct 2007, 1 commit
  19. 18 Sep 2007, 1 commit
  20. 05 Sep 2007, 1 commit
  21. 14 Jul 2007, 3 commits
  22. 29 May 2007, 1 commit