1. 12 Dec, 2009 1 commit
  2. 02 Sep, 2009 1 commit
    • C
      xfs: merge fsync and O_SYNC handling · 13e6d5cd
      Christoph Hellwig committed
      The guarantees for O_SYNC are exactly the same as the ones we need to
      make for an fsync call (and given that Linux O_SYNC is O_DSYNC the
      equivalent is fdatasync, but we treat both the same in XFS), except
      with a range data writeout.  Jan Kara has started unifying these two
      paths for filesystems using the generic helpers, and I've started to
      look at XFS.
      
      The actual transactions committed by xfs_fsync and xfs_write_sync_logforce
      have different transaction numbers, but are otherwise exactly the same.
      We'll only use the fsync transaction going forward.  One major difference
      is that xfs_write_sync_logforce never issues a cache flush unless we
      commit a transaction causing that as a side-effect, which is an obvious
      bug in the O_SYNC handling.  Second, all the locking and i_update_size
      vs i_update_core changes from 978b7237
      never made it to xfs_write_sync_logforce, so we add them back.
      
      To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait
      call is moved up to xfs_file_fsync, so that we don't wait on the whole
      file after we already waited for our portion in xfs_write.
      
      We'll also use a plain call to filemap_write_and_wait_range instead
      of the previous sync_page_range, which did it in two steps including
      a half-hearted inode write out that doesn't help us.
      
      Once we're done with this also remove the now useless i_update_size
      tracking.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      13e6d5cd
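The equivalence this commit message relies on can be seen from user space. The sketch below is illustrative only and is not kernel code: `write_durably`, `write_then_fdatasync` and `write_with_o_dsync` are hypothetical helper names, and the example assumes a Linux/POSIX system where `O_DSYNC` and `fdatasync()` are available. Both paths ask for the written data to be stable before the caller continues, which is the guarantee the commit routes through a single fsync implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open with extra flags, write the whole buffer,
 * optionally call fdatasync(), then close. Returns 0 on success, -1 on error. */
static int write_durably(const char *path, const void *buf, size_t len,
                         int extra_flags, int sync_after)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
    if (fd < 0)
        return -1;

    int ok = write(fd, buf, len) == (ssize_t)len;
    if (ok && sync_after)
        ok = fdatasync(fd) == 0;   /* explicit data sync after the write */
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Path A: plain write followed by an explicit fdatasync(). */
int write_then_fdatasync(const char *path, const void *buf, size_t len)
{
    return write_durably(path, buf, len, 0, 1);
}

/* Path B: the file is opened with O_DSYNC, so write() itself only
 * returns once the written range is stable. */
int write_with_o_dsync(const char *path, const void *buf, size_t len)
{
    return write_durably(path, buf, len, O_DSYNC, 0);
}
```

The one difference the message calls out still applies: the synchronous write only has to cover the range just written, while an fsync-style call covers the whole file.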
  3. 29 Mar, 2009 1 commit
  4. 09 Feb, 2009 1 commit
  5. 30 Oct, 2008 4 commits
  6. 13 Aug, 2008 2 commits
  7. 18 Apr, 2008 2 commits
  8. 07 Feb, 2008 2 commits
    • D
      [XFS] Move AIL pushing into its own thread · 249a8c11
      David Chinner committed
      When many hundreds to thousands of threads all try to do simultaneous
      transactions and the log is in a tail-pushing situation (i.e. full), we
      can get multiple threads walking the AIL list and contending on the AIL
      lock.
      
      The AIL push is, in effect, a simple I/O dispatch algorithm complicated by
      the ordering constraints placed on it by the transaction subsystem. It
      really does not need multiple threads to push on it - even when only a
      single CPU is pushing the AIL, it can push the I/O out far faster than
      pretty much any disk subsystem can handle.
      
      So, to avoid contention problems stemming from multiple list walkers, move
      the list walk off into another thread and simply provide a "target" to
      push to. When a thread requires a push, it sets the target and wakes the
      push thread, then goes to sleep waiting for the required amount of space
      to become available in the log.
      
      This mechanism should also be a lot fairer under heavy load as the waiters
      will queue in arrival order, rather than queuing in "who completed a push
      first" order.
      
      Also, by moving the pushing to a separate thread we can do overload
      detection and prevention more effectively, as we can keep context from
      loop iteration to loop iteration. That is, we can push only part of the
      list each loop and not have to loop back to the start of the list every
      time we run. This should also help by reducing the number of items we try
      to lock and/or push but cannot move.
      
      Note that this patch is not intended to solve the inefficiencies in the
      AIL structure and the associated issues with extremely large list
      contents. That needs to be addressed separately; parallel access would
      cause problems to any new structure as well, so I'm only aiming to isolate
      the structure from unbounded parallelism here.
      
      SGI-PV: 972759
      SGI-Modid: xfs-linux-melb:xfs-kern:30371a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      249a8c11
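The "set a target, wake the push thread, sleep until space is available" scheme described in this commit can be sketched in user space with pthreads. This is a toy model under stated assumptions, not the XFS implementation: `struct pusher`, `PUSH_BATCH` and the `pusher_*` functions are hypothetical names, the AIL is reduced to a pair of counters, and no I/O is issued. Waiters here are released by a broadcast, so the arrival-order queuing mentioned in the message is not modelled.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define PUSH_BATCH 4  /* hypothetical: items pushed per loop iteration */

struct pusher {
    pthread_mutex_t lock;
    pthread_cond_t  wake_pusher;  /* signalled when target advances or on stop */
    pthread_cond_t  progress;     /* broadcast whenever pushed advances */
    uint64_t        target;       /* furthest position any waiter asked for */
    uint64_t        pushed;       /* position reached; kept across iterations */
    bool            stop;
    pthread_t       thread;
};

/* The single list walker: only this thread ever advances `pushed`. */
static void *push_thread(void *arg)
{
    struct pusher *p = arg;

    pthread_mutex_lock(&p->lock);
    while (!p->stop) {
        if (p->pushed >= p->target) {
            pthread_cond_wait(&p->wake_pusher, &p->lock);
            continue;
        }
        /* Push only part of the list, resuming where the last pass ended. */
        uint64_t end = p->pushed + PUSH_BATCH;
        if (end > p->target)
            end = p->target;
        p->pushed = end;              /* real code would issue I/O here */
        pthread_cond_broadcast(&p->progress);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

int pusher_start(struct pusher *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake_pusher, NULL);
    pthread_cond_init(&p->progress, NULL);
    p->target = 0;
    p->pushed = 0;
    p->stop = false;
    return pthread_create(&p->thread, NULL, push_thread, p);
}

/* Called by a thread that needs space: raise the target, wake the
 * pusher, then sleep until the requested position has been pushed. */
void pusher_wait_for(struct pusher *p, uint64_t want)
{
    pthread_mutex_lock(&p->lock);
    if (want > p->target) {
        p->target = want;
        pthread_cond_signal(&p->wake_pusher);
    }
    while (p->pushed < want)
        pthread_cond_wait(&p->progress, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Must only be called once no waiter is still blocked in pusher_wait_for(). */
void pusher_stop(struct pusher *p)
{
    pthread_mutex_lock(&p->lock);
    p->stop = true;
    pthread_cond_signal(&p->wake_pusher);
    pthread_mutex_unlock(&p->lock);
    pthread_join(p->thread, NULL);
    pthread_cond_destroy(&p->progress);
    pthread_cond_destroy(&p->wake_pusher);
    pthread_mutex_destroy(&p->lock);
}
```

However many threads call `pusher_wait_for()`, only one thread walks the list, which is the isolation from unbounded parallelism the message is aiming for.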
    • D
      [XFS] Fix up sparse warnings. · a8272ce0
      David Chinner committed
      These are mostly locking annotations, marking things static, casts where
      needed and declaring stuff in header files.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30002a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      a8272ce0
  9. 14 Jul, 2007 1 commit
    • D
      [XFS] Lazy Superblock Counters · 92821e2b
      David Chinner committed
      When we have a couple of hundred transactions on the fly at once, they all
      typically modify the on disk superblock in some way.
      create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify
      free block counts.
      
      When these counts are modified in a transaction, they must eventually lock
      the superblock buffer and apply the mods. The buffer then remains locked
      until the transaction is committed into the incore log buffer. The result
      of this is that with enough transactions on the fly the incore superblock
      buffer becomes a bottleneck.
      
      The result of contention on the incore superblock buffer is that
      transaction rates fall - the more pressure that is put on the superblock
      buffer, the slower things go.
      
      The key to removing the contention is to not require the superblock fields
      in question to be locked. We do that by not marking the superblock dirty
      in the transaction. IOWs, we modify the incore superblock but do not
      modify the cached superblock buffer. In short, we do not log superblock
      modifications to critical fields in the superblock on every transaction.
      In fact we only do it just before we write the superblock to disk every
      sync period or just before unmount.
      
      This creates an interesting problem - if we don't log or write out the
      fields in every transaction, then how do the values get recovered after a
      crash? The answer is simple - we keep enough duplicate, logged information
      in other structures that we can reconstruct the correct count after log
      recovery has been performed.
      
      It is the AGF and AGI structures that contain the duplicate information;
      after recovery, we walk every AGI and AGF and sum their individual
      counters to get the correct value, and we do a transaction into the log to
      correct them. An optimisation of this is that if we have a clean unmount
      record, we know the value in the superblock is correct, so we can avoid
      the summation walk under normal conditions and so mount/recovery times do
      not change under normal operation.
      
      One wrinkle that was discovered during development was that the blocks
      used in the freespace btrees are never accounted for in the AGF counters.
      This was once a valid optimisation to make; when the filesystem is full,
      the free space btrees are empty and consume no space. Hence when it
      matters, the "accounting" is correct. But that means that when we do the
      AGF summations, we would not have a correct count and xfs_check would
      complain. Hence a new counter was added to track the number of blocks used
      by the free space btrees. This is an *on-disk format change*.
      
      As a result of this, lazy superblock counters are a mkfs option and at the
      moment on Linux there is no convenient way to convert an old filesystem.
      Conversion is possible - xfs_db can be used to twiddle the right bits and
      then xfs_repair will do the format conversion for you. Similarly, you can
      convert backwards as well. At some point we'll add functionality to
      xfs_admin to do the bit twiddling easily.
      
      SGI-PV: 964999
      SGI-Modid: xfs-linux-melb:xfs-kern:28652a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      92821e2b
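The recovery step described in this commit (sum the per-AG counters unless a clean unmount record says the superblock is already right) can be sketched as below. The structures and the function name are hypothetical simplifications, not the on-disk XFS layout. Adding the free space btree blocks into the free block total is an assumption drawn from the message's note that those blocks were previously missing from the AGF counters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-allocation-group counters. */
struct ag_counters {
    uint64_t inodes;        /* allocated inodes, as recorded in the AGI */
    uint64_t free_inodes;   /* free inodes, as recorded in the AGI */
    uint64_t free_blocks;   /* free blocks, as recorded in the AGF */
    uint64_t btree_blocks;  /* blocks held by the free space btrees */
};

/* Hypothetical, simplified in-core superblock counters. */
struct sb_counters {
    uint64_t inodes;
    uint64_t free_inodes;
    uint64_t free_blocks;
};

/* Rebuild the superblock counters from the per-AG copies.
 * Returns true if a summation walk was done, false if it was skipped
 * because a clean unmount means the superblock values can be trusted. */
bool rebuild_sb_counters(struct sb_counters *sb,
                         const struct ag_counters *ags, size_t nr_ags,
                         bool clean_unmount)
{
    if (clean_unmount)
        return false;

    struct sb_counters sum = {0};
    for (size_t i = 0; i < nr_ags; i++) {
        sum.inodes      += ags[i].inodes;
        sum.free_inodes += ags[i].free_inodes;
        /* Assumption: btree blocks count towards the free total. */
        sum.free_blocks += ags[i].free_blocks + ags[i].btree_blocks;
    }
    *sb = sum;
    return true;
}
```

In the real filesystem the corrected values would then be logged in a transaction, as the message notes; the sketch stops at computing them.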
  10. 08 May, 2007 1 commit
  11. 10 Feb, 2007 2 commits
  12. 28 Sep, 2006 1 commit
  13. 27 Jun, 2006 1 commit
  14. 20 Jun, 2006 1 commit
  15. 29 Mar, 2006 1 commit
  16. 14 Mar, 2006 1 commit
  17. 11 Jan, 2006 1 commit
  18. 02 Nov, 2005 4 commits
  19. 05 Sep, 2005 1 commit
  20. 02 Sep, 2005 1 commit
  21. 21 Jun, 2005 1 commit
  22. 17 Apr, 2005 1 commit
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4