1. 03 12月, 2010 1 次提交
    • D
      xfs: convert l_last_sync_lsn to an atomic variable · 84f3c683
      Dave Chinner 提交于
      log->l_last_sync_lsn is updated in only one critical spot - log
      buffer Io completion - and is protected by the grant lock here. This
      requires the grant lock to be taken for every log buffer IO
      completion. Converting the l_last_sync_lsn variable to an atomic64_t
      means that we do not need to take the grant lock in log buffer IO
      completion to update it.
      
      This also removes the need for explicitly holding a spinlock to read
      the l_last_sync_lsn on 32 bit platforms.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      84f3c683
  2. 21 12月, 2010 6 次提交
  3. 19 10月, 2010 2 次提交
  4. 10 9月, 2010 1 次提交
  5. 24 8月, 2010 1 次提交
    • D
      xfs: Reduce log force overhead for delayed logging · a44f13ed
      Dave Chinner 提交于
      Delayed logging adds some serialisation to the log force process to
      ensure that it does not deference a bad commit context structure
      when determining if a CIL push is necessary or not. It does this by
      grabing the CIL context lock exclusively, then dropping it before
      pushing the CIL if necessary. This causes serialisation of all log
      forces and pushes regardless of whether a force is necessary or not.
      As a result fsync heavy workloads (like dbench) can be significantly
      slower with delayed logging than without.
      
      To avoid this penalty, copy the current sequence from the context to
      the CIL structure when they are swapped. This allows us to do
      unlocked checks on the current sequence without having to worry
      about dereferencing context structures that may have already been
      freed. Hence we can remove the CIL context locking in the forcing
      code and only call into the push code if the current context matches
      the sequence we need to force.
      
      By passing the sequence into the push code, we can check the
      sequence again once we have the CIL lock held exclusive and abort if
      the sequence has already been pushed. This avoids a lock round-trip
      and unnecessary CIL pushes when we have racing push calls.
      
      The result is that the regression in dbench performance goes away -
      this change improves dbench performance on a ramdisk from ~2100MB/s
      to ~2500MB/s. This compares favourably to not using delayed logging
      which retuns ~2500MB/s for the same workload.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      a44f13ed
  6. 27 7月, 2010 6 次提交
  7. 24 5月, 2010 6 次提交
    • D
      xfs: forced unmounts need to push the CIL · 9da1ab18
      Dave Chinner 提交于
      If the filesystem is being shut down and the there is no log error,
      the current code forces out the current log buffers. This code now needs
      to push the CIL before it forces out the log buffers to acheive the same
      result.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      9da1ab18
    • D
      xfs: Introduce delayed logging core code · 71e330b5
      Dave Chinner 提交于
      The delayed logging code only changes in-memory structures and as
      such can be enabled and disabled with a mount option. Add the mount
      option and emit a warning that this is an experimental feature that
      should not be used in production yet.
      
      We also need infrastructure to track committed items that have not
      yet been written to the log. This is what the Committed Item List
      (CIL) is for.
      
      The log item also needs to be extended to track the current log
      vector, the associated memory buffer and it's location in the Commit
      Item List. Extend the log item and log vector structures to enable
      this tracking.
      
      To maintain the current log format for transactions with delayed
      logging, we need to introduce a checkpoint transaction and a context
      for tracking each checkpoint from initiation to transaction
      completion.  This includes adding a log ticket for tracking space
      log required/used by the context checkpoint.
      
      To track all the changes we need an io vector array per log item,
      rather than a single array for the entire transaction. Using the new
      log vector structure for this requires two passes - the first to
      allocate the log vector structures and chain them together, and the
      second to fill them out.  This log vector chain can then be passed
      to the CIL for formatting, pinning and insertion into the CIL.
      
      Formatting of the log vector chain is relatively simple - it's just
      a loop over the iovecs on each log vector, but it is made slightly
      more complex because we re-write the iovec after the copy to point
      back at the memory buffer we just copied into.
      
      This code also needs to pin log items. If the log item is not
      already tracked in this checkpoint context, then it needs to be
      pinned. Otherwise it is already pinned and we don't need to pin it
      again.
      
      The only other complexity is calculating the amount of new log space
      the formatting has consumed. This needs to be accounted to the
      transaction in progress, and the accounting is made more complex
      becase we need also to steal space from it for log metadata in the
      checkpoint transaction. Calculate all this at insert time and update
      all the tickets, counters, etc correctly.
      
      Once we've formatted all the log items in the transaction, attach
      the busy extents to the checkpoint context so the busy extents live
      until checkpoint completion and can be processed at that point in
      time. Transactions can then be freed at this point in time.
      
      Now we need to issue checkpoints - we are tracking the amount of log space
      used by the items in the CIL, so we can trigger background checkpoints when the
      space usage gets to a certain threshold. Otherwise, checkpoints need ot be
      triggered when a log synchronisation point is reached - a log force event.
      
      Because the log write code already handles chained log vectors, writing the
      transaction is trivial, too. Construct a transaction header, add it
      to the head of the chain and write it into the log, then issue a
      commit record write. Then we can release the checkpoint log ticket
      and attach the context to the log buffer so it can be called during
      Io completion to complete the checkpoint.
      
      We also need to allow for synchronising multiple in-flight
      checkpoints. This is needed for two things - the first is to ensure
      that checkpoint commit records appear in the log in the correct
      sequence order (so they are replayed in the correct order). The
      second is so that xfs_log_force_lsn() operates correctly and only
      flushes and/or waits for the specific sequence it was provided with.
      
      To do this we need a wait variable and a list tracking the
      checkpoint commits in progress. We can walk this list and wait for
      the checkpoints to change state or complete easily, an this provides
      the necessary synchronisation for correct operation in both cases.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      71e330b5
    • D
      xfs: make the log ticket ID available outside the log infrastructure · 955833cf
      Dave Chinner 提交于
      The ticket ID is needed to uniquely identify transactions when doing busy
      extent matching. Delayed logging changes the lifecycle of busy extents with
      respect to the transaction structure lifecycle. Hence we can no longer use
      the transaction structure as a means of determining the owner of the busy
      extent as it may be freed and reused while the busy extent is still active.
      
      This commit provides the infrastructure to access the xlog_tid_t held in the
      ticket from a transaction handle. This avoids the need for callers to peek
      into the transaction and log structures to find this out.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      955833cf
    • D
      xfs: clean up log ticket overrun debug output · 169a7b07
      Dave Chinner 提交于
      Push the error message output when a ticket overrun is detected
      into the ticket printing functions. Also remove the debug version
      of the code as the production version will still panic just as
      effectively on a debug kernel via the panic mask being set.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      169a7b07
    • D
      xfs: allow log ticket allocation to take allocation flags · 3383ca57
      Dave Chinner 提交于
      Delayed logging currently requires ticket allocation to succeed, so
      we need to be able to sleep on allocation. It also should not allow
      memory allocation to recurse into the filesystem. hence we need to
      pass allocation flags directing the type of allocation the caller
      requires.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      3383ca57
    • D
      xfs: Don't reuse the same transaction ID for duplicated transactions. · 524ee36f
      Dave Chinner 提交于
      The transaction ID is written into the log as the unique identifier
      for transactions during recover. When duplicating a transaction, we
      reuse the log ticket, which means it has the same transaction ID as
      the previous transaction.
      
      Rather than regenerating a random transaction ID for the duplicated
      transaction, just add one to the current ID so that duplicated
      transaction can be easily spotted in the log and during recovery
      during problem diagnosis.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      524ee36f
  8. 19 5月, 2010 9 次提交
    • A
      xfs: kill off l_sectbb_mask · 48389ef1
      Alex Elder 提交于
      There remains only one user of the l_sectbb_mask field in the log
      structure.  Just kill it off and compute the mask where needed from
      the power-of-2 sector size.
      
      (Only update from last post is to accomodate the changes in the
      previous patch in the series.)
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      48389ef1
    • A
      xfs: record log sector size rather than log2(that) · 69ce58f0
      Alex Elder 提交于
      Change struct log so it keeps track of the size (in basic blocks) of
      a log sector in l_sectBBsize rather than the log-base-2 of that
      value (previously, l_sectbb_log).  The name was chosen for
      consistency with the other fields in the structure that represent
      a number of basic blocks.
      
      (Updated so that a variable used in computing and verifying a log's
      sector size is named "log2_size".  Also added the "BB" to the
      structure field name, based on feedback from Eric Sandeen.  Also
      dropped some superfluous parentheses.)
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
      69ce58f0
    • D
      xfs: make the log ticket transaction id random · f9837107
      Dave Chinner 提交于
      The transaction ID that is written to the log for a transaction is
      currently set by taking the lower 32 bits of the memory address of
      the ticket structure.  This is not guaranteed to be unique as
      tickets comes from a slab and slots can be reallocated immediately
      after being freed. As a result, there is no guarantee of uniqueness
      in the ticket ID value.
      
      Fix this by assigning a random number to the ticket ID field so that
      it is extremely unlikely that duplicates will occur and remove the
      possibility of transactions being mixed up during recovery due to
      duplicate IDs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      f9837107
    • C
      xfs: clean up xlog_write_adv_cnt · e6b1f273
      Christoph Hellwig 提交于
      Replace the awkward xlog_write_adv_cnt with an inline helper that makes
      it more obvious that it's modifying it's paramters, and replace the use
      of an integer type for "ptr" with a real void pointer.  Also move
      xlog_write_adv_cnt to xfs_log_priv.h as it will be used outside of
      xfs_log.c in the delayed logging series.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      e6b1f273
    • D
      xfs: introduce new internal log vector structure · 55b66332
      Dave Chinner 提交于
      The current log IO vector structure is a flat array and not
      extensible. To make it possible to keep separate log IO vectors for
      individual log items, we need a method of chaining log IO vectors
      together.
      
      Introduce a new log vector type that can be used to wrap the
      existing log IO vectors on use that internally to the log. This
      means that the existing external interface (xfs_log_write) does not
      change and hence no changes to the transaction commit code are
      required.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      55b66332
    • C
      xfs: reindent xlog_write · 99428ad0
      Christoph Hellwig 提交于
      Reindent xlog_write to normal one tab indents and move all variable
      declarations into the closest enclosing block.
      
      Split from a bigger patch by Dave Chinner.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      99428ad0
    • D
      xfs: factor xlog_write · b5203cd0
      Dave Chinner 提交于
      xlog_write is a mess that takes a lot of effort to understand. It is
      a mass of nested loops with 4 space indents to get it to fit in 80 columns
      and lots of funky variables that aren't obvious what they mean or do.
      
      Break it down into understandable chunks.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      b5203cd0
    • D
      xfs: log ticket reservation underestimates the number of iclogs · 9b9fc2b7
      Dave Chinner 提交于
      When allocation a ticket for a transaction, the ticket is initialised with the
      worst case log space usage based on the number of bytes the transaction may
      consume. Part of this calculation is the number of log headers required for the
      iclog space used up by the transaction.
      
      This calculation makes an undocumented assumption that if the transaction uses
      the log header space reservation on an iclog, then it consumes either the
      entire iclog or it completes. That is - the transaction that is first in an
      iclog is the transaction that the log header reservation is accounted to. If
      the transaction is larger than the iclog, then it will use the entire iclog
      itself. Document this assumption.
      
      Further, the current calculation uses the rule that we can fit iclog_size bytes
      of transaction data into an iclog. This is in correct - the amount of space
      available in an iclog for transaction data is the size of the iclog minus the
      space used for log record headers. This means that the calculation is out by
      512 bytes per 32k of log space the transaction can consume. This is rarely an
      issue because maximally sized transactions are extremely uncommon, and for 4k
      block size filesystems maximal transaction reservations are about 400kb. Hence
      the error in this case is less than the size of an iclog, so that makes it even
      harder to hit.
      
      However, anyone using larger directory blocks (16k directory blocks push the
      maximum transaction size to approx. 900k on a 4k block size filesystem) or
      larger block size (e.g. 64k blocks push transactions to the 3-4MB size) could
      see the error grow to more than an iclog and at this point the transaction is
      guaranteed to get a reservation underrun and shutdown the filesystem.
      
      Fix this by adjusting the calculation to calculate the correct number of iclogs
      required and account for them all up front.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      9b9fc2b7
    • D
      xfs: factor log item initialisation · 43f5efc5
      Dave Chinner 提交于
      Each log item type does manual initialisation of the log item.
      Delayed logging introduces new fields that need initialisation, so
      factor all the open coded initialisation into a common function
      first.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      43f5efc5
  9. 17 4月, 2010 1 次提交
    • D
      xfs: ensure that sync updates the log tail correctly · b6f8dd49
      Dave Chinner 提交于
      Updates to the VFS layer removed an extra ->sync_fs call into the
      filesystem during the sync process (from the quota code).
      Unfortunately the sync code was unknowingly relying on this call to
      make sure metadata buffers were flushed via a xfs_buftarg_flush()
      call to move the tail of the log forward in memory before the final
      transactions of the sync process were issued.
      
      As a result, the old code would write a very recent log tail value
      to the log by the end of the sync process, and so a subsequent crash
      would leave nothing for log recovery to do. Hence in qa test 182,
      log recovery only replayed a small handle for inode fsync
      transactions in this case.
      
      However, with the removal of the extra ->sync_fs call, the log tail
      was now not moved forward with the inode fsync transactions near the
      end of the sync procese the first (and only) buftarg flush occurred
      after these transactions went to disk. The result is that log
      recovery now sees a large number of transactions for metadata that
      is already on disk.
      
      This usually isn't a problem, but when the transactions include
      inode chunk allocation, the inode create transactions and all
      subsequent changes are replayed as we cannt rely on what is on disk
      is valid. As a result, if the inode was written and contains
      unlogged changes, the unlogged changes are lost, thereby violating
      sync semantics.
      
      The fix is to always issue a transaction after the buftarg flush
      occurs is the log iѕ not idle or covered. This results in a dummy
      transaction being written that contains the up-to-date log tail
      value, which will be very recent. Indeed, it will be at least as
      recent as the old code would have left on disk, so log recovery
      will behave exactly as it used to in this situation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      b6f8dd49
  10. 02 3月, 2010 1 次提交
  11. 22 1月, 2010 2 次提交
  12. 16 1月, 2010 1 次提交
  13. 17 12月, 2009 1 次提交
  14. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f
  15. 12 8月, 2009 1 次提交