1. 22 8月, 2013 2 次提交
    • D
      xfs: Add write support for dirent filetype field · 1c55cece
      Dave Chinner 提交于
      Add support to propagate and add filetype values into the on-disk
      directs. This involves passing the filetype into the xfs_da_args
      structure along with the name and namelength for direct operations,
      and encoding it into the dirent at the same time we write the inode
      number into the dirent.
      
      With write support, add the feature flag to the
      XFS_SB_FEAT_INCOMPAT_ALL mask so we can now mount filesystems with
      this feature set.
      
      Performance of directory recursion is now much improved. Parallel
      walk of ~50 million directory entries across hundreds of directories
      improves significantly. Unpatched, no CRCs:
      
      Walking via ls -R
      
      real    3m19.886s
      user    6m36.960s
      sys     28m19.087s
      
      THis is doing roughly 500 getdents() calls per second, and 250,000
      inode lookups per second to determine the inode type at roughly
      17,000 read IOPS. CPU usage is 90% kernel space.
      
      With dtype support patched in and the fileset recreated with CRCs
      enabled:
      
      Walking via ls -R
      
      real    0m31.316s
      user    6m32.975s
      sys     0m21.111s
      
      This is doing roughly 3500 getdents() calls per second at 16,000
      IOPS. There are no inode lookups at all. CPU usages is almost 100%
      userspace.
      
      This is a big win for recursive directory walks that only need to
      find file names and file types.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1c55cece
    • D
      xfs: Add read-only support for dirent filetype field · 0cb97766
      Dave Chinner 提交于
      Add support for the file type field in directory entries so that
      readdir can return the type of the inode the dirent points to to
      userspace without first having to read the inode off disk.
      
      The encoding of the type field is a single byte that is added to the
      end of the directory entry name length. For all intents and
      purposes, it appends a "hidden" byte to the name field which
      contains the type information. As the directory entry is already of
      dynamic size, helpers are already required to access and decode the
      direct entry structures.
      
      Hence the relevent extraction and iteration helpers are updated to
      understand the hidden byte.  Helpers for reading and writing the
      filetype field from the directory entries are also added. Only the
      read helpers are used by this patch.  It also adds all the code
      necessary to read the type information out of the dirents on disk.
      
      Further we add the superblock feature bit and helpers to indicate
      that we understand the on-disk format change. This is not a
      compatible change - existing kernels cannot read the new format
      successfully - so an incompatible feature flag is added. We don't
      yet allow filesystems to mount with this flag yet - that will be
      added once write support is added.
      
      Finally, the code to take the type from the VFS, convert it to an
      XFS on-disk type and put it into the xfs_name structures passed
      around is added, but the directory code does not use this field yet.
      That will be in the next patch.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      0cb97766
  2. 21 8月, 2013 26 次提交
  3. 16 8月, 2013 7 次提交
  4. 14 8月, 2013 5 次提交
    • D
      xfs: split the CIL lock · 4bb928cd
      Dave Chinner 提交于
      The xc_cil_lock is used for two purposes - to protect the CIL
      itself, and to protect the push/commit state and lists. These are
      two logically separate structures and operations, so can have their
      own locks. This means that pushing on the CIL and the commit wait
      ordering won't contend for a lock with other transactions that are
      completing concurrently. As the CIL insertion is the hottest path
      throught eh CIL, this is a big win.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4bb928cd
    • D
      xfs: Combine CIL insert and prepare passes · 991aaf65
      Dave Chinner 提交于
      Now that all the log item preparation and formatting is done under
      the CIL lock, we can get rid of the intermediate log vector chain
      used to track items to be inserted into the CIL.
      
      We can already find all the items to be committed from the
      transaction handle, so as long as we attach the log vectors to the
      item before we insert the items into the CIL, we don't need to
      create a log vector chain to pass around.
      
      This means we can move all the item insertion code into and optimise
      it into a pair of simple passes across all the items in the
      transaction. The first pass does the formatting and accounting, the
      second inserts them all into the CIL.
      
      We keep this two pass split so that we can separate the CIL
      insertion - which must be done under the CIL spinlock - from the
      formatting. We could insert each item into the CIL with a single
      pass, but that massively increases the number of times we have to
      grab the CIL spinlock. It is much more efficient (and hence
      scalable) to do a batch operation and insert all objects in a single
      lock grab.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      991aaf65
    • D
      xfs: avoid CIL allocation during insert · f5baac35
      Dave Chinner 提交于
      Now that we have the size of the log vector that has been allocated,
      we can determine if we need to allocate a new log vector for
      formatting and insertion. We only need to allocate a new vector if
      it won't fit into the existing buffer.
      
      However, we need to hold the CIL context lock while we do this so
      that we can't race with a push draining the currently queued log
      vectors. It is safe to do this as long as we do GFP_NOFS allocation
      to avoid avoid memory allocation recursing into the filesystem.
      Hence we can safely overwrite the existing log vector on the CIL if
      it is large enough to hold all the dirty regions of the current
      item.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f5baac35
    • D
      xfs: Reduce allocations during CIL insertion · 7492c5b4
      Dave Chinner 提交于
      Now that we have the size of the object before the formatting pass
      is called, we can allocation the log vector and it's buffer in a
      single allocation rather than two separate allocations.
      
      Store the size of the allocated buffer in the log vector so that
      we potentially avoid allocation for future modifications of the
      object.
      
      While touching this code, remove the IOP_FORMAT definition.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      7492c5b4
    • D
      xfs: return log item size in IOP_SIZE · 166d1368
      Dave Chinner 提交于
      To begin optimising the CIL commit process, we need to have IOP_SIZE
      return both the number of vectors and the size of the data pointed
      to by the vectors. This enables us to calculate the size ofthe
      memory allocation needed before the formatting step and reduces the
      number of memory allocations per item by one.
      
      While there, kill the IOP_SIZE macro.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      166d1368