1. 14 10月, 2008 40 次提交
    • M
      ocfs2: Remove pointless !! · 009d3750
      Mark Fasheh 提交于
      ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
      one value.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      009d3750
    • T
      ocfs2: Add empty bucket support in xattr. · 5a095611
      Tao Ma 提交于
      As Mark mentioned, it may be time-consuming when we remove the
      empty xattr bucket, so this patch try to let empty bucket exist
      in xattr operation. The modification includes:
      1. Remove the functin of bucket and extent record deletion during
         xattr delete.
      2. In xattr set:
         1) Don't clean the last entry so that if the bucket is empty,
            the hash value of the bucket is the hash value of the entry
            which is deleted last.
         2) During insert, if we meet with an empty bucket, just use the
            1st entry.
      3. In binary search of xattr bucket, use the bucket hash value(which
         stored in the 1st xattr entry) to find the right place.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      5a095611
    • T
      ocfs2/xattr.c: Fix a bug when inserting xattr. · 06b240d8
      Tao Ma 提交于
      During the process of xatt insertion, we use binary search
      to find the right place and "low" is set to it. But when
      there is one xattr which has the same name hash as the inserted
      one, low is the wrong value. So set it to the right position.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      06b240d8
    • S
      ocfs2: Add xattr mount option in ocfs2_show_options() · b0f73cfc
      Sunil Mushran 提交于
      Patch adds check for [no]user_xattr in ocfs2_show_options() that completes
      the list of all mount options.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      b0f73cfc
    • J
      ocfs2: Switch over to JBD2. · 2b4e30fb
      Joel Becker 提交于
      ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
      limiting our maximum filesystem size.
      
      It's a pretty trivial change.  Most functions are just renamed.  The
      only functional change is moving to Jan's inode-based ordered data mode.
      It's better, too.
      
      Because JBD2 reads and writes JBD journals, this is compatible with any
      existing filesystem.  It can even interact with JBD-based ocfs2 as long
      as the journal is formated for JBD.
      
      We provide a compatibility option so that paranoid people can still use
      JBD for the time being.  This will go away shortly.
      
      [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
        ocfs2_truncate_for_delete(). --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      2b4e30fb
    • J
      ocfs2: Add the 'inode64' mount option. · 12462f1d
      Joel Becker 提交于
      Now that ocfs2 limits inode numbers to 32bits, add a mount option to
      disable the limit.  This parallels XFS.  64bit systems can handle the
      larger inode numbers.
      
      [ Added description of inode64 mount option in ocfs2.txt. --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      12462f1d
    • J
      ocfs2: Limit inode allocation to 32bits. · 1187c968
      Joel Becker 提交于
      ocfs2 inode numbers are block numbers.  For any filesystem with less
      than 2^32 blocks, this is not a problem.  However, when ocfs2 starts
      using JDB2, it will be able to support filesystems with more than 2^32
      blocks.  This would result in inode numbers higher than 2^32.
      
      The problem is that stat(2) can't handle those numbers on 32bit
      machines.  The simple solution is to have ocfs2 allocate all inodes
      below that boundary.
      
      The suballoc code is changed to honor an optional block limit.  Only the
      inode suballocator sets that limit - all other allocations stay unlimited.
      
      The biggest trick is to grow the inode suballocator beneath that limit.
      There's no point in allocating block groups that are above the limit,
      then rejecting their elements later on.  We want to prevent the inode
      allocator from ever having block groups above the limit.  This involves
      a little gyration with the local alloc code.  If the local alloc window
      is above the limit, it signals the caller to try the global bitmap but
      does not disable the local alloc file (which can be used for other
      allocations).
      
      [ Minor cleanup - removed an ML_NOTICE comment. --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      1187c968
    • T
      ocfs2: Resolve deadlock in ocfs2_xattr_free_block. · 08413899
      Tao Ma 提交于
      In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we
      have a transaction open. This will deadlock the downconvert thread, so fix
      it.
      
      We can clean up how xattr blocks are removed while here - this patch also
      moves the mechanism of releasing xattr block (including both value, xattr
      tree and xattr block) into this function.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      08413899
    • T
      ocfs2: bug-fix for journal extend in xattr. · 28b8ca0b
      Tao Ma 提交于
      In ocfs2_extend_trans, when we can't extend the current
      transaction, it will commit current transaction and restart
      a new one. So if the previous credits we have allocated aren't
      used(the block isn't dirtied before our extend), we will not
      have enough credits for any future operation(it will cause jbd
      complain and bug out). So check this and re-extend it.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      28b8ca0b
    • J
      ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree() · 8d6220d6
      Joel Becker 提交于
      The original get/put_extent_tree() functions held a reference on
      et_root_bh.  However, every single caller already has a safe reference,
      making the get/put cycle irrelevant.
      
      We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree().  It
      no longer gets a reference on et_root_bh.  ocfs2_put_extent_tree() is
      removed.  Callers now have a simpler init+use pattern.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      8d6220d6
    • J
      ocfs2: Comment struct ocfs2_extent_tree_operations. · 1625f8ac
      Joel Becker 提交于
      struct ocfs2_extent_tree_operations provides methods for the different
      on-disk btrees in ocfs2.  Describing what those methods do is probably a
      good idea.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      1625f8ac
    • J
      ocfs2: Make ocfs2_extent_tree the first-class representation of a tree. · f99b9b7c
      Joel Becker 提交于
      We now have three different kinds of extent trees in ocfs2: inode data
      (dinode), extended attributes (xattr_tree), and extended attribute
      values (xattr_value).  There is a nice abstraction for them,
      ocfs2_extent_tree, but it is hidden in alloc.c.  All the calling
      functions have to pick amongst a varied API and pass in type bits and
      often extraneous pointers.
      
      A better way is to make ocfs2_extent_tree a first-class object.
      Everyone converts their object to an ocfs2_extent_tree() via the
      ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
      tree calls to alloc.c.
      
      This simplifies a lot of callers, making for readability.  It also
      provides an easy way to add additional extent tree types, as they only
      need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree()
      function.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      f99b9b7c
    • J
      ocfs2: Add an insertion check to ocfs2_extent_tree_operations. · 1e61ee79
      Joel Becker 提交于
      A couple places check an extent_tree for a valid inode.  We move that
      out to add an eo_insert_check() operation.  It can be called from
      ocfs2_insert_extent() and elsewhere.
      
      We also have the wrapper calls ocfs2_et_insert_check() and
      ocfs2_et_sanity_check() ignore NULL ops.  That way we don't have to
      provide useless operations for xattr types.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      1e61ee79
    • J
      ocfs2: Create specific get_extent_tree functions. · 1a09f556
      Joel Becker 提交于
      A caller knows what kind of extent tree they have.  There's no reason
      they have to call ocfs2_get_extent_tree() with a NULL when they could
      just as easily call a specific function to their type of extent tree.
      
      Introduce ocfs2_dinode_get_extent_tree(),
      ocfs2_xattr_tree_get_extent_tree(), and
      ocfs2_xattr_value_get_extent_tree().  They only take the necessary
      arguments, calling into the underlying __ocfs2_get_extent_tree() to do
      the real work.
      
      __ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but
      without needing any switch-by-type logic.
      
      ocfs2_get_extent_tree() is now a wrapper around the specific calls.  It
      exists because a couple alloc.c functions can take et_type.  This will
      go later.
      
      Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a
      struct ocfs2_xattr_value_root* instead of void*.  This gives us
      typechecking where we didn't have it before.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      1a09f556
    • J
      ocfs2: Determine an extent tree's max_leaf_clusters in an et_op. · 943cced3
      Joel Becker 提交于
      Provide an optional extent_tree_operation to specify the
      max_leaf_clusters of an ocfs2_extent_tree.  If not provided, the value
      is 0 (unlimited).
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      943cced3
    • J
      ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents(). · 1c25d93a
      Joel Becker 提交于
      ocfs2_num_free_extents() re-implements the logic of
      ocfs2_get_extent_tree().  Now that ocfs2_get_extent_tree() does not
      allocate, let's use it in ocfs2_num_free_extents() to simplify the code.
      
      The inode validation code in ocfs2_num_free_extents() is not needed.
      All callers are passing in pre-validated inodes.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      1c25d93a
    • J
      ocfs2: Provide the get_root_el() method to ocfs2_extent_tree_operations. · 0ce1010f
      Joel Becker 提交于
      The root_el of an ocfs2_extent_tree needs to be calculated from
      et->et_object.  Make it an operation on et->et_ops.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0ce1010f
    • J
      ocfs2: Make 'private' into 'object' on ocfs2_extent_tree. · ea5efa15
      Joel Becker 提交于
      The 'private' pointer was a way to store off xattr values, which don't
      live at a set place in the bh.  But the concept of "the object
      containing the extent tree" is much more generic.  For an inode it's the
      struct ocfs2_dinode, for an xattr value its the value.  Let's save off
      the 'object' at all times.  If NULL is passed to
      ocfs2_get_extent_tree(), 'object' is set to bh->b_data;
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ea5efa15
    • J
      ocfs2: Make ocfs2_extent_tree get/put instead of alloc. · dc0ce61a
      Joel Becker 提交于
      Rather than allocating a struct ocfs2_extent_tree, just put it on the
      stack.  Fill it with ocfs2_get_extent_tree() and drop it with
      ocfs2_put_extent_tree().  Now the callers don't have to ENOMEM, yet
      still safely ref the root_bh.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      dc0ce61a
    • J
      ocfs2: Prefix the ocfs2_extent_tree structure. · ce1d9ea6
      Joel Becker 提交于
      The members of the ocfs2_extent_tree structure gain a prefix of 'et_'.
      All users are updated.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ce1d9ea6
    • J
      ocfs2: Prefix the extent tree operations structure. · 35dc0aa3
      Joel Becker 提交于
      The ocfs2_extent_tree_operations structure gains a field prefix on its
      members.  The ->eo_sanity_check() operation gains a wrapper function for
      completeness.  All of the extent tree operation wrappers gain a
      consistent name (ocfs2_et_*()).
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      35dc0aa3
    • M
      ocfs2: fix printk format warnings · ff1ec20e
      Mark Fasheh 提交于
      This patch fixes the following build warnings:
      
      fs/ocfs2/xattr.c: In function 'ocfs2_half_xattr_bucket':
      fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
      fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
      fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
      fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
      fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
      fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
      fs/ocfs2/xattr.c: In function 'ocfs2_xattr_set_entry_in_bucket':
      fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
      fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
      fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ff1ec20e
    • T
      ocfs2: Add incompatible flag for extended attribute · 8154da3d
      Tiger Yang 提交于
      This patch adds the s_incompat flag for extended attribute support. This
      helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
      to mount a volume with xattr support.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      8154da3d
    • T
      ocfs2: Delete all xattr buckets during inode removal · a3944256
      Tao Ma 提交于
      In inode removal, we need to iterate all the buckets, remove any
      externally-stored EA values and delete the xattr buckets.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      a3944256
    • T
      ocfs2: Enable xattr set in index btree · 01225596
      Tao Ma 提交于
      Where the previous patches added the ability of list/get xattr in buckets
      for ocfs2, this patch enables ocfs2 to store large numbers of EAs.
      
      The original design doc is written by Mark Fasheh, and it can be found in
      http://oss.oracle.com/osswiki/OCFS2/DesignDocs/IndexedEATrees. I only had to
      make small modifications to it.
      
      First, because the bucket size is 4K, a new field named xh_free_start is added
      in ocfs2_xattr_header to indicate the next valid name/value offset in a bucket.
      It is used when we store new EA name/value. With this field, we can find the
      place more quickly and what's more, we don't need to sort the name/value every
      time to let the last entry indicate the next unused space. This makes the
      insert operation more efficient for blocksizes smaller than 4k.
      
      Because of the new xh_free_start, another field named as xh_name_value_len is
      also added in ocfs2_xattr_header. It records the total length of all the
      name/values in the bucket. We need this so that we can check it and defragment
      the bucket if there is not enough contiguous free space.
      
      An xattr insertion looks like this:
      1. xattr_index_block_find: find the right bucket by the name_hash, say bucketA.
      2. check whether there is enough space in bucketA. If yes, insert it directly
         and modify xh_free_start and xh_name_value_len accordingly. If not, check
         xh_name_value_len to see whether we can store this by defragment the bucket.
         If yes, defragment it and go on insertion.
      3. If defragement doesn't work, check whether there is new empty bucket in
         the clusters within this extent record. If yes, init the new bucket and move
         all the buckets after bucketA one by one to the next bucket. Move half of the
         entries in bucketA to the next bucket and go on insertion.
      4. If there is no new bucket, grow the extent tree.
      
      As for xattr deletion, we will delete an xattr bucket when all it's xattrs
      are removed and move all the buckets after it to the previous one. When all
      the xattr buckets in an extend record are freed, free this extend records
      from ocfs2_xattr_tree.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      01225596
    • T
      ocfs2: Optionally limit extent size in ocfs2_insert_extent() · ca12b7c4
      Tao Ma 提交于
      In xattr bucket, we want to limit the maximum size of a btree leaf,
      otherwise we'll lose the benefits of hashing because we'll have to search
      large leaves.
      
      So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster
      size we want so that we can prevent ocfs2_insert_extent() from merging the leaf
      record even if it is contiguous with an adjacent record.
      
      Other btree types are not affected by this change.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ca12b7c4
    • T
      ocfs2: Add xattr lookup code xattr btrees · 589dc260
      Tao Ma 提交于
      Add code to lookup a given extended attribute in the xattr btree. Lookup
      follows this general scheme:
      
      1. Use ocfs2_xattr_get_rec to find the xattr extent record
      
      2. Find the xattr bucket within the extent which may contain this xattr
      
      3. Iterate the bucket to find the xattr. In ocfs2_xattr_block_get(), we need
         to recalcuate the block offset and name offset for the right position of
         name/value.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      589dc260
    • T
      ocfs2: Add xattr bucket iteration for large numbers of EAs · 0c044f0b
      Tao Ma 提交于
      Ocfs2 breaks up xattr index tree leaves into 4k regions, called buckets.
      Attributes are stored within a given bucket, depending on hash value.
      
      After a discussion with Mark, we decided that the per-bucket index
      (xe_entry[]) would only exist in the 1st block of a bucket. Likewise,
      name/value pairs will not straddle more than one block. This allows the
      majority of operations to work directly on the buffer heads in a leaf block.
      
      This patch adds code to iterate the buckets in an EA. A new abstration of
      ocfs2_xattr_bucket is added. It records the bhs in this bucket and
      ocfs2_xattr_header. This keeps the code neat, improving readibility.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0c044f0b
    • T
      ocfs2: Add xattr index tree operations · ba492615
      Tao Ma 提交于
      When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to
      store large numbers of EAs. This patch adds a new type in
      ocfs2_extent_tree_type and adds the implementation so that we can re-use the
      b-tree code to handle the storage of many EAs.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ba492615
    • T
      ocfs2: Add extended attribute support · cf1d6c76
      Tiger Yang 提交于
      This patch implements storing extended attributes both in inode or a single
      external block. We only store EA's in-inode when blocksize > 512 or that
      inode block has free space for it. When an EA's value is larger than 80
      bytes, we will store the value via b-tree outside inode or block.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      cf1d6c76
    • T
      ocfs2: reserve inline space for extended attribute · fdd77704
      Tiger Yang 提交于
      Add the structures and helper functions we want for handling inline extended
      attributes. We also update the inline-data handlers so that they properly
      function in the event that we have both inline data and inline attributes
      sharing an inode block.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      fdd77704
    • T
      ocfs2: Add extent tree operation for xattr value btrees · f56654c4
      Tao Ma 提交于
      Add some thin wrappers around ocfs2_insert_extent() for each of the 3
      different btree types, ocfs2_inode_insert_extent(),
      ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
      last is for the xattr index btree, which will be used in a followup patch.
      
      All the old callers in file.c etc will call ocfs2_dinode_insert_extent(),
      while the other two handle the xattr issue. And the init of extent tree are
      handled by these functions.
      
      When storing xattr value which is too large, we will allocate some clusters
      for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In
      order to re-use the b-tree operation code, a new parameter named "private"
      is added into ocfs2_extent_tree and it is used to indicate the root of
      ocfs2_exent_list. The reason is that we can't deduce the root from the
      buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse,
      in any place in an ocfs2_xattr_bucket.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      f56654c4
    • T
      ocfs2: Add helper function in uptodate.c for removing xattr clusters · ac11c827
      Tao Ma 提交于
      The old uptodate only handles the issue of removing one buffer_head from
      ocfs2 inode's buffer cache. With xattr clusters, we may need to remove
      multiple buffer_head's at a time.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ac11c827
    • T
      ocfs2: Add the basic xattr disk layout in ocfs2_fs.h · 5a7bc8eb
      Tao Ma 提交于
      Ocfs2 uses a very flexible structure for storing extended attributes on
      disk. Small amount of attributes are stored directly in the inode block - up
      to 256 bytes worth. If that fills up, attributes are also stored in an
      external block, linked to from the inode block. That block can in turn
      expand to a btree, capable of storing large numbers of attributes.
      
      Individual attribute values are stored inline if they're small enough
      (currently about 80 bytes, this can be changed though), and otherwise are
      expanded to a btree. The theoretical limit to the size of an individual
      attribute is about the same as an inode, though the kernel's upper bound on
      the size of an attributes data is far smaller.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      5a7bc8eb
    • T
      ocfs2: Make high level btree extend code generic · 0eb8d47e
      Tao Ma 提交于
      Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
      function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls
      ocfs2_do_cluster_allocation() now, but the latter can be used for other
      btree types as well.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0eb8d47e
    • T
      ocfs2: Abstract ocfs2_extent_tree in b-tree operations. · e7d4cb6b
      Tao Ma 提交于
      In the old extent tree operation, we take the hypothesis that we
      are using the ocfs2_extent_list in ocfs2_dinode as the tree root.
      As xattr will also use ocfs2_extent_list to store large value
      for a xattr entry, we refactor the tree operation so that xattr
      can use it directly.
      
      The refactoring includes 4 steps:
      1. Abstract set/get of last_eb_blk and update_clusters since they may
         be stored in different location for dinode and xattr.
      2. Add a new structure named ocfs2_extent_tree to indicate the
         extent tree the operation will work on.
      3. Remove all the use of fe_bh and di, use root_bh and root_el in
         extent tree instead. So now all the fe_bh is replaced with
         et->root_bh, el with root_el accordingly.
      4. Make ocfs2_lock_allocators generic. Now it is limited to be only used
         in file extend allocation. But the whole function is useful when we want
         to store large EAs.
      
      Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used
      for anything other than truncate inode data btrees.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      e7d4cb6b
    • T
      ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. · 811f933d
      Tao Ma 提交于
      ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
      ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
      they are all limited to an inode btree because they use a struct
      ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
      (the part of an ocfs2_dinode they actually use) so that the xattr btree code
      can use these functions.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      811f933d
    • T
      ocfs2: Modify ocfs2_num_free_extents for future xattr usage. · 231b87d1
      Tao Ma 提交于
      ocfs2_num_free_extents() is used to find the number of free extent records
      in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
      use this for extended attribute trees in the future, so genericize the
      interface the take a buffer head. A future patch will allow that buffer_head
      to contain any structure rooting an ocfs2 btree.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      231b87d1
    • M
      ocfs2: track local alloc state via debugfs · 9a8ff578
      Mark Fasheh 提交于
      A per-mount debugfs file, "local_alloc" is created which when read will
      expose live state of the nodes local alloc file. Performance impact is
      minimal, only a bit of memory overhead per mount point. Still, the code is
      hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
      local alloc performance problems on a live system.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9a8ff578
    • M
      ocfs2: throttle back local alloc when low on disk space · 9c7af40b
      Mark Fasheh 提交于
      Ocfs2's local allocator disables itself for the duration of a mount point
      when it has trouble allocating a large enough area from the primary bitmap.
      That can cause performance problems, especially for disks which were only
      temporarily full or fragmented. This patch allows for the allocator to
      shrink it's window first, before being disabled. Later, it can also be
      re-enabled so that any performance drop is minimized.
      
      To do this, we allow the value of osb->local_alloc_bits to be shrunk when
      needed. The default value is recorded in a mostly read-only variable so that
      we can re-initialize when required.
      
      Locking had to be updated so that we could protect changes to
      local_alloc_bits. Mostly this involves protecting various local alloc values
      with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
      is used when the local allocator is has shrunk, but is not disabled. If the
      available space dips below 1 megabyte, the local alloc file is disabled. In
      either case, local alloc is re-enabled 30 seconds after the event, or when
      an appropriate amount of bits is seen in the primary bitmap.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9c7af40b