1. 18 Aug 2009, 1 commit
  2. 24 Jul 2009, 1 commit
    • ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. · 82e12644
      Tao Ma committed
      In ocfs2_adjust_adjacent_records, we adjust adjacent records according
      to the extent_list at the lower level. However, the lower-level tree may
      be either a leaf or a branch. Using only ocfs2_is_empty_extent causes
      problems when the lower tree is a branch (tree_depth > 1), so use
      !ocfs2_rec_clusters instead. Also, only leaf records can have holes, so
      add a BUG_ON for the non-leaf (branch) case.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
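      Below is a minimal sketch of the check described above. It uses a hypothetical, simplified record model: sketch_rec, sketch_list and the helper names are illustrative, not the kernel's on-disk structures. The cluster count is read from the field that matches the list's level, in the spirit of ocfs2_rec_clusters(). The assert() on tree_depth stands in for the BUG_ON the commit adds, and the second assert is an extra sanity check of this sketch. In this model a branch record never sets leaf_clusters, so a leaf-only emptiness test would misreport it as a hole.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified extent record; not the kernel layout. */
struct sketch_rec {
	uint32_t cpos;          /* first logical cluster covered */
	uint32_t int_clusters;  /* meaningful when the list is a branch */
	uint16_t leaf_clusters; /* meaningful when the list is a leaf */
};

/* Hypothetical, simplified extent list. */
struct sketch_list {
	uint16_t tree_depth;    /* 0 means this list is a leaf */
	uint16_t next_free_rec;
	struct sketch_rec recs[4];
};

/* Read the cluster count that matches the list's level. */
static uint32_t sketch_rec_clusters(const struct sketch_list *el,
				    const struct sketch_rec *rec)
{
	return el->tree_depth ? rec->int_clusters : rec->leaf_clusters;
}

/* Return the new cluster count for left_rec so that it reaches the
 * first cluster actually covered by the right child list. */
static uint32_t sketch_adjust_left(const struct sketch_rec *left_rec,
				   const struct sketch_list *right_child_el)
{
	uint32_t end = right_child_el->recs[0].cpos;

	if (!sketch_rec_clusters(right_child_el, &right_child_el->recs[0])) {
		/* Only a leaf may start with an empty record (a hole). */
		assert(right_child_el->tree_depth == 0);
		assert(right_child_el->next_free_rec > 1);
		end = right_child_el->recs[1].cpos;
	}
	return end - left_rec->cpos;
}
```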
  3. 22 Jul 2009, 1 commit
  4. 16 Jun 2009, 1 commit
    • ocfs2: Adjust rightmost path in ocfs2_add_branch. · 6b791bcc
      Tao Ma committed
      In ocfs2_add_branch, we use the rightmost rec of the leaf extent block
      to generate the e_cpos for the newly added branch. In most cases this
      is fine, but if the parent extent block's rightmost rec covers more
      clusters than the leaf does, inserting clusters into that range causes
      a kernel panic. The message looks something like:
      (7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression:
      le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count)
      (7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28,
      next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1
       [<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2]
       [<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2]
       [<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2]
      ...
      
      The panic can be easily reproduced with the following small test case
      (with bs=512 and cs=4K; all error handling has been removed to keep it
      readable).
      
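      #include <fcntl.h>
      #include <unistd.h>
      
      int main(int argc, char **argv)
      {
      	int fd, i;
      	char buf[5] = "test";
      
      	/* O_CREAT requires a mode argument. */
      	fd = open(argv[1], O_RDWR|O_CREAT, 0644);
      
      	/* One 5-byte write every 40960 bytes, i.e. every 10 clusters at cs=4K. */
      	for (i = 0; i < 30; i++) {
      		lseek(fd, 40960 * i, SEEK_SET);
      		write(fd, buf, 5);
      	}
      
      	/* 1146880 bytes = 280 clusters. */
      	ftruncate(fd, 1146880);
      
      	/* 1126400 bytes = cluster 275, inside the gap after cluster 271. */
      	lseek(fd, 1126400, SEEK_SET);
      	write(fd, buf, 5);
      
      	close(fd);
      
      	return 0;
      }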
      
      The reason for the panic is as follows. The 30 writes and the
      ftruncate leave the file's extent list looking like this:
      
      	Tree Depth: 1   Count: 19   Next Free Rec: 1
      	## Offset        Clusters       Block#
      	0  0             280            86183
      	SubAlloc Bit: 7   SubAlloc Slot: 0
      	Blknum: 86183   Next Leaf: 0
      	CRC32: 00000000   ECC: 0000
      	Tree Depth: 0   Count: 28   Next Free Rec: 28
      	## Offset        Clusters       Block#          Flags
      	0  0             1              143368          0x0
      	1  10            1              143376          0x0
      	...
      	26 260           1              143576          0x0
      	27 270           1              143584          0x0
      
      Now another write at 1126400 (cluster 275), which lands in the gap
      between clusters 271 and 280, triggers ocfs2_add_branch, but afterwards
      the tree looks like:
      	Tree Depth: 1   Count: 19   Next Free Rec: 2
      	## Offset        Clusters       Block#
      	0  0             280            86183
      	1  271           0             143592
      So the two extent records overlap (record 0 covers clusters 0 to 279,
      while record 1 starts at 271), and the subsequent operation hits a BUG.
      
      This patch removes the gap before adding the new branch, so that the
      root (branch) rightmost rec ends at the same position as the leaf. In
      the above case, before the branch is added the tree is changed to:
      	Tree Depth: 1   Count: 19   Next Free Rec: 1
      	## Offset        Clusters       Block#
      	0  0             271            86183
      	SubAlloc Bit: 7   SubAlloc Slot: 0
      	Blknum: 86183   Next Leaf: 0
      	CRC32: 00000000   ECC: 0000
      	Tree Depth: 0   Count: 28   Next Free Rec: 28
      	## Offset        Clusters       Block#          Flags
      	0  0             1              143368          0x0
      	1  10            1              143376          0x0
      	...
      	26 260           1              143576          0x0
      	27 270           1              143584          0x0
      After the branch is added, the tree looks like:
      	Tree Depth: 1   Count: 19   Next Free Rec: 2
      	## Offset        Clusters       Block#
      	0  0             271            86183
      	1  271           0             143592
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
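      The gap removal described above can be modeled in a few lines. This is a sketch under simplifying assumptions: records are plain (cpos, clusters) pairs, and sketch_leaf_end and sketch_remove_gap are hypothetical helpers, not the patch's code. The parent's rightmost record is shrunk so that it ends where the rightmost leaf record ends. The new branch can then start at that same cluster (271 in the example) without overlapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified extent record. */
struct sketch_rec {
	uint32_t cpos;
	uint32_t clusters;
};

/* First cluster past the rightmost leaf record. */
static uint32_t sketch_leaf_end(const struct sketch_rec *leaf, int nr)
{
	assert(nr > 0);
	return leaf[nr - 1].cpos + leaf[nr - 1].clusters;
}

/* Shrink the parent's rightmost record so it ends exactly where the
 * leaf ends, removing the gap before a new branch is added. */
static void sketch_remove_gap(struct sketch_rec *parent_rightmost,
			      const struct sketch_rec *leaf, int nr)
{
	uint32_t end = sketch_leaf_end(leaf, nr);

	assert(end >= parent_rightmost->cpos);
	if (parent_rightmost->cpos + parent_rightmost->clusters > end)
		parent_rightmost->clusters = end - parent_rightmost->cpos;
}
```

      With the 28 one-cluster leaf records at offsets 0, 10, ..., 270 from the example, a root record of (0, 280) becomes (0, 271). This matches the tree shown before the branch is added.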
  5. 04 Apr 2009, 1 commit
    • ocfs2: Add a name indexed b-tree to directory inodes · 9b7895ef
      Mark Fasheh committed
      This patch uses Ocfs2's flexible btree code to add an additional tree
      to directory inodes. The new tree stores an array of small, fixed-length
      records in each leaf block. Each record stores a hash value and a
      pointer to a block in the traditional (unindexed) directory tree where
      a dirent with the given name hash resides. Lookup uses this tree
      exclusively to find dirents, giving us constant-time name lookups.
      
      Some of the hashing code was copied from ext3. Unfortunately, it has lots of
      unfixed checkpatch errors. I left that as-is so that tracking changes would
      be easier.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Joel Becker <joel.becker@oracle.com>
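      The following sketches how a lookup could search one index leaf of fixed-length (hash, block) records kept sorted by hash. The record layout and sketch_dx_find are hypothetical, not ocfs2's on-disk dx structures. The sketch also covers only the search within a single leaf, not the walk down the tree. Because different names can share a hash, it returns the first match, and a caller would scan forward and compare real names in each referenced dirent block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-length index record. */
struct sketch_dx_rec {
	uint32_t hash;        /* hash of the entry name */
	uint64_t dirent_blk;  /* unindexed-tree block holding the dirent */
};

/* Binary search a leaf sorted by hash. Returns the index of the first
 * record with the given hash, or -1 when no record matches. */
static long sketch_dx_find(const struct sketch_dx_rec *recs, size_t nr,
			   uint32_t hash)
{
	size_t lo = 0, hi = nr;

	assert(recs != NULL || nr == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].hash < hash)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < nr && recs[lo].hash == hash)
		return (long)lo;
	return -1;
}
```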
  6. 13 Mar 2009, 1 commit
  7. 27 Feb 2009, 1 commit
  8. 03 Feb 2009, 1 commit
  9. 09 Jan 2009, 1 commit
  10. 06 Jan 2009, 13 commits
    • ocfs2: Access the right buffer_head in ocfs2_merge_rec_left. · 9047beab
      Tao Ma committed
      In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*()
      functions", the wrong buffer_head was accessed. Change it to the
      right buffer_head.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Create ocfs2_xattr_value_buf. · 2a50a743
      Joel Becker committed
      When an ocfs2 extended attribute is large enough to require its own
      allocation tree, we root it with an ocfs2_xattr_value_root.  However,
      these roots can be a part of inodes, xattr blocks, or xattr buckets.
      Thus, they need a different journal access function for each container.
      
      We wrap the bh, its journal access function, and the value root (xv) in
      a structure called ocfs2_xattr_value_buf.  This is a package that can
      be passed around.  In this first pass, we simply pass it to the
      extent tree code.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. · 13723d00
      Joel Becker committed
      The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
      commit triggers and allow us to compute metadata ecc right before the
      buffers are written out.  This commit provides ecc for inodes, extent
      blocks, group descriptors, and quota blocks.  It is not safe to use
      extended attributes and metaecc at the same time yet.
      
      The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
      the type of block at their root.  Before, it didn't matter, but now the
      root block must use the appropriate ocfs2_journal_access_*() function.
      To keep this abstract, the structures now have a pointer to the matching
      journal_access function and a wrapper function to call it.
      
      A few places use naked ocfs2_write_block() calls instead of adding the
      blocks to the journal.  We make sure to calculate their checksum and ecc
      before the write.
      
      Since we pass the journal_access functions around, let's typedef them
      in ocfs2.h.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Wrap up the common use cases of ocfs2_new_path(). · ffdd7a54
      Joel Becker committed
      The majority of ocfs2_new_path() calls are:
      
      	ocfs2_new_path(path_root_bh(otherpath),
      		       path_root_el(otherpath));
      
      Let's call that ocfs2_new_path_from_path().  The rest do similar things
      from struct ocfs2_extent_tree.  Let's call those
      ocfs2_new_path_from_et().  This will make the next change easier.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
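      The two wrappers named above amount to thin convenience functions over the base constructor. In this sketch the path tracks only its root, and every type and function name is a hypothetical stand-in for ocfs2_new_path(), ocfs2_new_path_from_path() and ocfs2_new_path_from_et().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the root buffer and extent list. */
struct sketch_bh { int blkno; };
struct sketch_el { int tree_depth; };

/* Hypothetical path: only the root is tracked in this sketch. */
struct sketch_path {
	struct sketch_bh *root_bh;
	struct sketch_el *root_el;
};

struct sketch_extent_tree {
	struct sketch_bh *et_root_bh;
	struct sketch_el *et_root_el;
};

static struct sketch_path *sketch_new_path(struct sketch_bh *root_bh,
					   struct sketch_el *root_el)
{
	struct sketch_path *path = calloc(1, sizeof(*path));

	if (path) {
		path->root_bh = root_bh;
		path->root_el = root_el;
	}
	return path;
}

/* Common case 1: root a new path where an existing path is rooted. */
static struct sketch_path *sketch_new_path_from_path(struct sketch_path *other)
{
	assert(other != NULL);
	return sketch_new_path(other->root_bh, other->root_el);
}

/* Common case 2: root a new path at an extent tree's root. */
static struct sketch_path *sketch_new_path_from_et(struct sketch_extent_tree *et)
{
	assert(et != NULL);
	return sketch_new_path(et->et_root_bh, et->et_root_el);
}
```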
    • ocfs2: block read meta ecc. · d6b32bbb
      Joel Becker committed
      Add block check calls to the read_block validate functions.  This covers
      almost all of the read-side checking of metaecc.  xattr buckets are not
      checked yet.  Writes are also unchecked, so a read-write mount will
      quickly fail.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add quota calls for allocation and freeing of inodes and space · a90714c1
      Jan Kara committed
      Add quota calls for allocation and freeing of inodes and space, also update
      estimates on number of needed credits for a transaction. Move out inode
      allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
      outside of a transaction.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Remove JBD compatibility layer · 53ef99ca
      Mark Fasheh committed
      JBD2 is fully backwards compatible with JBD and it's been tested enough with
      Ocfs2 that we can clean this code up now.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Validate metadata only when it's read from disk. · 970e4936
      Joel Becker committed
      Add an optional validation hook to ocfs2_read_blocks().  Now the
      validation function is only called when a block was actually read off of
      disk.  It is not called when the buffer was in cache.
      
      We add a buffer state bit BH_NeedsValidate to flag these buffers.  It
      must always be one higher than the last JBD2 buffer state bit.
      
      The dinode, dirblock, extent_block, and xattr_block validators are
      lifted to this scheme directly.  The group_descriptor validator needs to
      be split into two pieces.  The first part only needs the gd buffer and
      is passed to ocfs2_read_block().  The second part requires the dinode as
      well, and is called every time.  It's only 3 compares, so it's tiny.
      This also allows us to clean up the non-fatal gd check used by resize.c.
      It now has no magic argument.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Wrap extent block reads in a dedicated function. · 5e96581a
      Joel Becker committed
      We weren't consistently checking extent blocks after we read them.
      Most places checked the signature, but none checked h_blkno or
      h_fs_signature.  Create a toplevel ocfs2_read_extent_block() that does
      the read and the validation.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
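      Here is a sketch of the kind of validation a dedicated extent block reader centralizes: one function that checks every header field. The header layout and field names are hypothetical, and so is the signature string (assumed to be "EXBLK01"). The third check is modeled as a filesystem-generation comparison, which is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified extent block header. */
struct sketch_extent_block {
	char h_signature[8];
	uint64_t h_blkno;
	uint32_t h_fs_generation;
};

/* Assumed signature value for this sketch. */
#define SKETCH_EB_SIGNATURE "EXBLK01"

/* Validate an extent block that was read from disk block 'blkno'.
 * Returns 0 when every check passes, -1 otherwise. */
static int sketch_validate_extent_block(const struct sketch_extent_block *eb,
					uint64_t blkno, uint32_t fs_generation)
{
	assert(eb != NULL);
	if (memcmp(eb->h_signature, SKETCH_EB_SIGNATURE,
		   sizeof(SKETCH_EB_SIGNATURE)) != 0)
		return -1;  /* not an extent block at all */
	if (eb->h_blkno != blkno)
		return -1;  /* block claims to live somewhere else */
	if (eb->h_fs_generation != fs_generation)
		return -1;  /* block belongs to a different filesystem */
	return 0;
}
```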
    • ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks. · 10995aa2
      Joel Becker committed
      Random places in the code would check a dinode bh to see if it was
      valid.  Not only did they do different levels of validation, they
      handled errors in different ways.
      
      The previous commit unified inode block reads, validating all block
      reads in the same place.  Thus, these haphazard checks are no longer
      necessary.  Rather than eliminate them, however, we change them to
      BUG_ON() checks.  This ensures the assumptions remain true.  All of the
      code paths to these checks have been audited to ensure they come from a
      validated inode read.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Wrap inode block reads in a dedicated function. · b657c95c
      Joel Becker committed
      The ocfs2 code currently reads inodes off disk with a simple
      ocfs2_read_block() call.  Each place that does this has a different set
      of sanity checks it performs.  Some check only the signature.  A couple
      validate the block number (the block read vs di->i_blkno).  A couple
      others check for VALID_FL.  Only one place validates i_fs_generation.  A
      couple check nothing.  Even when an error is found, they don't all do
      the same thing.
      
      We wrap inode reading into ocfs2_read_inode_block().  This will validate
      all the above fields, going readonly if they are invalid (they never
      should be).  ocfs2_read_inode_block_full() is provided for the places
      that want to pass read_block flags.  Every caller is passing a struct
      inode with a valid ip_blkno, so we don't need a separate blkno argument
      either.
      
      We will remove the validation checks from the rest of the code in a
      later commit, as they are no longer necessary.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
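      The shape of the wrapper pair described above is sketched below. There is a _full variant that takes read flags and a plain variant that passes none, and both take the block number from the inode rather than as an argument. Everything here is hypothetical: the tiny in-memory "disk", the types, and a validator that shows only two of the checks. The sketch also returns an error where the real code would force the filesystem read-only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory inode and on-disk dinode. */
struct sketch_inode { uint64_t ip_blkno; };
struct sketch_dinode { uint64_t i_blkno; uint32_t i_flags; };

#define SKETCH_VALID_FL    0x1  /* stands in for a "valid" inode flag */
#define SKETCH_DISK_BLOCKS 16

static struct sketch_dinode sketch_disk[SKETCH_DISK_BLOCKS]; /* fake disk */

/* One place for the inode sanity checks (only two shown). */
static int sketch_validate_dinode(uint64_t blkno, const struct sketch_dinode *di)
{
	if (di->i_blkno != blkno)
		return -1;
	if (!(di->i_flags & SKETCH_VALID_FL))
		return -1;
	return 0;
}

/* The block number always comes from the inode itself. */
static int sketch_read_inode_block_full(const struct sketch_inode *inode,
					struct sketch_dinode *out, int flags)
{
	(void)flags;  /* a real reader would forward these to the block layer */
	assert(inode->ip_blkno < SKETCH_DISK_BLOCKS);
	*out = sketch_disk[inode->ip_blkno];
	return sketch_validate_dinode(inode->ip_blkno, out);
}

/* Convenience wrapper for the common no-flags case. */
static int sketch_read_inode_block(const struct sketch_inode *inode,
				   struct sketch_dinode *out)
{
	return sketch_read_inode_block_full(inode, out, 0);
}
```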
    • ocfs2: turn __ocfs2_remove_inode_range() into ocfs2_remove_btree_range() · fecc0112
      Mark Fasheh committed
      This patch genericizes the high level handling of extent removal.
      ocfs2_remove_btree_range() is nearly identical to
      __ocfs2_remove_inode_range(), except that extent tree operations have been
      used where necessary. We update ocfs2_remove_inode_range() to use the
      generic helper. Now extent tree based structures have an easy way to
      truncate ranges.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Add clusters free in dealloc_ctxt. · 2891d290
      Tao Ma committed
      Currently, ocfs2 xattr set is divided into many small parts, each
      wrapped in a different transaction, so the set as a whole does not
      behave like a real transaction. We want to integrate it into a single
      one.
      
      In some cases we allocate some clusters and free some in a single
      transaction. For example, suppose an xattr is larger than the inline
      size, so it and its value root are stored within the inode while the
      value lives outside in a cluster. If we then update it with a smaller
      value (larger than the size of the root but smaller than the inline
      size), we may need to free the outside cluster while allocating a new
      bucket (one cluster), since the inode may now be full. The old solution
      locks the global_bitmap (if the local alloc fails, as in a stress test)
      and then the truncate log. This causes an ABBA deadlock with truncate
      log flush.
      
      This patch adds cluster freeing to dealloc_ctxt, so that we can record
      the freed clusters during the transaction and free them after we
      release the global_bitmap in xattr set.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
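      The deferred-free idea above is sketched as a simple list. While the transaction runs, freed cluster ranges are only recorded. Once the global bitmap lock has been released, the list is drained, which avoids taking the bitmap and truncate-log locks in conflicting order. The context type, record type and both functions are hypothetical. The drain step only sums the clusters, where a real implementation would return them to the allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of a cluster range whose release is deferred. */
struct sketch_cached_cluster {
	uint64_t start_blk;
	uint32_t clusters;
	struct sketch_cached_cluster *next;
};

/* Hypothetical dealloc context: a list of pending frees. */
struct sketch_dealloc_ctxt {
	struct sketch_cached_cluster *free_list;
};

/* Inside the transaction: only remember what must be freed. */
static int sketch_cache_cluster_dealloc(struct sketch_dealloc_ctxt *ctxt,
					uint64_t start_blk, uint32_t clusters)
{
	struct sketch_cached_cluster *item = malloc(sizeof(*item));

	if (!item)
		return -1;
	item->start_blk = start_blk;
	item->clusters = clusters;
	item->next = ctxt->free_list;
	ctxt->free_list = item;
	return 0;
}

/* After the global bitmap lock is dropped: drain the list and return
 * the total number of clusters released. */
static uint32_t sketch_run_deallocs(struct sketch_dealloc_ctxt *ctxt)
{
	uint32_t total = 0;

	assert(ctxt != NULL);
	while (ctxt->free_list) {
		struct sketch_cached_cluster *item = ctxt->free_list;

		ctxt->free_list = item->next;
		/* A real implementation would free item's clusters here. */
		total += item->clusters;
		free(item);
	}
	return total;
}
```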
  11. 15 Oct 2008, 2 commits
  12. 14 Oct 2008, 16 commits