1. 04 4月, 2009 1 次提交
    • M
      ocfs2: Add a name indexed b-tree to directory inodes · 9b7895ef
      Mark Fasheh 提交于
      This patch makes use of Ocfs2's flexible btree code to add an additional
      tree to directory inodes. The new tree stores an array of small,
      fixed-length records in each leaf block. Each record stores a hash value,
      and pointer to a block in the traditional (unindexed) directory tree where a
      dirent with the given name hash resides. Lookup exclusively uses this tree
      to find dirents, thus providing us with constant time name lookups.
      
      Some of the hashing code was copied from ext3. Unfortunately, it has lots of
      unfixed checkpatch errors. I left that as-is so that tracking changes would
      be easier.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      9b7895ef
  2. 06 1月, 2009 7 次提交
    • J
      ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. · 13723d00
      Joel Becker 提交于
      The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
      commit triggers and allow us to compute metadata ecc right before the
      buffers are written out.  This commit provides ecc for inodes, extent
      blocks, group descriptors, and quota blocks.  It is not safe to use
      extened attributes and metaecc at the same time yet.
      
      The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
      the type of block at their root.  Before, it didn't matter, but now the
      root block must use the appropriate ocfs2_journal_access_*() function.
      To keep this abstract, the structures now have a pointer to the matching
      journal_access function and a wrapper call to call it.
      
      A few places use naked ocfs2_write_block() calls instead of adding the
      blocks to the journal.  We make sure to calculate their checksum and ecc
      before the write.
      
      Since we pass around the journal_access functions.  Let's typedef them
      in ocfs2.h.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      13723d00
    • J
      ocfs2: block read meta ecc. · d6b32bbb
      Joel Becker 提交于
      Add block check calls to the read_block validate functions.  This is the
      almost all of the read-side checking of metaecc.  xattr buckets are not checked
      yet.   Writes are also unchecked, and so a read-write mount will quickly fail.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      d6b32bbb
    • J
      ocfs2: Add quota calls for allocation and freeing of inodes and space · a90714c1
      Jan Kara 提交于
      Add quota calls for allocation and freeing of inodes and space, also update
      estimates on number of needed credits for a transaction. Move out inode
      allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
      outside of a transaction.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      a90714c1
    • J
      ocfs2: Mark system files as not subject to quota accounting · bbbd0eb3
      Jan Kara 提交于
      Mark system files as not subject to quota accounting. This prevents
      possible recursions into quota code and thus deadlocks.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      bbbd0eb3
    • J
      1a224ad1
    • J
      ocfs2: Validate metadata only when it's read from disk. · 970e4936
      Joel Becker 提交于
      Add an optional validation hook to ocfs2_read_blocks().  Now the
      validation function is only called when a block was actually read off of
      disk.  It is not called when the buffer was in cache.
      
      We add a buffer state bit BH_NeedsValidate to flag these buffers.  It
      must always be one higher than the last JBD2 buffer state bit.
      
      The dinode, dirblock, extent_block, and xattr_block validators are
      lifted to this scheme directly.  The group_descriptor validator needs to
      be split into two pieces.  The first part only needs the gd buffer and
      is passed to ocfs2_read_block().  The second part requires the dinode as
      well, and is called every time.  It's only 3 compares, so it's tiny.
      This also allows us to clean up the non-fatal gd check used by resize.c.
      It now has no magic argument.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      970e4936
    • J
      ocfs2: Wrap inode block reads in a dedicated function. · b657c95c
      Joel Becker 提交于
      The ocfs2 code currently reads inodes off disk with a simple
      ocfs2_read_block() call.  Each place that does this has a different set
      of sanity checks it performs.  Some check only the signature.  A couple
      validate the block number (the block read vs di->i_blkno).  A couple
      others check for VALID_FL.  Only one place validates i_fs_generation.  A
      couple check nothing.  Even when an error is found, they don't all do
      the same thing.
      
      We wrap inode reading into ocfs2_read_inode_block().  This will validate
      all the above fields, going readonly if they are invalid (they never
      should be).  ocfs2_read_inode_block_full() is provided for the places
      that want to pass read_block flags.  Every caller is passing a struct
      inode with a valid ip_blkno, so we don't need a separate blkno argument
      either.
      
      We will remove the validation checks from the rest of the code in a
      later commit, as they are no longer necessary.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      b657c95c
  3. 11 11月, 2008 1 次提交
  4. 15 10月, 2008 5 次提交
  5. 14 10月, 2008 4 次提交
    • M
      ocfs2: Don't check for NULL before brelse() · a81cb88b
      Mark Fasheh 提交于
      This is pointless as brelse() already does the check.
      
      Signed-off-by: Mark Fasheh
      a81cb88b
    • J
      ocfs2: Switch over to JBD2. · 2b4e30fb
      Joel Becker 提交于
      ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
      limiting our maximum filesystem size.
      
      It's a pretty trivial change.  Most functions are just renamed.  The
      only functional change is moving to Jan's inode-based ordered data mode.
      It's better, too.
      
      Because JBD2 reads and writes JBD journals, this is compatible with any
      existing filesystem.  It can even interact with JBD-based ocfs2 as long
      as the journal is formated for JBD.
      
      We provide a compatibility option so that paranoid people can still use
      JBD for the time being.  This will go away shortly.
      
      [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
        ocfs2_truncate_for_delete(). --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      2b4e30fb
    • T
      ocfs2: Add extended attribute support · cf1d6c76
      Tiger Yang 提交于
      This patch implements storing extended attributes both in inode or a single
      external block. We only store EA's in-inode when blocksize > 512 or that
      inode block has free space for it. When an EA's value is larger than 80
      bytes, we will store the value via b-tree outside inode or block.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      cf1d6c76
    • M
      ocfs2: POSIX file locks support · 53da4939
      Mark Fasheh 提交于
      This is actually pretty easy since fs/dlm already handles the bulk of the
      work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
      underlying lock manager, so I only had to add the right calls.
      
      Cluster-aware POSIX locks ("plocks") can be turned off by the same means at
      UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume.
      Internally, the file system uses two sets of file_operations, depending on
      whether cluster aware plocks is required. This turns out to be easier than
      implementing local-only versions of ->lock.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      53da4939
  6. 26 1月, 2008 5 次提交
  7. 28 11月, 2007 2 次提交
  8. 13 10月, 2007 2 次提交
    • M
      ocfs2: Write support for inline data · 1afc32b9
      Mark Fasheh 提交于
      This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
      inode data.
      
      For the most part, the changes to the core write code can be relied on to do
      the heavy lifting. Any code calling ocfs2_write_begin (including shared
      writeable mmap) can count on it doing the right thing with respect to
      growing inline data to an extent tree.
      
      Size reducing truncates, including UNRESVP can simply zero that portion of
      the inode block being removed. Size increasing truncatesm, including RESVP
      have to be a little bit smarter and grow the inode to an extent tree if
      necessary.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Reviewed-by: NJoel Becker <joel.becker@oracle.com>
      1afc32b9
    • M
      ocfs2: Structure updates for inline data · 15b1e36b
      Mark Fasheh 提交于
      Add the disk, network and memory structures needed to support data in inode.
      
      Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
      inline data.
      
      A new inode field, i_dyn_features, is added to facilitate tracking of
      dynamic inode state. Since it will be used often, we want to mirror it on
      ocfs2_inode_info, and transfer it via the meta data lvb.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Reviewed-by: NJoel Becker <joel.becker@oracle.com>
      15b1e36b
  9. 09 5月, 2007 1 次提交
  10. 03 5月, 2007 3 次提交
  11. 27 4月, 2007 9 次提交
    • M
      ocfs2: Cache extent records · 83418978
      Mark Fasheh 提交于
      The extent map code was ripped out earlier because of an inability to deal
      with holes. This patch adds back a simpler caching scheme requiring far less
      code.
      
      Our old extent map caching was designed back when meta data block caching in
      Ocfs2 didn't work very well, resulting in many disk reads. These days our
      metadata caching is much better, resulting in no un-necessary disk reads. As
      a result, extent caching doesn't have to be as fancy, nor does it have to
      cache as many extents. Keeping the last 3 extents seen should be sufficient
      to give us a small performance boost on some streaming workloads.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      83418978
    • M
      ocfs2: Fix up i_blocks calculation to know about holes · 8110b073
      Mark Fasheh 提交于
      Older file systems which didn't support holes did a dumb calculation of
      i_blocks based on i_size. This is no longer accurate, so fix things up to
      take actual allocation into account.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      8110b073
    • M
      ocfs2: Read from an unwritten extent returns zeros · 49cb8d2d
      Mark Fasheh 提交于
      Return an optional extent flags field from our lookup functions and wire up
      callers to treat unwritten regions as holes for the purpose of returning
      zeros to the user.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      49cb8d2d
    • M
      ocfs2: zero tail of sparse files on truncate · 60b11392
      Mark Fasheh 提交于
      Since we don't zero on extend anymore, truncate needs to be fixed up to zero
      the part of a file between i_size and and end of it's cluster. Otherwise a
      subsequent extend could expose bad data.
      
      This introduced a new helper, which can be used in ocfs2_write().
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      60b11392
    • M
      ocfs2: teach extend/truncate about sparse files · 3a0782d0
      Mark Fasheh 提交于
      For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no
      longer exists since i_size is not tied to i_clusters. In
      ocfs2_extend_file(), we skip the allocation / page zeroing code for file
      systems which understand sparse files.
      
      The core truncate code is changed to do a bottom up tree traversal. This
      gets abstracted out into it's own function. To make things more readable,
      most of the special case handling for in-inode extents from
      ocfs2_do_truncate() is also removed.
      
      Though write support for sparse files comes in a later patch, we at least
      update ocfs2_prepare_inode_for_write() to skip allocation for sparse files.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      3a0782d0
    • M
      ocfs2: temporarily remove extent map caching · 363041a5
      Mark Fasheh 提交于
      The code in extent_map.c is not prepared to deal with a subtree being
      rotated between lookups. This can happen when filling holes in sparse files.
      Instead of a lengthy patch to update the code (which would likely lose the
      benefit of caching subtree roots), we remove most of the algorithms and
      implement a simple path based lookup. A less ambitious extent caching scheme
      will be added in a later patch.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      363041a5
    • M
      ocfs2: small cleanup of ocfs2_request_delete() · 6f16bf65
      Mark Fasheh 提交于
      There are two checks in there (one for inode newness, one for other mounted
      nodes) which are unnecessary, so remove them. The DLM will allow the trylock
      in either case without any messaging overhead.
      
      Removing these makes ocfs2_request_delete() a one liner function, so just
      move the trylock out one level into ocfs2_query_inode_wipe().
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      6f16bf65
    • T
      ocfs2: remove unused code · 68e2b740
      Tiger Yang 提交于
      Remove node messaging code that becomes unused with the delete inode vote
      removal.
      
      [Removed even more cruft which I spotted during review --Mark]
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      68e2b740
    • T
      ocfs2: Remove delete inode vote · 50008630
      Tiger Yang 提交于
      Ocfs2 currently does cluster-wide node messaging to check the open state of
      an inode during delete. This patch removes that mechanism in favor of an
      inode cluster lock which is taken at shared read when an inode is first read
      and dropped in clear_inode(). This allows a deleting node to test the
      liveness of an inode by attempting to take an exclusive lock.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      50008630