1. 14 10月, 2008 8 次提交
    • J
      ocfs2: Switch over to JBD2. · 2b4e30fb
      Joel Becker 提交于
      ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
      limiting our maximum filesystem size.
      
      It's a pretty trivial change.  Most functions are just renamed.  The
      only functional change is moving to Jan's inode-based ordered data mode.
      It's better, too.
      
      Because JBD2 reads and writes JBD journals, this is compatible with any
      existing filesystem.  It can even interact with JBD-based ocfs2 as long
      as the journal is formated for JBD.
      
      We provide a compatibility option so that paranoid people can still use
      JBD for the time being.  This will go away shortly.
      
      [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
        ocfs2_truncate_for_delete(). --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      2b4e30fb
    • J
      ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree() · 8d6220d6
      Joel Becker 提交于
      The original get/put_extent_tree() functions held a reference on
      et_root_bh.  However, every single caller already has a safe reference,
      making the get/put cycle irrelevant.
      
      We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree().  It
      no longer gets a reference on et_root_bh.  ocfs2_put_extent_tree() is
      removed.  Callers now have a simpler init+use pattern.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      8d6220d6
    • J
      ocfs2: Make ocfs2_extent_tree the first-class representation of a tree. · f99b9b7c
      Joel Becker 提交于
      We now have three different kinds of extent trees in ocfs2: inode data
      (dinode), extended attributes (xattr_tree), and extended attribute
      values (xattr_value).  There is a nice abstraction for them,
      ocfs2_extent_tree, but it is hidden in alloc.c.  All the calling
      functions have to pick amongst a varied API and pass in type bits and
      often extraneous pointers.
      
      A better way is to make ocfs2_extent_tree a first-class object.
      Everyone converts their object to an ocfs2_extent_tree() via the
      ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
      tree calls to alloc.c.
      
      This simplifies a lot of callers, making for readability.  It also
      provides an easy way to add additional extent tree types, as they only
      need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree()
      function.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      f99b9b7c
    • T
      ocfs2: Add extent tree operation for xattr value btrees · f56654c4
      Tao Ma 提交于
      Add some thin wrappers around ocfs2_insert_extent() for each of the 3
      different btree types, ocfs2_inode_insert_extent(),
      ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
      last is for the xattr index btree, which will be used in a followup patch.
      
      All the old callers in file.c etc will call ocfs2_dinode_insert_extent(),
      while the other two handle the xattr issue. And the init of extent tree are
      handled by these functions.
      
      When storing xattr value which is too large, we will allocate some clusters
      for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In
      order to re-use the b-tree operation code, a new parameter named "private"
      is added into ocfs2_extent_tree and it is used to indicate the root of
      ocfs2_exent_list. The reason is that we can't deduce the root from the
      buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse,
      in any place in an ocfs2_xattr_bucket.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      f56654c4
    • T
      ocfs2: Make high level btree extend code generic · 0eb8d47e
      Tao Ma 提交于
      Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
      function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls
      ocfs2_do_cluster_allocation() now, but the latter can be used for other
      btree types as well.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0eb8d47e
    • T
      ocfs2: Abstract ocfs2_extent_tree in b-tree operations. · e7d4cb6b
      Tao Ma 提交于
      In the old extent tree operation, we take the hypothesis that we
      are using the ocfs2_extent_list in ocfs2_dinode as the tree root.
      As xattr will also use ocfs2_extent_list to store large value
      for a xattr entry, we refactor the tree operation so that xattr
      can use it directly.
      
      The refactoring includes 4 steps:
      1. Abstract set/get of last_eb_blk and update_clusters since they may
         be stored in different location for dinode and xattr.
      2. Add a new structure named ocfs2_extent_tree to indicate the
         extent tree the operation will work on.
      3. Remove all the use of fe_bh and di, use root_bh and root_el in
         extent tree instead. So now all the fe_bh is replaced with
         et->root_bh, el with root_el accordingly.
      4. Make ocfs2_lock_allocators generic. Now it is limited to be only used
         in file extend allocation. But the whole function is useful when we want
         to store large EAs.
      
      Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used
      for anything other than truncate inode data btrees.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      e7d4cb6b
    • T
      ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. · 811f933d
      Tao Ma 提交于
      ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
      ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
      they are all limited to an inode btree because they use a struct
      ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
      (the part of an ocfs2_dinode they actually use) so that the xattr btree code
      can use these functions.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      811f933d
    • T
      ocfs2: Modify ocfs2_num_free_extents for future xattr usage. · 231b87d1
      Tao Ma 提交于
      ocfs2_num_free_extents() is used to find the number of free extent records
      in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
      use this for extended attribute trees in the future, so genericize the
      interface the take a buffer head. A future patch will allow that buffer_head
      to contain any structure rooting an ocfs2 btree.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      231b87d1
  2. 10 9月, 2008 1 次提交
    • T
      ocfs2: Fix a bug in direct IO read. · 0e116227
      Tao Ma 提交于
      ocfs2 will become read-only if we try to read the bytes which pass
      the end of i_size. This can be easily reproduced by following steps:
      1. mkfs a ocfs2 volume with bs=4k cs=4k and nosparse.
      2. create a small file(say less than 100 bytes) and we will create the file
         which is allocated 1 cluster.
      3. read 8196 bytes from the kernel using O_DIRECT which exceeds the limit.
      4. The ocfs2 volume becomes read-only and dmesg shows:
      OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks:
      Inode 66010 has a hole at block 1
      File system is now read-only due to the potential of on-disk corruption.
      Please run fsck.ocfs2 once the file system is unmounted.
      
      So suppress the ERROR message.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0e116227
  3. 01 8月, 2008 1 次提交
  4. 17 7月, 2008 1 次提交
    • C
      [PATCH] ocfs2: fix oops in mmap_truncate testing · c0420ad2
      Coly Li 提交于
      This patch fixes a mmap_truncate bug which was found by ocfs2 test suite.
      
      In an ocfs2 cluster more than 1 node, run program mmap_truncate, which races
      mmap writes and truncates from multiple processes. While the test is
      running, a stat from another node forces writeout, causing an oops in
      ocfs2_get_block() because it sees a buffer to write which isn't allocated.
      
      This patch fixed the bug by clear dirty and uptodate bits in buffer, leave
      the buffer unmapped and return.
      
      Fix is suggested by Mark Fasheh, and I code up the patch.
      Signed-off-by: NColy Li <coyli@suse.de>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      c0420ad2
  5. 18 4月, 2008 1 次提交
  6. 04 3月, 2008 1 次提交
  7. 06 2月, 2008 1 次提交
    • C
      Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Christoph Lameter 提交于
      Simplify page cache zeroing of segments of pages through 3 functions
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the position where to
              start and end the zeroing which avoids length calculations and
      	makes code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. Its a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 makes a lot of code not having to be dealing
      with the special casing for HIGHMEM anymore. Dealing with
      kmap is only necessary for HIGHMEM configurations. In those
      configurations we use KM_USER0 like we do for a series of other
      functions defined in highmem.h.
      
      Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
      function could not be a macro. zero_user_* functions introduced
      here can be be inline because that constant is not used when these
      functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eebd2aa3
  8. 26 1月, 2008 4 次提交
  9. 28 11月, 2007 1 次提交
  10. 07 11月, 2007 1 次提交
  11. 17 10月, 2007 1 次提交
  12. 13 10月, 2007 4 次提交
    • M
      ocfs2: Write support for inline data · 1afc32b9
      Mark Fasheh 提交于
      This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
      inode data.
      
      For the most part, the changes to the core write code can be relied on to do
      the heavy lifting. Any code calling ocfs2_write_begin (including shared
      writeable mmap) can count on it doing the right thing with respect to
      growing inline data to an extent tree.
      
      Size reducing truncates, including UNRESVP can simply zero that portion of
      the inode block being removed. Size increasing truncatesm, including RESVP
      have to be a little bit smarter and grow the inode to an extent tree if
      necessary.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Reviewed-by: NJoel Becker <joel.becker@oracle.com>
      1afc32b9
    • M
      ocfs2: Read support for inline data · 6798d35a
      Mark Fasheh 提交于
      This hooks up ocfs2_readpage() to populate a page with data from an inode
      block. Direct IO reads from inline data are modified to fall back to
      buffered I/O. Appropriate checks are also placed in the extent map code to
      avoid reading an extent list when inline data might be stored.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Reviewed-by: NJoel Becker <joel.becker@oracle.com>
      6798d35a
    • M
      ocfs2: Small refactor of truncate zeroing code · 1d410a6e
      Mark Fasheh 提交于
      We'll want to reuse most of this when pushing inline data back out to an
      extent. Keeping this part as a seperate patch helps to keep the upcoming
      changes for write support uncluttered.
      
      The core portion of ocfs2_zero_cluster_pages() responsible for making sure a
      page is mapped and properly dirtied is abstracted out into it's own
      function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change,
      though zeroing becomes optional.
      
      We also turn part of ocfs2_free_write_ctxt() into  a common function for
      unlocking and freeing a page array. This operation is very common (and
      uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
      to keep the code in one place.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Reviewed-by: NJoel Becker <joel.becker@oracle.com>
      1d410a6e
    • M
      ocfs2: move nonsparse hole-filling into ocfs2_write_begin() · 65ed39d6
      Mark Fasheh 提交于
      By doing this, we can remove any higher level logic which has to have
      knowledge of btree functionality - any callers of ocfs2_write_begin() can
      now expect it to do anything necessary to prepare the inode for new data.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Reviewed-by: NJoel Becker <joel.becker@oracle.com>
      65ed39d6
  13. 21 9月, 2007 2 次提交
  14. 12 9月, 2007 1 次提交
  15. 20 7月, 2007 1 次提交
    • N
      mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin 提交于
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should need to know anything about the virtual memory mapping.  The hitch here
      is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
       But it is more logical to pass pgoff rather than have the ->nopage function
      calculate it itself anyway (because that's a similar layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54cb8821
  16. 11 7月, 2007 9 次提交
  17. 07 6月, 2007 1 次提交
    • M
      ocfs2: Fix invalid assertion during write on 64k pages · eeb47d12
      Mark Fasheh 提交于
      The write path code intends to bug if a math error (or unhandled case)
      results in a write outside of the current cluster boundaries. The actual
      BUG_ON() statements however are incorrect, leading to a crash on kernels
      with 64k page size. Fix those by checking against the right variables.
      
      Also, move the assertions higher up within the functions so that they trip
      *before* the code starts to mark buffers.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      eeb47d12
  18. 26 5月, 2007 1 次提交