1. 05 Sep, 2009 · 3 commits
    • ocfs2: Pass struct ocfs2_caching_info to the journal functions. · 0cf2f763
      Committed by Joel Becker
      The next step in divorcing metadata I/O management from struct inode is
      to pass struct ocfs2_caching_info to the journal functions.  Thus the
      journal locks a metadata cache with the cache io_lock function.  It also
      can compare ci_last_trans and ci_created_trans directly.
      
      This is a large patch because of all the places we change
      ocfs2_journal_access..(handle, inode, ...) to
      ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
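The INODE_CACHE(inode) conversion works because the caching info is embedded in the per-inode structure: the journal receives a pointer to the cache, and any other object can embed the same struct. A minimal user-space sketch of that embedding pattern, under stated assumptions: the structs are hypothetical simplifications, this INODE_CACHE() takes the inode-info struct rather than a struct inode, and cache_to_inode() and journal_access() are illustrative names, not ocfs2 functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the ocfs2 structures. */
struct ocfs2_caching_info {
    unsigned long ci_last_trans;
    unsigned long ci_created_trans;
};

struct ocfs2_inode_info {
    unsigned long long ip_blkno;
    struct ocfs2_caching_info ip_metadata_cache; /* cache embedded in the inode */
};

/* Recover the containing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Inode -> cache: take the address of the embedded member.
 * (This version takes the inode-info struct directly, not a struct inode.) */
#define INODE_CACHE(oi) (&(oi)->ip_metadata_cache)

/* Cache -> inode: valid only when the cache really is embedded in an inode. */
static struct ocfs2_inode_info *cache_to_inode(struct ocfs2_caching_info *ci)
{
    return container_of(ci, struct ocfs2_inode_info, ip_metadata_cache);
}

/* A journal-side helper that sees only the cache, never the inode. */
static void journal_access(struct ocfs2_caching_info *ci, unsigned long trans)
{
    ci->ci_last_trans = trans;
}
```

Because journal_access() only needs the cache, a non-inode owner that embeds struct ocfs2_caching_info could be passed in the same way.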
    • ocfs2: move ip_created_trans to struct ocfs2_caching_info · 292dd27e
      Committed by Joel Becker
      Similar to ip_last_trans, ip_created_trans tracks the creation of a journal
      managed inode.  This specifically tracks what transaction created the
      inode.  This is so the code can know if the inode has ever been written
      to disk.
      
      This behavior is desirable for any journal managed object.  We move it
      to struct ocfs2_caching_info as ci_created_trans so that any object
      using ocfs2_caching_info can rely on this behavior.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: move ip_last_trans to struct ocfs2_caching_info · 66fb345d
      Committed by Joel Becker
      We have the read side of metadata caching isolated to struct
      ocfs2_caching_info; now we need the write side.  This means the journal
      functions.  The journal only does a couple of things with struct inode.
      
      This change moves the ip_last_trans field onto struct
      ocfs2_caching_info as ci_last_trans.  This field tells the journal
      whether a pending journal flush is required.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
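ci_last_trans and ci_created_trans are both checked against the journal's current transaction id. The sketch below assumes a simplified model in which that id advances each time the journal is checkpointed. The struct and function names are hypothetical, and trans_after() borrows the wrap-safe idea of the kernel's time_after() macro rather than reproducing a specific ocfs2 helper.

```c
#include <assert.h>

/* Hypothetical model: j_trans_id advances each time the journal is
 * checkpointed, i.e. its contents have reached their final location. */
struct journal {
    unsigned long j_trans_id;
};

struct caching_info {
    unsigned long ci_last_trans;    /* last transaction that modified the object */
    unsigned long ci_created_trans; /* transaction that created the object */
};

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static int trans_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

static void ci_created(struct caching_info *ci, const struct journal *j)
{
    ci->ci_created_trans = j->j_trans_id;
    ci->ci_last_trans = j->j_trans_id;
}

static void ci_modified(struct caching_info *ci, const struct journal *j)
{
    ci->ci_last_trans = j->j_trans_id;
}

/* A flush is still pending until the journal has moved past ci_last_trans. */
static int ci_needs_flush(const struct caching_info *ci, const struct journal *j)
{
    return !trans_after(j->j_trans_id, ci->ci_last_trans);
}

/* The object has never reached disk until the journal moves past ci_created_trans. */
static int ci_is_new(const struct caching_info *ci, const struct journal *j)
{
    return !trans_after(j->j_trans_id, ci->ci_created_trans);
}

static void journal_checkpoint(struct journal *j)
{
    j->j_trans_id++;
}
```

After one checkpoint a freshly created object stops being "new"; modifying it again makes a flush pending without making it new again.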
  2. 11 Aug, 2009 · 1 commit
    • ocfs2: Fix possible deadlock when extending quota file · b409d7a0
      Committed by Jan Kara
      In OCFS2, allocator locks rank above transaction start. Thus we
      cannot extend the quota file from inside a transaction, lest we
      deadlock.
      
      We solve the problem by starting the transaction not in
      ocfs2_acquire_dquot() but only in ocfs2_local_read_dquot() and
      ocfs2_global_read_dquot(), and by allocating blocks to the quota files
      before starting the transaction.  If we crash, the quota files will
      just have a few extra blocks, but that's no problem since we will use
      them the next time we extend the quota file.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
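The fix is an ordering rule: anything that takes allocator locks must run before the transaction starts. The sketch below models lock ranking with plain flags instead of real cluster locks; struct ctx and both read_dquot_*() functions are hypothetical names used only to contrast the old and new orderings.

```c
#include <assert.h>

/* Hypothetical lock-rank checker: allocator locks rank above transaction
 * start, so they must never be taken while a transaction is open. */
struct ctx {
    int in_transaction;
    int alloc_locked;
    int reserved_blocks;
};

static int alloc_lock(struct ctx *c)
{
    if (c->in_transaction)
        return -1;              /* rank violation: this ordering can deadlock */
    c->alloc_locked = 1;
    return 0;
}

static void alloc_unlock(struct ctx *c) { c->alloc_locked = 0; }
static void trans_start(struct ctx *c)  { c->in_transaction = 1; }
static void trans_commit(struct ctx *c) { c->in_transaction = 0; }

/* Old ordering: open the transaction, then try to extend the file inside it. */
static int read_dquot_old(struct ctx *c, int blocks_needed)
{
    int ret;

    trans_start(c);
    ret = alloc_lock(c);
    if (ret == 0) {
        c->reserved_blocks += blocks_needed;
        alloc_unlock(c);
    }
    trans_commit(c);
    return ret;
}

/* Fixed ordering: allocate quota file blocks first, then open the
 * transaction. Blocks left over after a crash are reused on the next extend. */
static int read_dquot_fixed(struct ctx *c, int blocks_needed)
{
    if (alloc_lock(c))
        return -1;
    c->reserved_blocks += blocks_needed;
    alloc_unlock(c);

    trans_start(c);
    /* ... write the dquot into the already-allocated blocks ... */
    trans_commit(c);
    return 0;
}
```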
  3. 24 Jul, 2009 · 1 commit
  4. 09 Jul, 2009 · 1 commit
  5. 23 Jun, 2009 · 1 commit
  6. 04 Jun, 2009 · 1 commit
    • ocfs2: timer to queue scan of all orphan slots · 83273932
      Committed by Srinivas Eeda
      When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
      before moving the dentry to the orphan directory. Other nodes that have
      this dentry in cache have a PR on the same dentry lock.  When the EX is
      requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
      during downconvert.  The inode is finally deleted when the last node to iput
      the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.
      
      A problem arises if a node is forced to free dentry locks because of memory
      pressure. If this happens, the node will no longer get downconvert
      notifications for the dentries that have been unlinked on another node.
      If that node also happens to be actively using the corresponding inode
      and to be the one performing the last iput on it, it will fail to
      delete the inode because it will not have the MAYBE_ORPHANED flag set.
      
      This patch fixes this shortcoming by introducing a periodic scan of the
      orphan directories to delete such inodes. Care has been taken to distribute
      the workload across the cluster so that no one node has to perform the task
      all the time.
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
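The commit message does not spell out how the scan is distributed. One scheme consistent with it, assumed here for illustration, keeps a cluster-wide sequence number. When a node's timer fires and the number has not changed since the node last looked, nobody has scanned in the meantime, so it scans every orphan slot and bumps the number; otherwise it just records the new value. All names are hypothetical, and the shared counter is a plain struct field standing in for state protected by a cluster lock.

```c
#include <assert.h>

#define MAX_NODES 8

/* Hypothetical model: shared_seqno stands in for a cluster-wide counter
 * protected by a cluster lock; seen_seqno[] is each node's cached copy. */
struct cluster {
    unsigned int shared_seqno;
    unsigned int seen_seqno[MAX_NODES];
    int scans_done[MAX_NODES];
};

/* Runs when a node's scan timer fires (cluster lock assumed held).
 * Returns 1 if this node scanned the orphan slots, 0 if it skipped. */
static int orphan_scan_timer(struct cluster *c, int node)
{
    if (c->seen_seqno[node] != c->shared_seqno) {
        /* Another node scanned since we last looked: note it and skip. */
        c->seen_seqno[node] = c->shared_seqno;
        return 0;
    }
    /* Unchanged since our last look: scan every orphan slot. */
    c->scans_done[node]++;
    /* Publish the scan. Our cached copy stays stale on purpose, so this
     * node also skips its next turn and the work moves to another node. */
    c->shared_seqno++;
    return 1;
}
```

With three nodes whose timers fire in a fixed order, the scan rotates through the nodes instead of always landing on the first one; real timers would also fire with some jitter.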
  7. 01 May, 2009 · 1 commit
    • ocfs2: Fix a missing credit when deleting from indexed directories. · dfa13f39
      Committed by Joel Becker
      The ocfs2 directory index updates two blocks when we remove an entry -
      the dx root and the dx leaf.  OCFS2_DELETE_INODE_CREDITS was only
      accounting for the dx leaf.  This shows up when ocfs2_delete_inode()
      runs out of credits in jbd2_journal_dirty_metadata() at
      "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".
      
      The test that caught this was running dirop_file_racer from the
      ocfs2-test suite with a 250-character filename PREFIX.  Run on a 512-byte
      block size, it forces the orphan dir index to grow large enough to
      trigger the bug.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
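The failure mode is plain credit accounting: every metadata block dirtied in a handle needs a credit reserved when the handle starts. A hypothetical sketch, simplified to one credit per call (jbd2 actually charges a buffer only the first time it is dirtied within a handle), shows why reserving only for the dx leaf is one credit short.

```c
#include <assert.h>

/* Hypothetical journal handle holding the credits reserved at start. */
struct handle {
    int credits;
};

/* Simplification: each call dirties a distinct block and costs one credit.
 * Running out corresponds to the J_ASSERT_JH() failure in the commit text. */
static int dirty_block(struct handle *h)
{
    if (h->credits <= 0)
        return -1;
    h->credits--;
    return 0;
}

/* Removing an entry from an indexed directory dirties two dx blocks. */
static int remove_indexed_entry(struct handle *h)
{
    if (dirty_block(h))      /* dx root */
        return -1;
    return dirty_block(h);   /* dx leaf */
}
```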
  8. 04 Apr, 2009 · 5 commits
  9. 11 Feb, 2009 · 1 commit
    • jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215
      Committed by Jan Kara
      If we race with commit code setting i_transaction to NULL, we could
      dereference a NULL pointer.  Proper locking requires the journal pointer
      (to access journal->j_list_lock), which we don't have.  So we have to
      change the prototype of the function so that filesystem passes us the
      journal pointer.  Also add a more detailed comment about why the
      function jbd2_journal_begin_ordered_truncate() does what it does and
      how it should be used.
      
      Thanks to Dan Carpenter <error27@gmail.com> for pointing out the
      suspicious code.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      CC: linux-ext4@vger.kernel.org
      CC: ocfs2-devel@oss.oracle.com
      CC: mfasheh@suse.de
      CC: Dan Carpenter <error27@gmail.com>
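The race is fixed by testing and dereferencing i_transaction under the same lock that commit code holds while clearing it, which is why the function now needs the journal. In jbd2 that lock is the j_list_lock spinlock; this user-space sketch models it with a C11 atomic_flag spinlock, and the structs and function bodies are hypothetical simplifications rather than the jbd2 code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified structures. */
struct transaction {
    int t_tid;
};

struct journal {
    atomic_flag j_list_lock;           /* protects every i_transaction */
};

struct jinode {
    struct transaction *i_transaction; /* commit code may set this to NULL */
};

static void list_lock(struct journal *j)
{
    while (atomic_flag_test_and_set_explicit(&j->j_list_lock, memory_order_acquire))
        ;                              /* spin until the flag was clear */
}

static void list_unlock(struct journal *j)
{
    atomic_flag_clear_explicit(&j->j_list_lock, memory_order_release);
}

/* Commit side: detach the inode from its transaction under the lock. */
static void commit_detach(struct journal *j, struct jinode *ji)
{
    list_lock(j);
    ji->i_transaction = NULL;
    list_unlock(j);
}

/* Truncate side: test and dereference under the same lock, so the pointer
 * cannot become NULL in between. Returns the transaction id, or -1 if the
 * inode belongs to no transaction. */
static int begin_ordered_truncate(struct journal *j, struct jinode *ji)
{
    int tid = -1;

    list_lock(j);
    if (ji->i_transaction != NULL)
        tid = ji->i_transaction->t_tid;
    list_unlock(j);
    return tid;
}
```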
  10. 06 Jan, 2009 · 5 commits
  11. 14 Oct, 2008 · 3 commits
    • ocfs2: Switch over to JBD2. · 2b4e30fb
      Committed by Joel Becker
      ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
      limiting our maximum filesystem size.
      
      It's a pretty trivial change.  Most functions are just renamed.  The
      only functional change is moving to Jan's inode-based ordered data mode.
      It's better, too.
      
      Because JBD2 reads and writes JBD journals, this is compatible with any
      existing filesystem.  It can even interact with JBD-based ocfs2 as long
      as the journal is formatted for JBD.
      
      We provide a compatibility option so that paranoid people can still use
      JBD for the time being.  This will go away shortly.
      
      [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
        ocfs2_truncate_for_delete(). --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
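JBD stores journaled block numbers in 32-bit fields, while JBD2 can use 64-bit block numbers. Assuming a 4 KiB block size, a quick calculation gives the rough ceiling that 32-bit block numbers impose; max_addressable_bytes() is an illustrative helper.

```c
#include <assert.h>
#include <stdint.h>

/* Largest addressable size, in bytes, when block numbers are stored in a
 * field of block_bits bits (valid for block_bits < 64) and every block is
 * blocksize bytes. */
static uint64_t max_addressable_bytes(unsigned int block_bits, uint64_t blocksize)
{
    return ((uint64_t)1 << block_bits) * blocksize;
}
```

For 32-bit block numbers and 4096-byte blocks this is 2^32 * 2^12 = 2^44 bytes, i.e. 16 TiB.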
    • ocfs2: Add extended attribute support · cf1d6c76
      Committed by Tiger Yang
      This patch implements storing extended attributes either in the inode or
      in a single external block. We only store EAs in-inode when blocksize > 512
      or that inode block has free space for it. When an EA's value is larger
      than 80 bytes, we store the value via a b-tree outside the inode or block.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
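The placement rules for extended attributes can be sketched as a small decision function. This is a hypothetical illustration, not ocfs2's code: it reads the in-inode rule as requiring both a block size above 512 bytes and enough free space in the inode (that reading is an assumption), and it takes the 80-byte value threshold from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define XATTR_INLINE_VALUE_MAX 80   /* from the commit message */
#define MIN_BLOCKSIZE 512

enum xattr_where { XATTR_IN_INODE, XATTR_IN_BLOCK };

struct xattr_plan {
    enum xattr_where entry;   /* where the EA entry itself lives */
    int value_outside;        /* 1 if the value goes to a separate b-tree */
};

/* inode_free: free bytes in the inode; needed: bytes the entry would use. */
static struct xattr_plan plan_xattr(size_t blocksize, size_t inode_free,
                                    size_t needed, size_t value_len)
{
    struct xattr_plan p;

    p.value_outside = value_len > XATTR_INLINE_VALUE_MAX;
    /* Assumed reading: in-inode storage needs both conditions to hold. */
    if (blocksize > MIN_BLOCKSIZE && inode_free >= needed)
        p.entry = XATTR_IN_INODE;
    else
        p.entry = XATTR_IN_BLOCK;
    return p;
}
```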
    • ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. · 811f933d
      Committed by Tao Ma
      ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
      ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
      they are all limited to an inode btree because they use a struct
      ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
      (the part of an ocfs2_dinode they actually use) so that the xattr btree code
      can use these functions.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
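The point of the change is that helpers which only read the extent list can take a pointer to it, whichever structure embeds it. In the sketch below the structs are hypothetical simplifications of an inode and an xattr tree root, and the depth-plus-two worst-case estimate is an assumed heuristic used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified layouts: both owners embed the same list. */
struct extent_list {
    uint16_t l_tree_depth;
    uint16_t l_count;
    uint16_t l_next_free_rec;
};

struct dinode {
    uint64_t i_size;
    struct extent_list id2_list;
};

struct xattr_tree_root {
    uint32_t xt_clusters;
    struct extent_list xt_list;
};

/* Worst-case metadata blocks to extend a tree: one per existing level plus
 * a new leaf and a new root (assumed heuristic). Only the list is needed. */
static int extend_meta_needed(const struct extent_list *root_el)
{
    return root_el->l_tree_depth + 2;
}
```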
  12. 01 Aug, 2008 · 1 commit
    • [PATCH 2/2] ocfs2: Fix race between mount and recovery · 539d8264
      Committed by Sunil Mushran
      As the fs recovery is asynchronous, there is a small chance that another
      node can mount (and thus recover) the slot before the recovery thread
      gets to it.
      
      If this happens, the recovery thread will block indefinitely on the
      journal/slot lock as that lock will be held for the duration of the mount
      (by design) by the node assigned to that slot.
      
      The solution implemented is to keep track of the journal replays using
      a recovery generation in the journal inode, which will be incremented by the
      thread replaying that journal. The recovery thread, before attempting the
      blocking lock on the journal/slot lock, will compare the generation on disk
      with what it has cached and skip recovery if it does not match.
      
      This bug appears to have been inadvertently introduced during the mount/umount
      vote removal by mainline commit 34d024f8. In the
      mount voting scheme, the messaging would indirectly indicate that the slot
      was being recovered.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
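The recovery-generation check can be sketched as follows. The struct and function names are hypothetical, and the on-disk journal inode is reduced to a single counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the journal inode's on-disk recovery generation. */
struct journal_inode {
    uint32_t recovery_generation;
};

/* Whichever node replays the journal bumps the generation afterwards. */
static void replay_journal(struct journal_inode *ji)
{
    /* ... replay the journal ... */
    ji->recovery_generation++;
}

/* Recovery thread: compare on-disk and cached generations *before* taking
 * the blocking journal/slot lock; a mismatch means someone already replayed. */
static int should_recover(const struct journal_inode *ji, uint32_t cached_gen)
{
    return ji->recovery_generation == cached_gen;
}
```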
  13. 18 Apr, 2008 · 1 commit
    • ocfs2: Change the recovery map to an array of node numbers. · 553abd04
      Committed by Joel Becker
      The old recovery map was a bitmap of node numbers.  This was sufficient
      for the maximum node number of 254.  Going forward, we want node numbers
      to be UINT32.  Thus, we need a new recovery map.
      
      Note that we can't keep track of slots here.  We must write down the
      node number to recover *before* we get the locks needed to convert a
      node number into a slot number.
      
      The recovery map is now an array of unsigned ints, max_slots in size.
      It moves to journal.c with the rest of recovery.
      
      Because it needs to be initialized, we move all of recovery initialization
      into a new function, ocfs2_recovery_init().  This actually cleans up
      ocfs2_initialize_super() a little as well.  Following on, recovery cleanup
      becomes part of ocfs2_recovery_exit().
      
      A number of node map functions are rendered obsolete and are removed.
      
      Finally, waiting on recovery is wrapped in a function rather than naked
      checks on the recovery_event.  This is a cleanup from Mark.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
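A minimal sketch of a recovery map stored as an array of node numbers, sized by max_slots. The struct and function names are hypothetical; the sketch assumes duplicate entries are ignored and keeps the array dense by moving the last entry into a removed position.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical recovery map: node numbers awaiting recovery. */
struct recovery_map {
    unsigned int rm_used;
    unsigned int rm_max;        /* max_slots */
    unsigned int *rm_entries;   /* full 32-bit node numbers, no 254 cap */
};

static int rm_init(struct recovery_map *rm, unsigned int max_slots)
{
    rm->rm_used = 0;
    rm->rm_max = max_slots;
    rm->rm_entries = calloc(max_slots, sizeof(*rm->rm_entries));
    return rm->rm_entries ? 0 : -1;
}

static void rm_exit(struct recovery_map *rm)
{
    free(rm->rm_entries);
    rm->rm_entries = NULL;
    rm->rm_used = 0;
}

static int rm_contains(const struct recovery_map *rm, unsigned int node)
{
    for (unsigned int i = 0; i < rm->rm_used; i++)
        if (rm->rm_entries[i] == node)
            return 1;
    return 0;
}

/* Returns 0 if added or already present, -1 if the map is full. */
static int rm_add(struct recovery_map *rm, unsigned int node)
{
    if (rm_contains(rm, node))
        return 0;
    if (rm->rm_used == rm->rm_max)
        return -1;
    rm->rm_entries[rm->rm_used++] = node;
    return 0;
}

static void rm_remove(struct recovery_map *rm, unsigned int node)
{
    for (unsigned int i = 0; i < rm->rm_used; i++) {
        if (rm->rm_entries[i] == node) {
            /* Keep the array dense by moving the last entry down. */
            rm->rm_entries[i] = rm->rm_entries[--rm->rm_used];
            return;
        }
    }
}
```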
  14. 26 Jan, 2008 · 2 commits
    • [PATCH 2/2] ocfs2: Implement group add for online resize · 7909f2bf
      Committed by Tao Ma
      This patch adds the ability for a userspace program to request that a
      properly formatted cluster group be added to the main allocation bitmap for
      an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD.
      On a high level, this is similar to ext3, but we use a different ioctl as
      the structure which has to be passed through is different.
      
      During an online resize, tunefs.ocfs2 will format any new cluster groups
      which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on
      each one. The kernel verifies that the core cluster group information is
      valid and then does the work of linking it into the global allocation bitmap.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • [PATCH 1/2] ocfs2: Add group extend for online resize · d659072f
      Committed by Tao Ma
      This patch adds the ability for a userspace program to request an extension
      of the last cluster group on an Ocfs2 file system. The request is made via an ioctl,
      OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is
      obviously Ocfs2 specific.
      
      tunefs.ocfs2 would call this for an online-resize operation if the last
      cluster group isn't full.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
  15. 13 Oct, 2007 · 1 commit
    • ocfs2: Write support for inline data · 1afc32b9
      Committed by Mark Fasheh
      This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
      inode data.
      
      For the most part, the changes to the core write code can be relied on to do
      the heavy lifting. Any code calling ocfs2_write_begin (including shared
      writeable mmap) can count on it doing the right thing with respect to
      growing inline data to an extent tree.
      
      Size-reducing truncates, including UNRESVP, can simply zero that portion
      of the inode block being removed. Size-increasing truncates, including
      RESVP, have to be a little bit smarter and grow the inode to an extent
      tree if necessary.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Reviewed-by: Joel Becker <joel.becker@oracle.com>
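The truncate rules for inline data can be sketched as below. INLINE_MAX, the struct, and inline_truncate() are hypothetical, and the conversion to an extent tree is reduced to setting a flag.

```c
#include <assert.h>
#include <string.h>

#define INLINE_MAX 64   /* hypothetical inline capacity of the inode block */

struct inline_inode {
    size_t size;
    int has_extent_tree;
    unsigned char data[INLINE_MAX];
};

static void inline_truncate(struct inline_inode *ino, size_t new_size)
{
    if (new_size <= ino->size) {
        /* Shrinking: just zero the bytes being cut off. */
        memset(ino->data + new_size, 0, ino->size - new_size);
    } else if (new_size > INLINE_MAX) {
        /* Too big to stay inline: grow to an extent tree (not modelled). */
        ino->has_extent_tree = 1;
    } else {
        /* Growing within the inline area: new bytes must read as zero. */
        memset(ino->data + ino->size, 0, new_size - ino->size);
    }
    ino->size = new_size;
}
```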
  16. 11 Jul, 2007 · 1 commit
    • ocfs2: support for removing file regions · 063c4561
      Committed by Mark Fasheh
      Provide an internal interface for the removal of arbitrary file regions.
      
      ocfs2_remove_inode_range() takes a byte range within a file and will remove
      existing extents within that range. Partial clusters will be zeroed so that
      any read from within the region will return zeros.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
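Splitting a byte range into whole clusters to free and partial clusters to zero is rounding arithmetic. The hypothetical plan_remove() below computes that split for a given cluster size; it does not touch any on-disk structures.

```c
#include <assert.h>
#include <stdint.h>

struct range_plan {
    uint64_t first_cluster;                   /* first cluster to free */
    uint64_t nr_clusters;                     /* whole clusters to free */
    uint64_t zero_head_start, zero_head_end;  /* bytes to zero before them */
    uint64_t zero_tail_start, zero_tail_end;  /* bytes to zero after them */
};

/* Plan removal of bytes [start, start + len) with clusters of csize bytes. */
static struct range_plan plan_remove(uint64_t start, uint64_t len, uint64_t csize)
{
    struct range_plan p = {0};
    uint64_t end = start + len;
    uint64_t first = (start + csize - 1) / csize;   /* round start up */
    uint64_t last = end / csize;                    /* round end down (exclusive) */

    if (first >= last) {
        /* No whole cluster inside the range: zero all of it. */
        p.zero_head_start = start;
        p.zero_head_end = end;
        return p;
    }
    p.first_cluster = first;
    p.nr_clusters = last - first;
    p.zero_head_start = start;
    p.zero_head_end = first * csize;
    p.zero_tail_start = last * csize;
    p.zero_tail_end = end;
    return p;
}
```

For example, removing bytes [4000, 8200) with 4096-byte clusters frees cluster 1 and zeroes [4000, 4096) and [8192, 8200).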
  17. 27 Apr, 2007 · 1 commit
    • ocfs2: make room for unwritten extents flag · e48edee2
      Committed by Mark Fasheh
      Due to the size of our group bitmaps, we'll never have a leaf node extent
      record with more than 16 bits worth of clusters. Split e_clusters up so that
      leaf nodes can get a flags field where we can mark unwritten extents.
      Interior nodes, whose length covers all the child nodes beneath them, can't
      split their e_clusters field, so we use a union to preserve sizing there.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
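The union can be sketched in C11 with an anonymous struct inside an anonymous union. The field names echo the commit's description, but the layout is a hypothetical illustration using host-endian types rather than the on-disk definition, and EXT_UNWRITTEN is a made-up flag value.

```c
#include <assert.h>
#include <stdint.h>

/* Extent record whose cluster count is 32 bits in interior nodes but
 * 16 bits plus a flags byte in leaves; both views share the same 4 bytes. */
struct extent_rec {
    uint32_t e_cpos;                  /* logical start cluster */
    union {
        uint32_t e_int_clusters;      /* interior nodes: full 32 bits */
        struct {
            uint16_t e_leaf_clusters; /* leaves: never needs more than 16 bits */
            uint8_t  e_reserved1;
            uint8_t  e_flags;         /* e.g. an "unwritten" bit */
        };
    };
    uint64_t e_blkno;                 /* physical start block */
};

#define EXT_UNWRITTEN 0x01   /* hypothetical flag value */
```

Because the union is exactly as wide as the old 32-bit field, the record keeps its size and the offsets of e_cpos and e_blkno do not move.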
  18. 02 Feb, 2007 · 1 commit
    • ocfs2: ocfs2_link() journal credits update · e051fda4
      Committed by Mark Fasheh
      Commit 592282cf fixed some missing directory
      c/mtime updates in part by introducing a dinode update in ocfs2_add_entry().
      Unfortunately, ocfs2_link() (which didn't update the directory inode before)
      is now missing a single journal credit. Fix this by doubling the number of
      inode updates expected during hard link creation.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
  19. 08 Dec, 2006 · 1 commit
  20. 02 Dec, 2006 · 8 commits