1. 05 9月, 2009 2 次提交
  2. 22 7月, 2009 1 次提交
    • J
      ocfs2: Fix deadlock on umount · f7b1aa69
      Jan Kara 提交于
      In commit ea455f8a, we moved the dentry lock
      put process into ocfs2_wq. This causes problems during umount because ocfs2_wq
      can drop references to inodes while they are being invalidated by
      invalidate_inodes() causing all sorts of nasty things (invalidate_inodes()
      ending in an infinite loop, "Busy inodes after umount" messages etc.).
      
      We fix the problem by stopping ocfs2_wq from doing any further releasing of
      inode references on the superblock being unmounted, wait until it finishes
      the current round of releasing and finally cleaning up all the references in
      dentry_lock_list from ocfs2_put_super().
      
      The issue was tracked down by Tao Ma <tao.ma@oracle.com>.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      f7b1aa69
  3. 23 6月, 2009 2 次提交
  4. 04 6月, 2009 3 次提交
    • J
      ocfs2: Add statistics for the checksum and ecc operations. · 73be192b
      Joel Becker 提交于
      It would be nice to know how often we get checksum failures.  Even
      better, how many of them we can fix with the single bit ecc.  So, we add
      a statistics structure.  The structure can be installed into debugfs
      wherever the user wants.
      
      For ocfs2, we'll put it in the superblock-specific debugfs directory and
      pass it down from our higher-level functions.  The stats are only
      registered with debugfs when the filesystem supports metadata ecc.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      73be192b
    • S
      ocfs2 patch to track delayed orphan scan timer statistics · 15633a22
      Srinivas Eeda 提交于
      Patch to track delayed orphan scan timer statistics.
      
      Modifies ocfs2_osb_dump to print the following:
        Orphan Scan=> Local: 10  Global: 21  Last Scan: 67 seconds ago
      Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      15633a22
    • S
      ocfs2: timer to queue scan of all orphan slots · 83273932
      Srinivas Eeda 提交于
      When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
      before moving the dentry to the orphan directory. Other nodes that have
      this dentry in cache have a PR on the same dentry lock.  When the EX is
      requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
      during downconvert.  The inode is finally deleted when the last node to iput
      the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.
      
      A problem arises if a node is forced to free dentry locks because of memory
      pressure. If this happens, the node will no longer get downconvert
      notifications for the dentries that have been unlinked on another node.
      If it also happens that node is actively using the corresponding inode and
      happens to be the one performing the last iput on that inode, it will fail
      to delete the inode as it will not have the MAYBE_ORPHANED flag set.
      
      This patch fixes this shortcoming by introducing a periodic scan of the
      orphan directories to delete such inodes. Care has been taken to distribute
      the workload across the cluster so that no one node has to perform the task
      all the time.
      Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      83273932
  5. 04 4月, 2009 8 次提交
  6. 27 2月, 2009 1 次提交
  7. 03 2月, 2009 1 次提交
  8. 06 1月, 2009 9 次提交
    • M
      ocfs2: Add directory block trailers. · 87d35a74
      Mark Fasheh 提交于
      Future ocfs2 features metaecc and indexed directories need to store a
      little bit of data in each dirblock.  For compatibility, we place this
      in a trailer at the end of the dirblock.  The trailer plays itself as an
      empty dirent, so that if the features are turned off, it can be reused
      without requiring a tunefs scan.
      
      This code adds the trailer and validates it when the block is read in.
      
      [ Mark is the original author, but I reinserted this code before his
        dir index work.  -- Joel ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      87d35a74
    • J
      ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. · 13723d00
      Joel Becker 提交于
      The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
      commit triggers and allow us to compute metadata ecc right before the
      buffers are written out.  This commit provides ecc for inodes, extent
      blocks, group descriptors, and quota blocks.  It is not safe to use
      extened attributes and metaecc at the same time yet.
      
      The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
      the type of block at their root.  Before, it didn't matter, but now the
      root block must use the appropriate ocfs2_journal_access_*() function.
      To keep this abstract, the structures now have a pointer to the matching
      journal_access function and a wrapper call to call it.
      
      A few places use naked ocfs2_write_block() calls instead of adding the
      blocks to the journal.  We make sure to calculate their checksum and ecc
      before the write.
      
      Since we pass around the journal_access functions.  Let's typedef them
      in ocfs2.h.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      13723d00
    • J
      ocfs2: Add the underlying blockcheck code. · 70ad1ba7
      Joel Becker 提交于
      This is the code that computes crc32 and ecc for ocfs2 metadata blocks.
      There are high-level functions that check whether the filesystem has the
      ecc feature, mid-level functions that work on a single block or array of
      buffer_heads, and the low-level ecc hamming code that can handle
      multiple buffers like crc32_le().
      
      It's not hooked up to the filesystem yet.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      70ad1ba7
    • J
      ocfs2: Enable quota accounting on mount, disable on umount · 19ece546
      Jan Kara 提交于
      Enable quota usage tracking on mount and disable it on umount. Also
      add support for quota on and quota off quotactls and usrquota and
      grpquota mount options. Add quota features among supported ones.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      19ece546
    • J
      ocfs2: Implement quota recovery · 2205363d
      Jan Kara 提交于
      Implement functions for recovery after a crash. Functions just
      read local quota file and sync info to global quota file.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      2205363d
    • J
      ocfs2: Wrap extent block reads in a dedicated function. · 5e96581a
      Joel Becker 提交于
      We weren't consistently checking extent blocks after we read them.
      Most places checked the signature, but none checked h_blkno or
      h_fs_signature.  Create a toplevel ocfs2_read_extent_block() that does
      the read and the validation.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      5e96581a
    • J
      ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks. · 42035306
      Joel Becker 提交于
      Random places in the code would check a group descriptor bh to see if it
      was valid. The previous commit unified descriptor block reads,
      validating all block reads in the same place.  Thus, these checks are no
      longer necessary.  Rather than eliminate them, however, we change them
      to BUG_ON() checks.  This ensures the assumptions remain true.  All of
      the code paths to these checks have been audited to ensure they come
      from a validated descriptor read.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      42035306
    • J
      ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks. · 10995aa2
      Joel Becker 提交于
      Random places in the code would check a dinode bh to see if it was
      valid.  Not only did they do different levels of validation, they
      handled errors in different ways.
      
      The previous commit unified inode block reads, validating all block
      reads in the same place.  Thus, these haphazard checks are no longer
      necessary.  Rather than eliminate them, however, we change them to
      BUG_ON() checks.  This ensures the assumptions remain true.  All of the
      code paths to these checks have been audited to ensure they come from a
      validated inode read.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      10995aa2
    • T
      ocfs2: add POSIX ACL API · 929fb014
      Tiger Yang 提交于
      This patch adds POSIX ACL(access control lists) APIs in ocfs2. We convert
      struct posix_acl to many ocfs2_acl_entry and regard them as an extended
      attribute entry.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      929fb014
  9. 02 12月, 2008 1 次提交
  10. 11 11月, 2008 1 次提交
  11. 14 10月, 2008 8 次提交
    • J
      ocfs2: Switch over to JBD2. · 2b4e30fb
      Joel Becker 提交于
      ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
      limiting our maximum filesystem size.
      
      It's a pretty trivial change.  Most functions are just renamed.  The
      only functional change is moving to Jan's inode-based ordered data mode.
      It's better, too.
      
      Because JBD2 reads and writes JBD journals, this is compatible with any
      existing filesystem.  It can even interact with JBD-based ocfs2 as long
      as the journal is formated for JBD.
      
      We provide a compatibility option so that paranoid people can still use
      JBD for the time being.  This will go away shortly.
      
      [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
        ocfs2_truncate_for_delete(). --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      2b4e30fb
    • J
      ocfs2: Add the 'inode64' mount option. · 12462f1d
      Joel Becker 提交于
      Now that ocfs2 limits inode numbers to 32bits, add a mount option to
      disable the limit.  This parallels XFS.  64bit systems can handle the
      larger inode numbers.
      
      [ Added description of inode64 mount option in ocfs2.txt. --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      12462f1d
    • T
      ocfs2: Add incompatible flag for extended attribute · 8154da3d
      Tiger Yang 提交于
      This patch adds the s_incompat flag for extended attribute support. This
      helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
      to mount a volume with xattr support.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      8154da3d
    • T
      ocfs2: Add extended attribute support · cf1d6c76
      Tiger Yang 提交于
      This patch implements storing extended attributes both in inode or a single
      external block. We only store EA's in-inode when blocksize > 512 or that
      inode block has free space for it. When an EA's value is larger than 80
      bytes, we will store the value via b-tree outside inode or block.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      cf1d6c76
    • T
      ocfs2: reserve inline space for extended attribute · fdd77704
      Tiger Yang 提交于
      Add the structures and helper functions we want for handling inline extended
      attributes. We also update the inline-data handlers so that they properly
      function in the event that we have both inline data and inline attributes
      sharing an inode block.
      Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      fdd77704
    • M
      ocfs2: track local alloc state via debugfs · 9a8ff578
      Mark Fasheh 提交于
      A per-mount debugfs file, "local_alloc" is created which when read will
      expose live state of the nodes local alloc file. Performance impact is
      minimal, only a bit of memory overhead per mount point. Still, the code is
      hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
      local alloc performance problems on a live system.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9a8ff578
    • M
      ocfs2: throttle back local alloc when low on disk space · 9c7af40b
      Mark Fasheh 提交于
      Ocfs2's local allocator disables itself for the duration of a mount point
      when it has trouble allocating a large enough area from the primary bitmap.
      That can cause performance problems, especially for disks which were only
      temporarily full or fragmented. This patch allows for the allocator to
      shrink it's window first, before being disabled. Later, it can also be
      re-enabled so that any performance drop is minimized.
      
      To do this, we allow the value of osb->local_alloc_bits to be shrunk when
      needed. The default value is recorded in a mostly read-only variable so that
      we can re-initialize when required.
      
      Locking had to be updated so that we could protect changes to
      local_alloc_bits. Mostly this involves protecting various local alloc values
      with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
      is used when the local allocator is has shrunk, but is not disabled. If the
      available space dips below 1 megabyte, the local alloc file is disabled. In
      either case, local alloc is re-enabled 30 seconds after the event, or when
      an appropriate amount of bits is seen in the primary bitmap.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9c7af40b
    • M
      ocfs2: Track local alloc bits internally · ebcee4b5
      Mark Fasheh 提交于
      Do this instead of tracking absolute local alloc size. This avoids
      needless re-calculatiion of bits from bytes in localalloc.c. Additionally,
      the value is now in a more natural unit for internal file system bitmap
      work.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ebcee4b5
  12. 01 8月, 2008 1 次提交
    • S
      [PATCH 2/2] ocfs2: Fix race between mount and recovery · 539d8264
      Sunil Mushran 提交于
      As the fs recovery is asynchronous, there is a small chance that another
      node can mount (and thus recover) the slot before the recovery thread
      gets to it.
      
      If this happens, the recovery thread will block indefinitely on the
      journal/slot lock as that lock will be held for the duration of the mount
      (by design) by the node assigned to that slot.
      
      The solution implemented is to keep track of the journal replays using
      a recovery generation in the journal inode, which will be incremented by the
      thread replaying that journal. The recovery thread, before attempting the
      blocking lock on the journal/slot lock, will compare the generation on disk
      with what it has cached and skip recovery if it does not match.
      
      This bug appears to have been inadvertently introduced during the mount/umount
      vote removal by mainline commit 34d024f8. In the
      mount voting scheme, the messaging would indirectly indicate that the slot
      was being recovered.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      539d8264
  13. 15 7月, 2008 1 次提交
  14. 18 4月, 2008 1 次提交