1. 04 Jun, 2009 2 commits
    • ocfs2 patch to track delayed orphan scan timer statistics · 15633a22
      Srinivas Eeda committed
      Patch to track delayed orphan scan timer statistics.
      
      Modifies ocfs2_osb_dump to print the following:
        Orphan Scan=> Local: 10  Global: 21  Last Scan: 67 seconds ago
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: timer to queue scan of all orphan slots · 83273932
      Srinivas Eeda committed
      When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
      before moving the dentry to the orphan directory. Other nodes that have
      this dentry in cache have a PR on the same dentry lock.  When the EX is
      requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
      during downconvert.  The inode is finally deleted when the last node to iput
      the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.
      
      A problem arises if a node is forced to free dentry locks because of memory
      pressure. If this happens, the node will no longer get downconvert
      notifications for the dentries that have been unlinked on another node.
      If it also happens that node is actively using the corresponding inode and
      happens to be the one performing the last iput on that inode, it will fail
      to delete the inode as it will not have the MAYBE_ORPHANED flag set.
      
      This patch fixes this shortcoming by introducing a periodic scan of the
      orphan directories to delete such inodes. Care has been taken to distribute
      the workload across the cluster so that no one node has to perform the task
      all the time.
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  2. 04 Apr, 2009 3 commits
  3. 27 Feb, 2009 2 commits
  4. 03 Feb, 2009 1 commit
  5. 06 Jan, 2009 6 commits
  6. 14 Oct, 2008 10 commits
    • ocfs2: Don't check for NULL before brelse() · a81cb88b
      Mark Fasheh committed
      This is pointless as brelse() already does the check.
      
      Signed-off-by: Mark Fasheh
    • ocfs2: Add xattr mount option in ocfs2_show_options() · b0f73cfc
      Sunil Mushran committed
      Patch adds check for [no]user_xattr in ocfs2_show_options() that completes
      the list of all mount options.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Switch over to JBD2. · 2b4e30fb
      Joel Becker committed
      ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
      limiting our maximum filesystem size.
      
      It's a pretty trivial change.  Most functions are just renamed.  The
      only functional change is moving to Jan's inode-based ordered data mode.
      It's better, too.
      
      Because JBD2 reads and writes JBD journals, this is compatible with any
      existing filesystem.  It can even interact with JBD-based ocfs2 as long
      as the journal is formatted for JBD.
      
      We provide a compatibility option so that paranoid people can still use
      JBD for the time being.  This will go away shortly.
      
      [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
        ocfs2_truncate_for_delete(). --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add the 'inode64' mount option. · 12462f1d
      Joel Becker committed
      Now that ocfs2 limits inode numbers to 32 bits, add a mount option to
      disable the limit.  This parallels XFS.  64-bit systems can handle the
      larger inode numbers.
      
      [ Added description of inode64 mount option in ocfs2.txt. --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add incompatible flag for extended attribute · 8154da3d
      Tiger Yang committed
      This patch adds the s_incompat flag for extended attribute support. This
      helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
      to mount a volume with xattr support.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add extended attribute support · cf1d6c76
      Tiger Yang committed
      This patch implements storing extended attributes either in the inode or in
      a single external block. We only store EAs in-inode when blocksize > 512 or
      when the inode block has free space for them. When an EA's value is larger
      than 80 bytes, we store the value in a b-tree outside the inode or block.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: reserve inline space for extended attribute · fdd77704
      Tiger Yang committed
      Add the structures and helper functions we want for handling inline extended
      attributes. We also update the inline-data handlers so that they properly
      function in the event that we have both inline data and inline attributes
      sharing an inode block.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: throttle back local alloc when low on disk space · 9c7af40b
      Mark Fasheh committed
      Ocfs2's local allocator disables itself for the duration of a mount point
      when it has trouble allocating a large enough area from the primary bitmap.
      That can cause performance problems, especially for disks which were only
      temporarily full or fragmented. This patch allows for the allocator to
      shrink its window first, before being disabled. Later, it can also be
      re-enabled so that any performance drop is minimized.
      
      To do this, we allow the value of osb->local_alloc_bits to be shrunk when
      needed. The default value is recorded in a mostly read-only variable so that
      we can re-initialize when required.
      
      Locking had to be updated so that we could protect changes to
      local_alloc_bits. Mostly this involves protecting various local alloc values
      with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
      is used when the local allocator has shrunk, but is not disabled. If the
      available space dips below 1 megabyte, the local alloc file is disabled. In
      either case, local alloc is re-enabled 30 seconds after the event, or when
      an appropriate amount of bits is seen in the primary bitmap.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
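The throttling behaviour described in this commit message can be sketched as a small state machine. This is an illustrative model only, not the kernel code: the class and method names, the halving step, and the `min_bits` floor (standing in for the 1 megabyte limit) are assumptions. Only the three states and the 30-second re-enable delay come from the message.

```python
from dataclasses import dataclass
from enum import Enum


class LaState(Enum):
    ENABLED = "enabled"
    THROTTLED = "throttled"   # corresponds to OCFS2_LA_THROTTLED in the message
    DISABLED = "disabled"


REENABLE_DELAY_SECS = 30      # delay stated in the commit message


@dataclass
class LocalAlloc:
    default_bits: int         # default window size, kept so it can be restored
    min_bits: int             # hypothetical floor standing in for "1 megabyte"
    bits: int = 0             # current window size (local_alloc_bits)
    state: LaState = LaState.ENABLED
    event_time: float = 0.0   # time of the last throttle/disable event

    def __post_init__(self):
        self.bits = self.default_bits

    def on_alloc_failure(self, now: float) -> None:
        """The primary bitmap could not supply a window of self.bits bits."""
        halved = self.bits // 2           # halving is an assumption
        if halved < self.min_bits:
            self.state = LaState.DISABLED
        else:
            self.bits = halved
            self.state = LaState.THROTTLED
        self.event_time = now

    def maybe_reenable(self, now: float, free_bits_seen: int = 0) -> None:
        """Restore the default window after the delay or once space reappears."""
        if self.state is LaState.ENABLED:
            return
        waited = now - self.event_time >= REENABLE_DELAY_SECS
        if waited or free_bits_seen >= self.default_bits:
            self.bits = self.default_bits
            self.state = LaState.ENABLED
```

Keeping `default_bits` separate from `bits` mirrors the message's point that the default is recorded in a mostly read-only variable so the window can be re-initialized.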
    • ocfs2: Track local alloc bits internally · ebcee4b5
      Mark Fasheh committed
      Do this instead of tracking absolute local alloc size. This avoids
      needless recalculation of bits from bytes in localalloc.c. Additionally,
      the value is now in a more natural unit for internal file system bitmap
      work.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • vfs: Use const for kernel parser table · a447c093
      Steven Whitehouse committed
      This is a much better version of a previous patch to make the parser
      tables constant. Rather than changing the typedef, we put the "const" in
      all the various places where it's required, allowing the __initconst
      exception for nfsroot which was the cause of the previous trouble.
      
      This was posted for review some time ago and I believe it's been in -mm
      since then.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Alexander Viro <aviro@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 01 Aug, 2008 1 commit
    • [PATCH 2/2] ocfs2: Fix race between mount and recovery · 539d8264
      Sunil Mushran committed
      As the fs recovery is asynchronous, there is a small chance that another
      node can mount (and thus recover) the slot before the recovery thread
      gets to it.
      
      If this happens, the recovery thread will block indefinitely on the
      journal/slot lock as that lock will be held for the duration of the mount
      (by design) by the node assigned to that slot.
      
      The solution implemented is to keep track of the journal replays using
      a recovery generation in the journal inode, which will be incremented by the
      thread replaying that journal. The recovery thread, before attempting the
      blocking lock on the journal/slot lock, will compare the generation on disk
      with what it has cached and skip recovery if it does not match.
      
      This bug appears to have been inadvertently introduced during the mount/umount
      vote removal by mainline commit 34d024f8. In the
      mount voting scheme, the messaging would indirectly indicate that the slot
      was being recovered.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
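The generation check described in this commit message can be sketched as follows. This is a simplified model rather than the ocfs2 implementation: `JournalInode`, `replay_journal`, and `recover_slot` are hypothetical names, and the journal/slot lock that the real recovery thread takes after the check is omitted.

```python
from dataclasses import dataclass


@dataclass
class JournalInode:
    # On-disk recovery generation; bumped by whoever replays the journal.
    recovery_generation: int


def replay_journal(journal: JournalInode) -> None:
    """Replay the journal and record that fact by bumping the generation."""
    journal.recovery_generation += 1


def recover_slot(journal: JournalInode, cached_generation: int) -> bool:
    """Return True if this thread replayed the journal, False if it skipped.

    The comparison happens before the blocking lock would be requested, so a
    slot that another node already recovered is never waited on.
    """
    if journal.recovery_generation != cached_generation:
        return False  # someone else replayed the journal since we cached it
    replay_journal(journal)
    return True
```

A mismatch means the slot was recovered (or mounted) by another node after the generation was cached, which is exactly the case in which blocking on the lock would hang.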
  8. 27 Jul, 2008 1 commit
  9. 15 Jul, 2008 1 commit
  10. 18 Apr, 2008 11 commits
    • ocfs2: Add inode stealing for ocfs2_reserve_new_inode · 4d0ddb2c
      Tao Ma committed
      Inode allocation is modified to look in other nodes' allocators during
      extreme out-of-space situations. We retry our own slot when space is freed
      back to the global bitmap, or whenever we've allocated more than 1024 inodes
      from another slot.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
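The retry rule in this commit message (go back to our own slot when space is freed, or after more than 1024 inodes have come from another slot) might be modelled like this. The class and method names are hypothetical; only the 1024 threshold and the two reset conditions come from the message.

```python
MAX_STEAL_INODES = 1024  # threshold stated in the commit message


class InodeAllocPolicy:
    """Tracks which slot's inode allocator should be tried next."""

    def __init__(self, own_slot: int):
        self.own_slot = own_slot
        self.steal_slot = None   # slot we are currently stealing from, if any
        self.stolen = 0          # inodes allocated from steal_slot so far

    def start_stealing(self, slot: int) -> None:
        """Our own allocator is exhausted; allocate from another slot."""
        self.steal_slot = slot
        self.stolen = 0

    def note_stolen_alloc(self) -> None:
        """Record one inode taken from the other slot."""
        self.stolen += 1
        if self.stolen > MAX_STEAL_INODES:
            self.reset()         # time to retry our own slot

    def note_space_freed(self) -> None:
        """Space went back to the global bitmap; retry our own slot."""
        self.reset()

    def reset(self) -> None:
        self.steal_slot = None
        self.stolen = 0

    def target_slot(self) -> int:
        return self.own_slot if self.steal_slot is None else self.steal_slot
```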
    • ocfs2: Add the USERSPACE_STACK incompat bit. · b61817e1
      Joel Becker committed
      The filesystem gains the USERSPACE_STACK incompat bit and the
      s_cluster_info field on the superblock.  When a userspace stack is in
      use, the name of the stack is stored on-disk for mount-time
      verification.
      
      The "cluster_stack" option is added to mount(2) processing.  The mount
      process needs to pass the matching stack name.  If the passed name and
      the on-disk name do not match, the mount is failed.
      
      When using the classic o2cb stack, the incompat bit is *not* set and no
      mount option is used other than the usual heartbeat=local.  Thus, the
      filesystem is compatible with older tools.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
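The mount-time verification described in this commit message reduces to a small predicate. The function name and arguments below are hypothetical, and rejecting a stray `cluster_stack` option on a classic o2cb volume is an assumption; the message only says that no such option is used in that case.

```python
from typing import Optional


def cluster_stack_ok(userspace_bit_set: bool,
                     on_disk_stack: Optional[str],
                     mount_opt_stack: Optional[str]) -> bool:
    """Decide whether a mount may proceed, given the superblock and options.

    userspace_bit_set: whether the USERSPACE_STACK incompat bit is set.
    on_disk_stack:     stack name stored in the superblock, if any.
    mount_opt_stack:   value of the "cluster_stack" mount option, if passed.
    """
    if not userspace_bit_set:
        # Classic o2cb stack: no stack name is stored or expected.
        # Assumption: an unexpected option is treated as a mismatch.
        return mount_opt_stack is None
    # Userspace stack: the passed name must match the on-disk name.
    return mount_opt_stack is not None and mount_opt_stack == on_disk_stack
```

For example, `cluster_stack_ok(True, "stackA", "stackB")` is false, which corresponds to the mount being failed on a name mismatch.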
    • ocfs2: Break out stackglue into modules. · 286eaa95
      Joel Becker committed
      We define the ocfs2_stack_plugin structure to represent a stack driver.
      The o2cb stack code is split into stack_o2cb.c.  This becomes the
      ocfs2_stack_o2cb.ko module.
      
      The stackglue generic functions are similarly split into the
      ocfs2_stackglue.ko module.  This module now provides an interface to
      register drivers.  The ocfs2_stack_o2cb driver registers itself.  As
      part of this interface, ocfs2_stackglue can load drivers on demand.
      This is accomplished in ocfs2_cluster_connect().
      
      ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
      If a hangup is pending, it will not release the driver module and will
      let _hangup() do that.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Clean up stackglue initialization · 63e0c48a
      Joel Becker committed
      The stack glue initialization function needs a better name so that it can be
      used cleanly when stackglue becomes a module.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Fill node number during cluster stack init · 0abd6d18
      Mark Fasheh committed
      It doesn't make sense to query for a node number before connecting to the
      cluster stack. This should be safe to do because node_num is only just
      printed, and we're actually only moving the setting of node num a small
      amount further in the mount process.
      
      [ Disconnect when node query fails -- Joel ]
      Reviewed-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Move o2hb functionality into the stack glue. · 6953b4c0
      Joel Becker committed
      The last bit of classic stack used directly in ocfs2 code is o2hb.
      Specifically, the check for heartbeat during mount and the call to
      ocfs2_hb_ctl during unmount.
      
      We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
      to ocfs2_hb_ctl.  Other stacks will just leave hangup() empty.
      
      The check for heartbeat is moved into ocfs2_cluster_connect().  It will
      be matched by a similar check for other stacks.
      
      With this change, only stackglue.c includes cluster/ headers.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Abstract out node number queries. · 19fdb624
      Joel Becker committed
      ocfs2 asks the cluster stack for the local node's node number for two
      reasons: to fill the slot map and to print it. While the slot map isn't
      necessary for userspace cluster stacks, the printing is very nice for
      debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
      this value. It is anticipated that the slot map will not be used under a
      userspace cluster stack, so validity checks of the node num only need to
      exist in the slot map code. Otherwise, it just gets used and printed as an
      opaque value.
      
      [ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
        truly opaque. --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API. · 4670c46d
      Joel Becker committed
      This step introduces a cluster stack agnostic API for initializing and
      exiting.  fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
      connect to the stack.  It is all handled in stackglue.c.
      
      heartbeat.c no longer needs to know how it gets called.
      ocfs2_do_node_down() is now a clean recovery trigger.
      
      The big gotcha is the ordering of initializations and de-initializations done
      underneath ocfs2_cluster_connect().  ocfs2_dlm_init() used to do all
      o2dlm initialization in one block.  Thus, the o2dlm functionality of
      ocfs2_cluster_connect() is very straightforward.  ocfs2_dlm_shutdown(),
      however, did a few things between de-registration of the eviction
      callback and actually shutting down the domain.  Now de-registration and
      shutdown of the domain are wrapped within the single
      ocfs2_cluster_disconnect() call.  I've checked the code paths to make
      sure we can safely tear down things in ocfs2_dlm_shutdown() before
      calling ocfs2_cluster_disconnect().  The filesystem has already set
      itself to ignore the callback.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Separate out dlm lock functions. · 24ef1815
      Joel Becker committed
      This is the first in a series of patches to isolate ocfs2 from the
      underlying cluster stack. Here we wrap the dlm locking functions with
      ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
      callbacks, we can eliminate the callbacks from the filesystem visible
      functions.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Change the recovery map to an array of node numbers. · 553abd04
      Joel Becker committed
      The old recovery map was a bitmap of node numbers.  This was sufficient
      for the maximum node number of 254.  Going forward, we want node numbers
      to be UINT32.  Thus, we need a new recovery map.
      
      Note that we can't keep track of slots here.  We must write down the
      node number to recover *before* we get the locks needed to convert a
      node number into a slot number.
      
      The recovery map is now an array of unsigned ints, max_slots in size.
      It moves to journal.c with the rest of recovery.
      
      Because it needs to be initialized, we move all of recovery initialization
      into a new function, ocfs2_recovery_init().  This actually cleans up
      ocfs2_initialize_super() a little as well.  Following on, recovery cleanup
      becomes part of ocfs2_recovery_exit().
      
      A number of node map functions are rendered obsolete and are removed.
      
      Finally, waiting on recovery is wrapped in a function rather than naked
      checks on the recovery_event.  This is a cleanup from Mark.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
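The data structure this commit message describes, a fixed-capacity array of node numbers replacing a bitmap, can be sketched as below. The class and method names are hypothetical and the locking that the real code would need is left out; the point is that entries are full-width node numbers rather than bit positions, so the representation no longer caps node numbers at 254.

```python
class RecoveryMap:
    """Array of node numbers awaiting recovery, at most max_slots entries."""

    def __init__(self, max_slots: int):
        self.max_slots = max_slots
        self.entries = []        # node numbers, in the order they were queued

    def set(self, node_num: int) -> bool:
        """Queue node_num for recovery; return False if already queued."""
        if node_num in self.entries:
            return False
        if len(self.entries) >= self.max_slots:
            raise OverflowError("recovery map is full")
        self.entries.append(node_num)
        return True

    def clear(self, node_num: int) -> None:
        """Drop node_num once its recovery has completed."""
        if node_num in self.entries:
            self.entries.remove(node_num)

    def contains(self, node_num: int) -> bool:
        return node_num in self.entries
```

Because the node number is recorded before any slot lookup, the map works even when the node-to-slot conversion needs locks that are not yet held.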
    • ocfs2: Move slot map access into slot_map.c · 8e8a4603
      Mark Fasheh committed
      journal.c and dlmglue.c would refresh the slot map by hand.  Instead, have
      the update and clear functions do the work inside slot_map.c.  The eventual
      result is to make ocfs2_slot_info defined privately in slot_map.c
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  11. 07 Feb, 2008 1 commit
    • ocfs2: Negotiate locking protocol versions. · d24fbcda
      Joel Becker committed
      Currently, when ocfs2 nodes connect via TCP, they advertise their
      compatibility level.  If the versions do not match, two nodes cannot speak
      to each other and they disconnect. As a result, this provides no forward or
      backwards compatibility.
      
      This patch implements a simple protocol negotiation at the dlm level by
      introducing a major/minor version number scheme for entities that
      communicate.  Specifically, o2dlm has a major/minor version for interaction
      with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
      interacting with the filesystem on other nodes.
      
      This will allow rolling upgrades of ocfs2 clusters when changes to the
      locking or network protocols can be done in a backwards compatible manner.
      In those cases, only the minor number is changed and the negotiated protocol
      minor is returned from dlm join. In the far less likely event that a
      required protocol change makes backwards compatibility impossible, we simply
      bump the major number.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
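A major/minor negotiation of the kind this commit message describes could look like the sketch below. The names are hypothetical, and choosing the lower of the two minor numbers is an assumption about how the negotiated minor is picked; the message states only that majors signal incompatible changes and that a negotiated minor is returned from the dlm join.

```python
from typing import NamedTuple, Optional


class ProtocolVersion(NamedTuple):
    major: int
    minor: int


def negotiate(local: ProtocolVersion,
              remote: ProtocolVersion) -> Optional[ProtocolVersion]:
    """Return the version both sides can speak, or None if incompatible."""
    if local.major != remote.major:
        return None  # a major bump means no backwards compatibility
    # Assumption: both sides fall back to the older (smaller) minor.
    return ProtocolVersion(local.major, min(local.minor, remote.minor))
```

Under this rule a node at 1.3 joining a cluster running 1.1 would speak 1.1, which is what makes a rolling upgrade possible.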
  12. 26 Jan, 2008 1 commit