  1. 23 Jun 2009 (1 commit)
  2. 04 Jun 2009 (1 commit)
    • ocfs2: timer to queue scan of all orphan slots · 83273932
      Srinivas Eeda committed
      When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
      before moving the dentry to the orphan directory. Other nodes that have
      this dentry in cache have a PR on the same dentry lock.  When the EX is
      requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
      during downconvert.  The inode is finally deleted when the last node to iput
      the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.
      
      A problem arises if a node is forced to free dentry locks because of memory
      pressure. If this happens, the node will no longer get downconvert
      notifications for the dentries that have been unlinked on another node.
      If that node is also actively using the corresponding inode and happens
      to be the one performing the last iput on it, it will fail to delete the
      inode because the MAYBE_ORPHANED flag will not be set.
      
      This patch fixes this shortcoming by introducing a periodic scan of the
      orphan directories to delete such inodes. Care has been taken to distribute
      the workload across the cluster so that no one node has to perform the task
      all the time.
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
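      A minimal sketch of the two mechanisms described in this commit, under
      stated assumptions. The first half models the per-node MAYBE_ORPHANED
      decision and shows why a node that dropped its dentry lock never deletes
      the inode. The second half shows one possible way to spread the periodic
      scan across nodes: a shared sequence number, assumed to be read and
      written under an exclusive cluster lock (for example in a lock value
      block). All struct and function names are hypothetical and are not taken
      from the ocfs2 source.

```c
#include <stdbool.h>

/* Hypothetical per-node view of an inode; not the real ocfs2 structures. */
struct node_inode {
	unsigned int i_nlink;
	bool maybe_orphaned;	/* stands in for the MAYBE_ORPHANED flag */
	bool has_dentry_lock;	/* false once memory pressure freed the lock */
};

/* Another node requested EX on the dentry lock before unlinking. Only a
 * node that still holds the PR lock receives the downconvert request. */
void remote_unlink_downconvert(struct node_inode *ino)
{
	if (ino->has_dentry_lock)
		ino->maybe_orphaned = true;
}

/* Decision made by the node performing the last iput. */
bool last_iput_deletes(const struct node_inode *ino)
{
	return ino->i_nlink == 0 && ino->maybe_orphaned;
}

/* Assumed cluster-wide counter, and each node's last observed value. */
struct scan_shared { unsigned int seqno; };
struct scan_node { unsigned int last_seqno; };

/* Called from each node's periodic timer (assumed to run with an exclusive
 * cluster lock held). Returns true if this node should scan the orphan
 * directories of all slots in this round. */
bool orphan_scan_should_run(struct scan_shared *shared, struct scan_node *node)
{
	if (shared->seqno != node->last_seqno) {
		/* Another node scanned since we last looked: skip this round. */
		node->last_seqno = shared->seqno;
		return false;
	}
	/* No scan since our last look: take the work and bump the counter. */
	shared->seqno++;
	node->last_seqno = shared->seqno;
	return true;
}
```

      With this scheme, whichever node's timer fires first after a completed
      scan performs the next one and the others skip that round, so the work
      is not pinned to a single node.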
  3. 04 Apr 2009 (1 commit)
    • ocfs2: fix rare stale inode errors when exporting via nfs · 6ca497a8
      wengang wang committed
      For NFS exporting, ocfs2_get_dentry() returns the dentry for a file handle
      (fh). When the inode is not in memory, ocfs2_get_dentry() may read it from
      disk without taking any cross-cluster lock. This can lead the file system
      to load a stale inode.
      
      This patch fixes the above problem.
      
      The solution: when the inode is not in memory, we take the cluster lock
      (PR) on the alloc inode from which the inode in question was allocated
      (this forces the node that performed the deletion to sync the alloc
      inode) before reading the inode itself. We then check the bitmap in the
      group the inode was allocated from to see whether its bit is clear. If it
      is clear, the inode is stale. If the bit is set, we then check the
      generation, as the existing code does.
      
      We have to read the inode in question from disk first to learn its alloc
      slot and alloc bit. If it is not stale, we read it again using
      ocfs2_iget(); the second read should then be served from cache.
      
      We also have to add a per-superblock nfs_sync_lock that covers both the
      lock on the alloc inode and the lock on the inode in question, because
      ocfs2_get_dentry() and ocfs2_delete_inode() take those locks in reverse
      order. nfs_sync_lock is taken in EX mode in ocfs2_get_dentry() and in PR
      mode in ocfs2_delete_inode(), so that multiple ocfs2_delete_inode() calls
      can still run concurrently in the normal case.
      
      [mfasheh@suse.com: build warning fixes and comment cleanups]
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
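      A sketch of the staleness test described in this commit, assuming the
      caller already holds the alloc inode's cluster lock in PR mode and has
      read the inode block from disk to learn its allocation bit. The bitmap
      helper assumes least-significant-bit-first numbering within each byte,
      and both function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical test of one bit in a suballocator group's bitmap
 * (LSB-first bit numbering within each byte is assumed). */
static bool group_bit_is_set(const uint8_t *bitmap, unsigned int bit)
{
	return (bitmap[bit / 8] >> (bit % 8)) & 1u;
}

/* Returns 0 if the inode named by the file handle may be used,
 * -ESTALE otherwise. */
int check_fh_inode(const uint8_t *group_bitmap, unsigned int alloc_bit,
		   uint32_t disk_generation, uint32_t fh_generation)
{
	/* Bit clear: the inode was freed, so the handle is stale. */
	if (!group_bit_is_set(group_bitmap, alloc_bit))
		return -ESTALE;
	/* Bit set but generation differs: the block now holds a new inode. */
	if (disk_generation != fh_generation)
		return -ESTALE;
	return 0;
}
```

      Checking the bitmap first matters because a freed inode block may still
      carry its old generation on disk; the generation comparison alone only
      catches the case where the block was reused.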
  4. 06 Jan 2009 (1 commit)
    • ocfs2: Implementation of local and global quota file handling · 9e33d69f
      Jan Kara committed
      For each quota type, each node has a local quota file. In this file it
      stores the changes to disk usage that users have made via this node.
      Periodically this information is synced to the global file (and thus
      with other nodes), so that limit enforcement works at least approximately.
      
      Global quota files contain all the information about usage and limits.
      They are mostly handled by the generic VFS code (which implements a trie
      of structures inside a quota file). We only have to provide functions to
      convert structures from the on-disk format to the in-memory one. We also
      have to provide wrappers for various quota functions that start
      transactions and acquire the necessary cluster locks before the actual
      IO is started.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
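      A toy model of the local/global split, assuming usage is tracked as a
      single byte count per user. Each node accumulates a delta locally and
      folds it into the global record on sync; the limit check only sees
      global usage as of the last sync plus the node's own unsynced delta,
      which is why enforcement is only approximate. In ocfs2 the sync would run
      inside a transaction with a cluster lock held; here it is a plain
      function, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-user record in the global quota file. */
struct global_dquot { int64_t space_used; int64_t hard_limit; };

/* Hypothetical per-node local record: only the change made via this node
 * since the last sync. */
struct local_dquot { int64_t space_delta; };

/* Record a local allocation (positive) or free (negative). */
void local_charge(struct local_dquot *l, int64_t bytes)
{
	l->space_delta += bytes;
}

/* Periodic sync: fold this node's delta into the global record. */
void sync_local_to_global(struct local_dquot *l, struct global_dquot *g)
{
	g->space_used += l->space_delta;
	l->space_delta = 0;
}

/* Limit check as one node sees it. Other nodes' unsynced deltas are
 * invisible here. */
bool would_exceed(const struct global_dquot *g, const struct local_dquot *l,
		  int64_t bytes)
{
	return g->space_used + l->space_delta + bytes > g->hard_limit;
}
```

      For example, two nodes that each charge 60 bytes against a 100-byte limit
      both pass the check before syncing, even though together they exceed it;
      after both sync, the global record shows 120 bytes used.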
  5. 18 Apr 2008 (4 commits)
    • ocfs2: Break out stackglue into modules. · 286eaa95
      Joel Becker committed
      We define the ocfs2_stack_plugin structure to represent a stack driver.
      The o2cb stack code is split into stack_o2cb.c.  This becomes the
      ocfs2_stack_o2cb.ko module.
      
      The stackglue generic functions are similarly split into the
      ocfs2_stackglue.ko module.  This module now provides an interface to
      register drivers.  The ocfs2_stack_o2cb driver registers itself.  As
      part of this interface, ocfs2_stackglue can load drivers on demand.
      This is accomplished in ocfs2_cluster_connect().
      
      ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
      If a hangup is pending, it will not release the driver module and will
      let _hangup() do that.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
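      A sketch of the register-then-look-up pattern described in this commit,
      with hypothetical names (this is not the real ocfs2_stack_plugin layout).
      Driver modules register themselves in a list; at connect time the glue
      looks the driver up by name and, if it is missing, calls a loader
      callback that stands in for loading the module on demand (in the kernel,
      something like request_module()) before looking again.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a stack driver descriptor. */
struct stack_plugin {
	const char *name;
	struct stack_plugin *next;
};

static struct stack_plugin *plugin_list;

static struct stack_plugin *stack_find(const char *name)
{
	for (struct stack_plugin *p = plugin_list; p; p = p->next)
		if (strcmp(p->name, name) == 0)
			return p;
	return NULL;
}

/* A driver module calls this from its init routine. */
int stack_register(struct stack_plugin *p)
{
	if (stack_find(p->name))
		return -EEXIST;
	p->next = plugin_list;
	plugin_list = p;
	return 0;
}

/* Connect-time lookup: if the driver is not registered yet, ask the loader
 * (hypothetical on-demand module load) and look again. */
struct stack_plugin *stack_get(const char *name, void (*load)(const char *name))
{
	struct stack_plugin *p = stack_find(name);

	if (!p && load) {
		load(name);
		p = stack_find(name);
	}
	return p;
}
```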
    • ocfs2: Clean up stackglue initialization · 63e0c48a
      Joel Becker committed
      The stack glue initialization function needs a better name so that it can be
      used cleanly when stackglue becomes a module.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API. · 4670c46d
      Joel Becker committed
      This step introduces a cluster stack agnostic API for initializing and
      exiting.  fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
      connect to the stack.  It is all handled in stackglue.c.
      
      heartbeat.c no longer needs to know how it gets called.
      ocfs2_do_node_down() is now a clean recovery trigger.
      
      The big gotcha is the ordering of initializations and de-initializations done
      underneath ocfs2_cluster_connect().  ocfs2_dlm_init() used to do all
      o2dlm initialization in one block.  Thus, the o2dlm functionality of
      ocfs2_cluster_connect() is very straightforward.  ocfs2_dlm_shutdown(),
      however, did a few things between de-registration of the eviction
      callback and actually shutting down the domain.  Now de-registration and
      shutdown of the domain are wrapped within the single
      ocfs2_cluster_disconnect() call.  I've checked the code paths to make
      sure we can safely tear down things in ocfs2_dlm_shutdown() before
      calling ocfs2_cluster_disconnect().  The filesystem has already set
      itself to ignore the callback.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Separate out dlm lock functions. · 24ef1815
      Joel Becker committed
      This is the first in a series of patches to isolate ocfs2 from the
      underlying cluster stack. Here we wrap the dlm locking functions with
      ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
      callbacks, we can eliminate the callbacks from the filesystem visible
      functions.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
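      A toy illustration of hiding fixed callbacks behind a wrapper, as this
      commit describes. The underlying call takes completion and blocking
      callbacks on every request; because the filesystem always passes the same
      pair, the filesystem-visible wrapper drops those parameters. The "stack"
      here grants every request immediately, and all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical callback signature used by an underlying DLM API. */
typedef void (*ast_fn)(void *lksb);

/* Hypothetical underlying stack call that takes callbacks on every request.
 * This toy grants immediately and fires the completion callback. */
static int stack_dlm_lock(int mode, void *lksb, ast_fn ast, ast_fn bast)
{
	(void)mode;
	(void)bast;	/* no contention in this toy, so never called */
	ast(lksb);
	return 0;
}

/* Toy lock status block that records whether the grant arrived. */
struct toy_lksb { bool granted; };

/* The single pair of callbacks the filesystem always uses. */
static void fs_lock_ast(void *lksb)
{
	((struct toy_lksb *)lksb)->granted = true;
}

static void fs_blocking_ast(void *lksb)
{
	(void)lksb;	/* a real one would queue a downconvert */
}

/* Filesystem-visible wrapper: no callback parameters, since they never vary. */
int fs_dlm_lock(int mode, struct toy_lksb *lksb)
{
	return stack_dlm_lock(mode, lksb, fs_lock_ast, fs_blocking_ast);
}
```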
  6. 04 Mar 2008 (1 commit)
  7. 07 Feb 2008 (1 commit)
    • ocfs2: Negotiate locking protocol versions. · d24fbcda
      Joel Becker committed
      Currently, when ocfs2 nodes connect via TCP, they advertise their
      compatibility level.  If the versions do not match, two nodes cannot speak
      to each other and they disconnect. As a result, this provides no forward or
      backwards compatibility.
      
      This patch implements a simple protocol negotiation at the dlm level by
      introducing a major/minor version number scheme for entities that
      communicate.  Specifically, o2dlm has a major/minor version for interaction
      with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
      interacting with the filesystem on other nodes.
      
      This will allow rolling upgrades of ocfs2 clusters when changes to the
      locking or network protocols can be done in a backwards compatible manner.
      In those cases, only the minor number is changed and the negotiated protocol
      minor is returned from dlm join. In the far less likely event that a
      required protocol change makes backwards compatibility impossible, we simply
      bump the major number.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
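      A sketch of major/minor negotiation, assuming the rule is that the major
      numbers must match exactly and the smaller of the two minors is used.
      The commit text only states that the negotiated minor is returned from
      dlm join; the comparison rule and all names here are assumptions.

```c
#include <stdbool.h>

/* Hypothetical protocol version pair. */
struct proto_version { unsigned char major, minor; };

/* Returns false if the nodes cannot talk (major mismatch); otherwise fills
 * *agreed with the shared major and the lower of the two minors, so a newer
 * node falls back to the behaviour of the older minor. */
bool negotiate_protocol(struct proto_version existing,
			struct proto_version request,
			struct proto_version *agreed)
{
	if (existing.major != request.major)
		return false;
	agreed->major = existing.major;
	agreed->minor = request.minor < existing.minor ? request.minor
						       : existing.minor;
	return true;
}
```

      Taking the minimum is what makes a rolling upgrade possible: upgraded
      nodes keep speaking the old minor until every node in the domain
      supports the new one.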
  8. 26 Jan 2008 (4 commits)
    • [PATCH 1/2] ocfs2: add flock lock type · cf8e06f1
      Mark Fasheh committed
      This adds a new dlmglue lock type which is intended to back flock()
      requests.
      
      Since these locks are driven from userspace, usage rules are much more
      liberal than the typical Ocfs2 internal cluster lock. As a result, we can't
      make use of most dlmglue features - lock caching and lock level
      optimizations in particular. Additionally, userspace is free to deadlock
      itself, so we have to deal with that in the same way as the rest of the
      kernel - by allowing a signal to abort a lock request.
      
      In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock()
      does its own dlm coordination. We still use the same helper functions,
      though, so duplicated code is kept to a minimum.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Rename ocfs2_meta_[un]lock · e63aecb6
      Mark Fasheh committed
      Call this the "inode_lock" now, since it covers both data and meta data.
      This patch makes no functional changes.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Remove data locks · c934a92d
      Mark Fasheh committed
      The meta lock now covers both meta data and data, so this just removes the
      now-redundant data lock.
      
      Combining locks saves us a round of lock mastery per inode and one less lock
      to ping between nodes during read/write.
      
      We don't lose much: since meta locks were always held before a data lock
      (and at the same level), ordered writeout mode (the default) ensured that
      flushing for the meta data lock also pushed out data anyway.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Remove mount/unmount votes · 34d024f8
      Mark Fasheh committed
      The node maps that are set/unset by these votes are no longer relevant, thus
      we can remove the mount and umount votes. Since those are the last two
      remaining votes, we can also remove the entire vote infrastructure.
      
      The vote thread has been renamed to the downconvert thread, and the small
      amount of functionality related to managing it has been moved into
      fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
  9. 13 Oct 2007 (1 commit)
  10. 03 May 2007 (1 commit)
  11. 27 Apr 2007 (1 commit)
    • ocfs2: Remove delete inode vote · 50008630
      Tiger Yang committed
      Ocfs2 currently does cluster-wide node messaging to check the open state of
      an inode during delete. This patch removes that mechanism in favor of an
      inode cluster lock which is taken at shared read when an inode is first read
      and dropped in clear_inode(). This allows a deleting node to test the
      liveness of an inode by attempting to take an exclusive lock.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
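      A sketch of the liveness test this commit describes, assuming the usual
      DLM compatibility rules (PR is compatible with PR; EX is compatible only
      with NL). Every node that has the inode in memory holds the inode lock at
      PR until clear_inode(), so a deleting node's non-blocking EX attempt
      succeeds only when no other node still has it. The mode names and
      functions are hypothetical, not the o2dlm constants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock modes. */
enum lock_mode { MODE_NL, MODE_PR, MODE_EX };

/* NL is compatible with everything; PR only with PR; EX only with NL. */
static bool modes_compatible(enum lock_mode a, enum lock_mode b)
{
	if (a == MODE_NL || b == MODE_NL)
		return true;
	return a == MODE_PR && b == MODE_PR;
}

/* Non-blocking attempt: succeeds only if every mode held by other nodes is
 * compatible with the requested one. */
bool try_lock(const enum lock_mode *held, size_t n, enum lock_mode want)
{
	for (size_t i = 0; i < n; i++)
		if (!modes_compatible(held[i], want))
			return false;
	return true;
}
```

      Compared with the vote it replaces, this needs no per-delete messaging
      protocol of its own: the DLM's normal grant logic answers the question
      "does any other node still have this inode in memory?"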
  12. 02 Dec 2006 (3 commits)
  13. 25 Sep 2006 (4 commits)
    • ocfs2: Remove i_generation from inode lock names · 24c19ef4
      Mark Fasheh committed
      OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
      Typically, i_generation is encoded in the lock name so that a deleted inode
      and a new one in the same block don't share the same lvb.
      
      Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
      is potentially thrown away as soon as the meta data lock is taken - we
      cannot encode the lock name without first knowing i_generation, which
      requires a disk read.
      
      This patch encodes i_generation in the inode meta data lvb, and removes the
      value from the inode meta data lock name. This way, the read can be covered
      by a lock, and at the same time we can distinguish between an up to date and
      a stale LVB.
      
      This will help cold-cache stat(2) performance in particular.
      
      Since this patch changes the protocol version, we take the opportunity to do
      a minor re-organization of two of the LVB fields.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Encode i_generation in the meta data lvb · f9e2d82e
      Mark Fasheh committed
      When i_generation is removed from the lockname, this will help us determine
      whether a meta data lvb has information that is in sync with the local
      struct inode.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
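      A sketch of how a generation stored in the lvb lets a node tell whether
      the lvb describes the inode it has in memory. The structs are
      hypothetical subsets, not the real ocfs2 lvb layout; when the
      generations differ, the caller is assumed to fall back to reading the
      inode from disk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a meta data lvb. */
struct meta_lvb {
	uint32_t generation;
	uint64_t size;
};

/* Hypothetical subset of the in-memory inode. */
struct mem_inode {
	uint32_t i_generation;
	uint64_t i_size;
};

/* Refresh from the lvb only if it describes this incarnation of the inode.
 * Returns true if the lvb was used; false means "read the inode from disk". */
bool refresh_from_lvb(struct mem_inode *inode, const struct meta_lvb *lvb)
{
	if (lvb->generation != inode->i_generation)
		return false;	/* lvb left over from an earlier inode in this block */
	inode->i_size = lvb->size;
	return true;
}
```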
    • ocfs2: Free up some space in the lvb · 4d3b83f7
      Mark Fasheh committed
      lvb_version doesn't need to be a whole 32 bits. Make it an 8 bit field to
      free up some space. This should be backwards compatible until we use one of
      the fields, in which case we'd bump the lvb version anyway.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Add new cluster lock type · d680efe9
      Mark Fasheh committed
      Replace the dentry vote mechanism with a cluster lock which covers a set
      of dentries. This allows us to force d_delete() only on nodes which actually
      care about an unlink.
      
      Every node that does a ->lookup() gets a read-only lock on the dentry until
      an unlink, during which the unlinking node will request an exclusive lock,
      forcing the other nodes that care about that dentry to d_delete() it. The
      effect is that we retain a very lightweight ->d_revalidate(), and at the
      same time get to make large improvements to the average case performance of
      the ocfs2 unlink and rename operations.
      
      This patch adds the cluster lock type which OCFS2 can attach to
      dentries.  A small number of fs/ocfs2/dcache.c functions are stubbed
      out so that this change can compile.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
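      A toy model of the unlink path this commit describes: nodes that looked
      the name up hold the dentry lock at PR, and the unlinking node's EX
      request makes each of them downconvert and d_delete() its cached dentry,
      while nodes that never looked it up are left alone. The struct and
      function are hypothetical; a real DLM delivers this through blocking
      callbacks rather than a loop over nodes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node state for one dentry lock. */
struct node_dentry {
	bool holds_pr;	/* node did a ->lookup() and still caches the dentry */
	bool d_deleted;	/* node dropped its dentry on downconvert */
};

/* The unlinking node (index self) requests EX. Every other node holding PR
 * downconverts and drops its cached dentry. Returns how many nodes acted. */
size_t unlink_request_ex(struct node_dentry *nodes, size_t n, size_t self)
{
	size_t notified = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == self || !nodes[i].holds_pr)
			continue;
		nodes[i].holds_pr = false;
		nodes[i].d_deleted = true;
		notified++;
	}
	return notified;
}
```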
  14. 21 Sep 2006 (1 commit)
  15. 04 Jan 2006 (1 commit)