1. 27 2月, 2010 2 次提交
    • J
      ocfs2: Attach the connection to the lksb · c0e41338
      Joel Becker 提交于
      We're going to want it in the ast functions, so we convert union
      ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      c0e41338
    • J
      ocfs2: Pass lksbs back from stackglue ast/bast functions. · a796d286
      Joel Becker 提交于
      The stackglue ast and bast functions tried to maintain the fiction that
      their arguments were void pointers.  In reality, stack_user.c had to
      know that the argument was an ocfs2_lock_res in order to get the status
      off of the lksb.  That's ugly.
      
      This changes stackglue to always pass the lksb as the argument to ast
      and bast functions.  The caller can always use container_of() to get the
      ocfs2_lock_res or user_dlm_lock_res.  The net effect to the caller is
      zero.  They still get back the lockres in their ast.  stackglue gets
      cleaner, and now can use the lksb itself.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      a796d286
  2. 04 2月, 2010 1 次提交
  3. 03 2月, 2010 4 次提交
  4. 26 1月, 2010 1 次提交
  5. 04 12月, 2009 1 次提交
  6. 23 9月, 2009 3 次提交
  7. 05 9月, 2009 2 次提交
  8. 23 6月, 2009 4 次提交
  9. 04 6月, 2009 1 次提交
    • S
      ocfs2: timer to queue scan of all orphan slots · 83273932
      Srinivas Eeda 提交于
      When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
      before moving the dentry to the orphan directory. Other nodes that have
      this dentry in cache have a PR on the same dentry lock.  When the EX is
      requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
      during downconvert.  The inode is finally deleted when the last node to iput
      the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.
      
      A problem arises if a node is forced to free dentry locks because of memory
      pressure. If this happens, the node will no longer get downconvert
      notifications for the dentries that have been unlinked on another node.
      If it also happens that node is actively using the corresponding inode and
      happens to be the one performing the last iput on that inode, it will fail
      to delete the inode as it will not have the MAYBE_ORPHANED flag set.
      
      This patch fixes this shortcoming by introducing a periodic scan of the
      orphan directories to delete such inodes. Care has been taken to distribute
      the workload across the cluster so that no one node has to perform the task
      all the time.
      Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      83273932
  10. 04 4月, 2009 1 次提交
    • W
      ocfs2: fix rare stale inode errors when exporting via nfs · 6ca497a8
      wengang wang 提交于
      For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
      ocfs2_get_dentry() may read from disk when the inode is not in memory,
      without any cross cluster lock. this leads to the file system loading a
      stale inode.
      
      This patch fixes above problem.
      
      Solution is that in case of inode is not in memory, we get the cluster
      lock(PR) of alloc inode where the inode in question is allocated from (this
      causes node on which deletion is done sync the alloc inode) before reading
      out the inode itsself. then we check the bitmap in the group (the inode in
      question allcated from) to see if the bit is clear. if it's clear then it's
      stale. if the bit is set, we then check generation as the existing code
      does.
      
      We have to read out the inode in question from disk first to know its alloc
      slot and allot bit. And if its not stale we read it out using ocfs2_iget().
      The second read should then be from cache.
      
      And also we have to add a per superblock nfs_sync_lock to cover the lock for
      alloc inode and that for inode in question. this is because ocfs2_get_dentry()
      and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked
      in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so
      that mutliple ocfs2_delete_inode() can run concurrently in normal case.
      
      [mfasheh@suse.com: build warning fixes and comment cleanups]
      Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      6ca497a8
  11. 27 2月, 2009 1 次提交
  12. 03 2月, 2009 1 次提交
  13. 09 1月, 2009 1 次提交
  14. 06 1月, 2009 4 次提交
    • M
      ocfs2: remove unneeded lvb casts · a641dc2a
      Mark Fasheh 提交于
      dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb().
      This is pointless however, as ocfs2_dlm_lvb() returns void *.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      a641dc2a
    • J
      ocfs2: Fix ocfs2_read_quota_block() error handling. · 85eb8b73
      Joel Becker 提交于
      ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to
      match ocfs2_read_blocks().  The quota code, converting from
      ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in
      ocfs2_read_quota_block().  Unfortunately, the prototype of
      ocfs2_read_quota_block() matches the old prototype of ocfs2_bread().
      
      The problem is that ocfs2_bread() returned the buffer head, and callers
      assumed that a NULL pointer was indicative of error.  It wasn't.  This
      is why ocfs2_bread() took an int*err argument as well.
      
      The new prototype of ocfs2_read_virt_blocks() avoids this error handling
      confusion.  Let's change ocfs2_read_quota_block() to match.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      85eb8b73
    • J
      ocfs2: Implementation of local and global quota file handling · 9e33d69f
      Jan Kara 提交于
      For each quota type each node has local quota file. In this file it stores
      changes users have made to disk usage via this node. Once in a while this
      information is synced to global file (and thus with other nodes) so that
      limits enforcement at least aproximately works.
      
      Global quota files contain all the information about usage and limits. It's
      mostly handled by the generic VFS code (which implements a trie of structures
      inside a quota file). We only have to provide functions to convert structures
      from on-disk format to in-memory one. We also have to provide wrappers for
      various quota functions starting transactions and acquiring necessary cluster
      locks before the actual IO is really started.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9e33d69f
    • J
      ocfs2: Wrap inode block reads in a dedicated function. · b657c95c
      Joel Becker 提交于
      The ocfs2 code currently reads inodes off disk with a simple
      ocfs2_read_block() call.  Each place that does this has a different set
      of sanity checks it performs.  Some check only the signature.  A couple
      validate the block number (the block read vs di->i_blkno).  A couple
      others check for VALID_FL.  Only one place validates i_fs_generation.  A
      couple check nothing.  Even when an error is found, they don't all do
      the same thing.
      
      We wrap inode reading into ocfs2_read_inode_block().  This will validate
      all the above fields, going readonly if they are invalid (they never
      should be).  ocfs2_read_inode_block_full() is provided for the places
      that want to pass read_block flags.  Every caller is passing a struct
      inode with a valid ip_blkno, so we don't need a separate blkno argument
      either.
      
      We will remove the validation checks from the rest of the code in a
      later commit, as they are no longer necessary.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      b657c95c
  15. 02 12月, 2008 1 次提交
  16. 15 10月, 2008 2 次提交
  17. 15 7月, 2008 2 次提交
    • R
      ocfs2: fix printk format warnings with OCFS2_FS_STATS=n · dd25e55e
      Randy Dunlap 提交于
      Fix printk format warnings when OCFS2_FS_STATS=n:
      
      linux-next-20080528/fs/ocfs2/dlmglue.c: In function 'ocfs2_dlm_seq_show':
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'int'
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'int'
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'int'
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'int'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      dd25e55e
    • S
      [PATCH 2/2] ocfs2: Instrument fs cluster locks · 8ddb7b00
      Sunil Mushran 提交于
      This patch adds code to track the number of times the fs takes
      various cluster locks as well as the times associated with it.
      The information is made available to users via debugfs.
      
      This patch was originally written by Jan Kara <jack@suse.cz>.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      8ddb7b00
  18. 11 7月, 2008 1 次提交
    • M
      ocfs2: Fix flags in ocfs2_file_lock · e988cf1c
      Mark Fasheh 提交于
      The stack-glue merge changed the way we use flags in dlmglue in that we now
      use the fs/dlm equivalents. Unfortunately, a merge error left the new flock
      code only partially updated. This took a while to show up though, because
      the lock level constants are actually identical between o2dlm and fs/dlm.
      The *_CONVERT and *_NOQUEUE flags have different values though, which is
      eventually causing a crash in flags_to_o2dlm().
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      e988cf1c
  19. 18 4月, 2008 7 次提交
    • J
      ocfs2: Add the 'cluster_stack' sysfs file. · 9c6c877c
      Joel Becker 提交于
      Userspace can now query and specify the cluster stack in use via the
      /sys/fs/ocfs2/cluster_stack file.  By default, it is 'o2cb', which is
      the classic stack.  Thus, old tools that do not know how to modify this
      file will work just fine.  The stack cannot be modified if there is a
      live filesystem.
      
      ocfs2_cluster_connect() now takes the expected cluster stack as an
      argument.  This way, the filesystem and the stack glue ensure they are
      speaking to the same backend.
      
      If the stack is 'o2cb', the o2cb stack plugin is used.  For any other
      value, the fsdlm stack plugin is selected.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9c6c877c
    • J
      ocfs2: Break out stackglue into modules. · 286eaa95
      Joel Becker 提交于
      We define the ocfs2_stack_plugin structure to represent a stack driver.
      The o2cb stack code is split into stack_o2cb.c.  This becomes the
      ocfs2_stack_o2cb.ko module.
      
      The stackglue generic functions are similarly split into the
      ocfs2_stackglue.ko module.  This module now provides an interface to
      register drivers.  The ocfs2_stack_o2cb driver registers itself.  As
      part of this interface, ocfs2_stackglue can load drivers on demand.
      This is accomplished in ocfs2_cluster_connect().
      
      ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
      If a hangup is pending, it will not release the driver module and will
      let _hangup() do that.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      286eaa95
    • J
      ocfs2: Clean up stackglue initialization · 63e0c48a
      Joel Becker 提交于
      The stack glue initialization function needs a better name so that it can be
      used cleanly when stackglue becomes a module.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      63e0c48a
    • J
      ocfs2: Abstract out a debugging function for underlying dlms. · cf0acdcd
      Joel Becker 提交于
      dlmglue.c was still referencing a raw o2dlm lksb in one instance.  Let's
      create a generic ocfs2_dlm_dump_lksb() function.  This allows underlying
      DLMs to print whatever they want about their lock.
      
      We then move the o2dlm dump into stackglue.c where it belongs.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      cf0acdcd
    • D
      ocfs2: handle async EAGAIN from NOQUEUE request · 1693a5c0
      David Teigland 提交于
      When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE
      requests. Fix up dlmglue to expect this.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      1693a5c0
    • J
      ocfs2: Remove CANCELGRANT from the view of dlmglue. · de551246
      Joel Becker 提交于
      o2dlm has the non-standard behavior of providing a cancel callback
      (unlock_ast) even when the cancel has failed (the locking operation
      succeeded without canceling).  This is called CANCELGRANT after the
      status code sent to the callback.  fs/dlm does not provide this
      callback, so dlmglue must be changed to live without it.
      o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.
      
      Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
      needs to check for it.  ocfs2_locking_ast() must catch that a cancel was
      tried and clear the cancel state.
      
      Making these changes opens up a locking race.  dlmglue uses the the
      OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
      one time.  But dlmglue must unlock the lockres before calling into the
      dlm.  In the small window of time between unlocking the lockres and
      calling the dlm, the downconvert thread can try to cancel the lock.  The
      downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
      know that ocfs2_dlm_lock() has not yet been called.
      
      Because ocfs2_dlm_lock() has not yet been called, the cancel operation
      will just be a no-op.  There's nothing to cancel.  With CANCELGRANT,
      dlmglue uses the CANCELGRANT callback to clear up the cancel state.
      When it comes around again, it will retry the cancel.  Eventually, the
      first thread will have called into ocfs2_dlm_lock(), and either the
      lock or the cancel will succeed.  The downconvert thread can then do its
      downconvert.
      
      Without CANCELGRANT, there is nothing to clean up the cancellation
      state.  The downconvert thread does not know to retry its operations.
      More importantly, the original lock may be blocking on the other node
      that is trying to cancel us.  With neither able to make progress, the
      ast is never called and the cancellation state is never cleaned up that
      way.  dlmglue is deadlocked.
      
      The OCFS2_LOCK_PENDING flag is introduced to remedy this window.  It is
      set at the same time OCFS2_LOCK_BUSY is.  Thus, the downconvert thread
      can check whether the lock is cancelable.  If not, it just loops around
      to try again.  Once ocfs2_dlm_lock() is called, the thread then clears
      OCFS2_LOCK_PENDING and wakes the downconvert thread.  Now, if the
      downconvert thread finds the lock BUSY, it can safely try to cancel it.
      Whether the cancel works or not, the state will be properly set and the
      lock processing can continue.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      de551246
    • M
      ocfs2: Fill node number during cluster stack init · 0abd6d18
      Mark Fasheh 提交于
      It doesn't make sense to query for a node number before connecting to the
      cluster stack. This should be safe to do because node_num is only just
      printed,
      and we're actually only moving the setting of node num a small amount
      further in the mount process.
      
      [ Disconnect when node query fails -- Joel ]
      Reviewed-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0abd6d18