1. 18 Apr 2008, 10 commits
    • ocfs2: Add the USERSPACE_STACK incompat bit. · b61817e1
      Committed by Joel Becker
      The filesystem gains the USERSPACE_STACK incompat bit and the
      s_cluster_info field on the superblock.  When a userspace stack is in
      use, the name of the stack is stored on-disk for mount-time
      verification.
      
      The "cluster_stack" option is added to mount(2) processing.  The mount
      process needs to pass the matching stack name.  If the passed name and
      the on-disk name do not match, the mount fails.
      
      When using the classic o2cb stack, the incompat bit is *not* set and no
      mount option is used other than the usual heartbeat=local.  Thus, the
      filesystem is compatible with older tools.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
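      A minimal userspace sketch of the mount-time check described above. The
      names (`sketch_super`, `SKETCH_INCOMPAT_USERSPACE_STACK`,
      `sketch_stack_matches`), the bit value, and the 4-byte on-disk label are
      hypothetical; treating a cluster_stack= option on a volume without the
      incompat bit as a mismatch is also an assumption of this sketch.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical values: the real bit value and label length may differ. */
      #define SKETCH_INCOMPAT_USERSPACE_STACK 0x1u
      #define SKETCH_STACK_LABEL_LEN 4

      struct sketch_super {
          unsigned int incompat_features;
          /* stack name stored on disk; assumed fixed-length, not NUL-terminated */
          char cluster_stack[SKETCH_STACK_LABEL_LEN];
      };

      /*
       * Decide whether a mount may proceed. mount_stack is the value passed
       * with cluster_stack=, or NULL when the option was not given.
       */
      static bool sketch_stack_matches(const struct sketch_super *sb,
                                       const char *mount_stack)
      {
          bool userspace = (sb->incompat_features &
                            SKETCH_INCOMPAT_USERSPACE_STACK) != 0;

          /* Classic o2cb: bit not set, so no cluster_stack= is expected. */
          if (!userspace)
              return mount_stack == NULL;

          /* Userspace stack: the passed name must match the on-disk name. */
          if (mount_stack == NULL || strlen(mount_stack) != SKETCH_STACK_LABEL_LEN)
              return false;
          return memcmp(sb->cluster_stack, mount_stack,
                        SKETCH_STACK_LABEL_LEN) == 0;
      }
      ```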
    • ocfs2: Break out stackglue into modules. · 286eaa95
      Committed by Joel Becker
      We define the ocfs2_stack_plugin structure to represent a stack driver.
      The o2cb stack code is split into stack_o2cb.c.  This becomes the
      ocfs2_stack_o2cb.ko module.
      
      The stackglue generic functions are similarly split into the
      ocfs2_stackglue.ko module.  This module now provides an interface to
      register drivers.  The ocfs2_stack_o2cb driver registers itself.  As
      part of this interface, ocfs2_stackglue can load drivers on demand.
      This is accomplished in ocfs2_cluster_connect().
      
      ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
      If a hangup is pending, it will not release the driver module and will
      let _hangup() do that.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
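      The driver-registration idea can be sketched in userspace as a
      name-keyed registry. Everything here is hypothetical
      (`sketch_stack_plugin`, `sketch_register_plugin`,
      `sketch_cluster_connect`); the real ocfs2_stack_plugin carries more
      fields, and where the commit describes loading a missing driver module on
      demand, this sketch simply fails.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical plugin descriptor. */
      struct sketch_stack_plugin {
          const char *name;                  /* e.g. "o2cb" */
          int (*connect)(const char *group); /* stack-specific connect; may be NULL */
          struct sketch_stack_plugin *next;  /* registry linkage */
      };

      static struct sketch_stack_plugin *sketch_plugins; /* registered drivers */

      /* Register a driver; fails if one with the same name already exists. */
      static int sketch_register_plugin(struct sketch_stack_plugin *p)
      {
          for (struct sketch_stack_plugin *q = sketch_plugins; q; q = q->next)
              if (strcmp(q->name, p->name) == 0)
                  return -1;
          p->next = sketch_plugins;
          sketch_plugins = p;
          return 0;
      }

      static struct sketch_stack_plugin *sketch_find_plugin(const char *name)
      {
          for (struct sketch_stack_plugin *q = sketch_plugins; q; q = q->next)
              if (strcmp(q->name, name) == 0)
                  return q;
          return NULL;
      }

      /* Connect through whichever driver is registered under stack_name. */
      static int sketch_cluster_connect(const char *stack_name, const char *group)
      {
          struct sketch_stack_plugin *p = sketch_find_plugin(stack_name);

          /* The kernel would try to load the driver here; we just fail. */
          if (p == NULL || p->connect == NULL)
              return -1;
          return p->connect(group);
      }
      ```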
    • ocfs2: Clean up stackglue initialization · 63e0c48a
      Committed by Joel Becker
      The stack glue initialization function needs a better name so that it can be
      used cleanly when stackglue becomes a module.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Fill node number during cluster stack init · 0abd6d18
      Committed by Mark Fasheh
      It doesn't make sense to query for a node number before connecting to the
      cluster stack. This should be safe because node_num is only printed, and
      we are moving the setting of node_num only slightly later in the mount
      process.
      
      [ Disconnect when node query fails -- Joel ]
      Reviewed-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Move o2hb functionality into the stack glue. · 6953b4c0
      Committed by Joel Becker
      The last bit of classic stack used directly in ocfs2 code is o2hb.
      Specifically, the check for heartbeat during mount and the call to
      ocfs2_hb_ctl during unmount.
      
      We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
      to ocfs2_hb_ctl.  Other stacks will just leave hangup() empty.
      
      The check for heartbeat is moved into ocfs2_cluster_connect().  It will
      be matched by a similar check for other stacks.
      
      With this change, only stackglue.c includes cluster/ headers.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Abstract out node number queries. · 19fdb624
      Committed by Joel Becker
      ocfs2 asks the cluster stack for the local node's node number for two
      reasons: to fill the slot map and to print it. While the slot map isn't
      necessary for userspace cluster stacks, the printing is very nice for
      debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
      this value. It is anticipated that the slot map will not be used under a
      userspace cluster stack, so validity checks of the node num only need to
      exist in the slot map code. Otherwise, it just gets used and printed as an
      opaque value.
      
      [ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
        truly opaque. --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API. · 4670c46d
      Committed by Joel Becker
      This step introduces a cluster stack agnostic API for initializing and
      exiting.  fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
      connect to the stack.  It is all handled in stackglue.c.
      
      heartbeat.c no longer needs to know how it gets called.
      ocfs2_do_node_down() is now a clean recovery trigger.
      
      The big gotcha is the ordering of initializations and de-initializations done
      underneath ocfs2_cluster_connect().  ocfs2_dlm_init() used to do all
      o2dlm initialization in one block.  Thus, the o2dlm functionality of
      ocfs2_cluster_connect() is very straightforward.  ocfs2_dlm_shutdown(),
      however, did a few things between de-registration of the eviction
      callback and actually shutting down the domain.  Now de-registration and
      shutdown of the domain are wrapped within the single
      ocfs2_cluster_disconnect() call.  I've checked the code paths to make
      sure we can safely tear down things in ocfs2_dlm_shutdown() before
      calling ocfs2_cluster_disconnect().  The filesystem has already set
      itself to ignore the callback.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Separate out dlm lock functions. · 24ef1815
      Committed by Joel Becker
      This is the first in a series of patches to isolate ocfs2 from the
      underlying cluster stack. Here we wrap the dlm locking functions with
      ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
      callbacks, we can eliminate the callbacks from the filesystem visible
      functions.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
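      The wrapping idea can be shown with stand-ins: the underlying stack call
      takes explicit status callbacks, while the filesystem-visible wrapper
      always supplies the same pair. All names below are hypothetical, and the
      "stack" call merely records its arguments so the wrapper can be checked.

      ```c
      #include <stddef.h>

      /* Hypothetical callback types standing in for AST/BAST hooks. */
      typedef void (*sketch_ast_fn)(void *lockres);
      typedef void (*sketch_bast_fn)(void *lockres, int blocked_level);

      /* Records the most recent call so the wrapper can be inspected. */
      struct sketch_dlm_call {
          int level;
          void *lockres;
          sketch_ast_fn ast;
          sketch_bast_fn bast;
      };
      static struct sketch_dlm_call sketch_last_call;

      /* Stand-in for the underlying stack's lock call, which takes callbacks. */
      static int sketch_stack_dlmlock(int level, void *lockres,
                                      sketch_ast_fn ast, sketch_bast_fn bast)
      {
          sketch_last_call.level = level;
          sketch_last_call.lockres = lockres;
          sketch_last_call.ast = ast;
          sketch_last_call.bast = bast;
          return 0;
      }

      /* The filesystem's fixed status callbacks (bodies elided). */
      static void sketch_fs_ast(void *lockres) { (void)lockres; }
      static void sketch_fs_bast(void *lockres, int blocked_level)
      {
          (void)lockres;
          (void)blocked_level;
      }

      /* Filesystem-visible wrapper: callers pass only level and lock resource. */
      static int sketch_dlm_lock(int level, void *lockres)
      {
          return sketch_stack_dlmlock(level, lockres, sketch_fs_ast, sketch_fs_bast);
      }
      ```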
    • ocfs2: Change the recovery map to an array of node numbers. · 553abd04
      Committed by Joel Becker
      The old recovery map was a bitmap of node numbers.  This was sufficient
      for the maximum node number of 254.  Going forward, we want node numbers
      to be UINT32.  Thus, we need a new recovery map.
      
      Note that we can't keep track of slots here.  We must write down the
      node number to recover *before* we get the locks needed to convert a
      node number into a slot number.
      
      The recovery map is now an array of unsigned ints, max_slots in size.
      It moves to journal.c with the rest of recovery.
      
      Because it needs to be initialized, we move all of recovery initialization
      into a new function, ocfs2_recovery_init().  This actually cleans up
      ocfs2_initialize_super() a little as well.  Following on, recovery cleanup
      becomes part of ocfs2_recovery_exit().
      
      A number of node map functions are rendered obsolete and are removed.
      
      Finally, waiting on recovery is wrapped in a function rather than naked
      checks on the recovery_event.  This is a cleanup from Mark.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
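      A minimal sketch of a recovery map stored as an array of node numbers,
      max_slots in size. The names (`sketch_recovery_map` and its helpers) are
      hypothetical; the linear search is a design choice that stays cheap
      because the array never holds more than max_slots entries, and node
      numbers may use the full 32-bit range.

      ```c
      #include <stdbool.h>
      #include <stdlib.h>
      #include <string.h>

      struct sketch_recovery_map {
          unsigned int used;     /* number of valid entries */
          unsigned int max;      /* capacity: max_slots */
          unsigned int *entries; /* node numbers awaiting recovery */
      };

      static int sketch_recovery_init(struct sketch_recovery_map *rm,
                                      unsigned int max_slots)
      {
          rm->entries = calloc(max_slots, sizeof(*rm->entries));
          if (rm->entries == NULL)
              return -1;
          rm->used = 0;
          rm->max = max_slots;
          return 0;
      }

      static bool sketch_recovery_contains(const struct sketch_recovery_map *rm,
                                           unsigned int node)
      {
          for (unsigned int i = 0; i < rm->used; i++)
              if (rm->entries[i] == node)
                  return true;
          return false;
      }

      /* Record a node number (no locks needed); duplicates are ignored. */
      static int sketch_recovery_set(struct sketch_recovery_map *rm,
                                     unsigned int node)
      {
          if (sketch_recovery_contains(rm, node))
              return 0;
          if (rm->used == rm->max)
              return -1; /* more dead nodes than slots */
          rm->entries[rm->used++] = node;
          return 0;
      }

      /* Remove a recovered node, keeping the array dense. */
      static void sketch_recovery_clear(struct sketch_recovery_map *rm,
                                        unsigned int node)
      {
          for (unsigned int i = 0; i < rm->used; i++) {
              if (rm->entries[i] == node) {
                  memmove(&rm->entries[i], &rm->entries[i + 1],
                          (rm->used - i - 1) * sizeof(*rm->entries));
                  rm->used--;
                  return;
              }
          }
      }

      static void sketch_recovery_exit(struct sketch_recovery_map *rm)
      {
          free(rm->entries);
          rm->entries = NULL;
          rm->used = rm->max = 0;
      }
      ```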
    • ocfs2: Move slot map access into slot_map.c · 8e8a4603
      Committed by Mark Fasheh
      journal.c and dlmglue.c would refresh the slot map by hand.  Instead, have
      the update and clear functions do the work inside slot_map.c.  The eventual
      result is to make ocfs2_slot_info private to slot_map.c.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  2. 07 Feb 2008, 1 commit
    • ocfs2: Negotiate locking protocol versions. · d24fbcda
      Committed by Joel Becker
      Currently, when ocfs2 nodes connect via TCP, they advertise their
      compatibility level.  If the versions do not match, two nodes cannot speak
      to each other and they disconnect. As a result, this provides no forward or
      backwards compatibility.
      
      This patch implements a simple protocol negotiation at the dlm level by
      introducing a major/minor version number scheme for entities that
      communicate.  Specifically, o2dlm has a major/minor version for interaction
      with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
      interacting with the filesystem on other nodes.
      
      This will allow rolling upgrades of ocfs2 clusters when changes to the
      locking or network protocols can be done in a backwards compatible manner.
      In those cases, only the minor number is changed and the negotiated protocol
      minor is returned from dlm join. In the far less likely event that a
      required protocol change makes backwards compatibility impossible, we simply
      bump the major number.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
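      The major/minor scheme can be sketched as a pure function. The names are
      hypothetical, and choosing the lower of the two minors (so a newer node
      falls back to what an older node understands) is an assumption consistent
      with the rolling-upgrade goal described above, not a transcription of the
      o2dlm code.

      ```c
      /* Hypothetical version pair; field widths are illustrative. */
      struct sketch_proto_version {
          unsigned char major;
          unsigned char minor;
      };

      /*
       * Majors must match exactly; the agreed minor is the lower of the two.
       * Returns 0 and fills *agreed on success, -1 if the nodes cannot talk.
       */
      static int sketch_negotiate(struct sketch_proto_version local,
                                  struct sketch_proto_version remote,
                                  struct sketch_proto_version *agreed)
      {
          if (local.major != remote.major)
              return -1;
          agreed->major = local.major;
          agreed->minor = local.minor < remote.minor ? local.minor : remote.minor;
          return 0;
      }
      ```

      A node at 1.2 joining one at 1.1 would agree on 1.1; a node at 2.0 could
      not join either of them.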
  3. 26 Jan 2008, 9 commits
    • ocfs2: Silence false lockdep warnings · 5fa0613e
      Committed by Jan Kara
      Create separate lockdep lock classes for system files' i_mutexes. They are
      used to guard allocations and similar things and thus rank differently
      than i_mutex of a regular file or directory.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • [PATCH 2/2] ocfs2: cluster aware flock() · 53fc622b
      Committed by Mark Fasheh
      Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new
      mount option, "localflocks", is added so that users can revert to the old
      behavior as needed.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Local alloc window size changeable via mount option · 2fbe8d1e
      Committed by Sunil Mushran
      Local alloc is a performance optimization in ocfs2 in which a node
      takes a window of bits from the global bitmap and then uses that for
      all small local allocations. This window size is fixed to 8MB currently.
      This patch allows users to specify the window size in MB including
      disabling it by passing in 0. If the number specified is too large,
      the fs will use the default value of 8MB.
      
      mount -o localalloc=X /dev/sdX /mntpoint
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
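      The localalloc=X rules stated above (0 disables, too large falls back to
      the 8MB default) can be sketched as follows. `max_mb` is a hypothetical
      parameter because the log does not say how "too large" is determined; the
      second helper converts a window in MB into global-bitmap bits, one bit
      per cluster.

      ```c
      /* The commit's stated default window. */
      #define SKETCH_LA_DEFAULT_MB 8u

      /* Effective window size in MB for localalloc=requested_mb. */
      static unsigned int sketch_localalloc_mb(unsigned int requested_mb,
                                               unsigned int max_mb)
      {
          if (requested_mb == 0)
              return 0;                    /* local alloc disabled */
          if (requested_mb > max_mb)
              return SKETCH_LA_DEFAULT_MB; /* too large: use the default */
          return requested_mb;
      }

      /* Bits (clusters) of the global bitmap covered by an mb-megabyte window. */
      static unsigned long sketch_localalloc_bits(unsigned int mb,
                                                  unsigned long cluster_size)
      {
          return ((unsigned long)mb << 20) / cluster_size;
      }
      ```

      With 4KB clusters, the default 8MB window corresponds to 2048 bits.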
    • ocfs2: Support commit= mount option · d147b3d6
      Committed by Mark Fasheh
      Mostly taken from ext3. This allows the user to set the jbd commit interval,
      in seconds. The default of 5 seconds stays the same, but now users can
      easily increase the commit interval. Typically, this would be increased in
      order to improve performance at the expense of data safety.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
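      A sketch of parsing a single commit= option into seconds. The function
      name is hypothetical, and treating commit=0 as "use the default" is an
      assumption borrowed from ext3's documented behavior rather than something
      this log states.

      ```c
      #include <ctype.h>
      #include <stdlib.h>
      #include <string.h>

      #define SKETCH_DEFAULT_COMMIT_SECS 5L /* default interval from the commit */

      /* Parse "commit=N" into seconds; returns -1 if the option is malformed. */
      static long sketch_parse_commit(const char *opt)
      {
          static const char prefix[] = "commit=";
          const size_t n = sizeof(prefix) - 1;
          char *end;
          unsigned long secs;

          /* Require the prefix followed by at least one digit (rejects "-1"). */
          if (strncmp(opt, prefix, n) != 0 || !isdigit((unsigned char)opt[n]))
              return -1;
          secs = strtoul(opt + n, &end, 10);
          if (*end != '\0')
              return -1; /* trailing junk such as "10s" */
          /* Assumption (ext3-style): 0 selects the default interval. */
          return secs == 0 ? SKETCH_DEFAULT_COMMIT_SECS : (long)secs;
      }
      ```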
    • ocfs2: Initalize bitmap_cpg of ocfs2_super to be the maximum. · e9d578a8
      Committed by Tao Ma
      This value is initialized from global_bitmap->id2.i_chain.cl_cpg. If there
      is only 1 group, it will be equal to the total clusters in the volume, so
      an online resize would have to change it on every node in the cluster.
      That isn't easy, and there is no corresponding lock for it.
      
      bitmap_cpg is only used in 2 areas:
      1. Check whether the suballoc is too large for us to allocate from the global
         bitmap. This is rarely exercised: now that the suballoc size is 2048,
         the situation almost never arises and the check is almost useless.
      2. Calculate which group a cluster belongs to. We use it during truncate to
         figure out which cluster group an extent belongs to. We should be OK
         if we increase it, though, as the cluster group calculated shouldn't
         change and we only ever have a small bitmap_cpg on file systems with a
         single cluster group.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Rename ocfs2_meta_[un]lock · e63aecb6
      Committed by Mark Fasheh
      Call this the "inode_lock" now, since it covers both data and meta data.
      This patch makes no functional changes.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Remove data locks · c934a92d
      Committed by Mark Fasheh
      The meta lock now covers both meta data and data, so this just removes the
      now-redundant data lock.
      
      Combining locks saves us a round of lock mastery per inode and one less lock
      to ping between nodes during read/write.
      
      We don't lose much: since meta locks were always held before a data lock
      (and at the same level), ordered writeout mode (the default) ensured that
      flushing for the meta data lock also pushed out data anyway.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Remove mount/unmount votes · 34d024f8
      Committed by Mark Fasheh
      The node maps that are set/unset by these votes are no longer relevant, thus
      we can remove the mount and umount votes. Since those are the last two
      remaining votes, we can also remove the entire vote infrastructure.
      
      The vote thread has been renamed to the downconvert thread, and the small
      amount of functionality related to managing it has been moved into
      fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Remove fs dependency on ocfs2_heartbeat module · 6f7b056e
      Committed by Mark Fasheh
      Now that the dlm exposes domain information to us, we don't need generic
      node up / node down callbacks. And since the DLM is only telling us when a
      node goes down unexpectedly, we no longer need to optimize away node down
      callbacks via the umount map.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
  4. 28 Nov 2007, 1 commit
  5. 17 Oct 2007, 1 commit
  6. 13 Oct 2007, 3 commits
  7. 12 Sep 2007, 1 commit
    • [PATCH] ocfs2: fix mount option parsing · c0123ade
      Committed by Tiger Yang
      For some mount option types, ocfs2_parse_options() will try to access
      sb->s_fs_info to get at the ocfs2 private superblock. Unfortunately, that
      hasn't been allocated yet and will cause a kernel crash.
      
      Fix this by storing options in a struct which can then get pushed into the
      ocfs2_super once it's been allocated later. If we need more options which
      store to the ocfs2_super in the future, we can just add fields to this struct.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
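      The two-phase fix described above (parse into a standalone struct, copy
      into the private superblock once it exists) can be sketched in userspace.
      The struct names, fields, defaults, and the two recognised options are
      illustrative assumptions; the real parser handles many more options.

      ```c
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical: options parsed before the private superblock exists. */
      struct sketch_mount_options {
          unsigned int atime_quantum;
          int slot;
      };

      /* Hypothetical stand-in for the fields later copied into ocfs2_super. */
      struct sketch_super_info {
          unsigned int s_atime_quantum;
          int preferred_slot;
      };

      /* Parse into *mopt only; never touches the (not yet allocated) superblock. */
      static int sketch_parse_options(const char *options,
                                      struct sketch_mount_options *mopt)
      {
          char buf[256];
          char *tok;

          mopt->atime_quantum = 60; /* illustrative default */
          mopt->slot = -1;          /* illustrative "no preferred slot" */
          if (options == NULL)
              return 0;
          if (strlen(options) >= sizeof(buf))
              return -1;
          strcpy(buf, options);

          /* strtok is not reentrant; acceptable for a single-threaded sketch. */
          for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
              if (strncmp(tok, "atime_quantum=", 14) == 0)
                  mopt->atime_quantum = (unsigned int)strtoul(tok + 14, NULL, 10);
              else if (strncmp(tok, "preferred_slot=", 15) == 0)
                  mopt->slot = atoi(tok + 15);
              else
                  return -1; /* unknown option */
          }
          return 0;
      }

      /* Once the private superblock is allocated, push the parsed values in. */
      static void sketch_apply_options(struct sketch_super_info *osb,
                                       const struct sketch_mount_options *mopt)
      {
          osb->s_atime_quantum = mopt->atime_quantum;
          osb->preferred_slot = mopt->slot;
      }
      ```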
  8. 10 Aug 2007, 3 commits
  9. 20 Jul 2007, 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Committed by Paul Mundt
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been BUGs for both slab and slub, and slob
      never supported them either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  10. 11 Jul 2007, 2 commits
  11. 17 May 2007, 1 commit
    • Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Committed by Christoph Lameter
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 08 May 2007, 1 commit
    • slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Committed by Christoph Lameter
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code that
      manipulates the object.
      
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there would be code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 03 May 2007, 1 commit
  14. 27 Apr 2007, 3 commits
    • ocfs2: Cache extent records · 83418978
      Committed by Mark Fasheh
      The extent map code was ripped out earlier because of an inability to deal
      with holes. This patch adds back a simpler caching scheme requiring far less
      code.
      
      Our old extent map caching was designed back when meta data block caching in
      Ocfs2 didn't work very well, resulting in many disk reads. These days our
      metadata caching is much better, resulting in no unnecessary disk reads. As
      a result, extent caching doesn't have to be as fancy, nor does it have to
      cache as many extents. Keeping the last 3 extents seen should be sufficient
      to give us a small performance boost on some streaming workloads.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
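      A sketch of "keeping the last 3 extents seen" as a tiny most-recent-first
      array. All names are hypothetical, and moving an entry to the front on a
      lookup hit (so eviction drops the least recently used extent) is an
      assumption about the policy, not something this log states.

      ```c
      #include <string.h>

      #define SKETCH_MAX_CACHED_EXTENTS 3

      struct sketch_extent {
          unsigned int cpos;        /* first logical cluster covered */
          unsigned int len;         /* number of clusters */
          unsigned long long blkno; /* where the extent starts on disk */
      };

      struct sketch_extent_cache {
          unsigned int n;                                        /* valid entries */
          struct sketch_extent items[SKETCH_MAX_CACHED_EXTENTS]; /* [0] = newest */
      };

      /* Move entry i to the front, shifting the newer entries down one slot. */
      static void sketch_ec_touch(struct sketch_extent_cache *ec, unsigned int i)
      {
          struct sketch_extent tmp = ec->items[i];

          memmove(&ec->items[1], &ec->items[0], i * sizeof(tmp));
          ec->items[0] = tmp;
      }

      /* Remember an extent; when full, the least recently used one is dropped. */
      static void sketch_ec_insert(struct sketch_extent_cache *ec,
                                   struct sketch_extent e)
      {
          for (unsigned int i = 0; i < ec->n; i++) {
              if (ec->items[i].cpos == e.cpos) {
                  ec->items[i] = e;
                  sketch_ec_touch(ec, i);
                  return;
              }
          }
          if (ec->n < SKETCH_MAX_CACHED_EXTENTS)
              ec->n++;
          memmove(&ec->items[1], &ec->items[0], (ec->n - 1) * sizeof(e));
          ec->items[0] = e;
      }

      /* Find the cached extent containing cpos; a hit becomes the newest entry. */
      static int sketch_ec_lookup(struct sketch_extent_cache *ec, unsigned int cpos,
                                  struct sketch_extent *out)
      {
          for (unsigned int i = 0; i < ec->n; i++) {
              const struct sketch_extent *e = &ec->items[i];

              if (cpos >= e->cpos && cpos - e->cpos < e->len) {
                  sketch_ec_touch(ec, i);
                  *out = ec->items[0];
                  return 0;
              }
          }
          return -1; /* miss: the caller would walk the on-disk extent tree */
      }
      ```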
    • ocfs2: temporarily remove extent map caching · 363041a5
      Committed by Mark Fasheh
      The code in extent_map.c is not prepared to deal with a subtree being
      rotated between lookups. This can happen when filling holes in sparse files.
      Instead of a lengthy patch to update the code (which would likely lose the
      benefit of caching subtree roots), we remove most of the algorithms and
      implement a simple path based lookup. A less ambitious extent caching scheme
      will be added in a later patch.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Remove delete inode vote · 50008630
      Committed by Tiger Yang
      Ocfs2 currently does cluster-wide node messaging to check the open state of
      an inode during delete. This patch removes that mechanism in favor of an
      inode cluster lock which is taken at shared read when an inode is first read
      and dropped in clear_inode(). This allows a deleting node to test the
      liveness of an inode by attempting to take an exclusive lock.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
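      The liveness test described above can be modelled in a single process by
      counting shared holders of an inode's lock. This is only a model with
      hypothetical names, not the DLM: each node that reads the inode takes a
      shared hold, clear_inode() drops it, and the deleting node (which holds
      one shared reference itself) tries a non-blocking conversion to exclusive
      that succeeds only when no other node still holds the lock.

      ```c
      #include <stdbool.h>

      struct sketch_open_lock {
          unsigned int shared_holders; /* nodes that have the inode in memory */
          bool exclusive_held;
      };

      /* A node reads the inode for the first time: take the lock shared. */
      static void sketch_inode_read(struct sketch_open_lock *l)
      {
          l->shared_holders++;
      }

      /* clear_inode() on that node drops its shared hold. */
      static void sketch_inode_clear(struct sketch_open_lock *l)
      {
          if (l->shared_holders > 0)
              l->shared_holders--;
      }

      /*
       * Deleting node converts its own shared hold to exclusive without waiting.
       * Failure means some other node still has the inode in use.
       */
      static bool sketch_try_upgrade_for_delete(struct sketch_open_lock *l)
      {
          if (l->exclusive_held || l->shared_holders != 1)
              return false;
          l->shared_holders = 0;
          l->exclusive_held = true;
          return true;
      }
      ```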
  15. 13 Feb 2007, 1 commit
  16. 14 Dec 2006, 1 commit