1. 23 Aug, 2008 · 1 commit
  2. 31 May, 2008 · 1 commit
  3. 18 Apr, 2008 · 2 commits
    • ocfs2/net: Add debug interface to o2net · 2309e9e0
      Sunil Mushran committed
      This patch exposes o2net information via debugfs. The information includes
      the list of sockets (sock_containers) as well as the list of outstanding
      messages (send_tracking). Useful for o2dlm debugging.
      
      (This patch is derived from an earlier one written by Zach Brown that
      exposed the same information via /proc.)
      
      [Mark: checkpatch fixes]
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Reviewed-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Reconnect after idle time out. · 5cc3bf27
      Tao Ma committed
      Currently, o2net connects to a node on hb_up and disconnects on
      hb_down and net timeout.
      
      Disconnecting on net timeout is fine, but o2net should then attempt to
      reconnect. Sometimes a node gets overloaded enough that the network
      connection breaks while the disk hb does not. In that situation, we
      either fence (unnecessarily) or wait for its disk hb to die (and
      sometimes hang in the process).
      
      So in this updated scheme, when the network disconnects, we keep
      attempting to reconnect till we succeed or we get a disk hb down
      event.
      
      If the other node is really dead, then we will eventually get a
      node down event. If not, we should be able to connect again and
      continue.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
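The reconnect scheme described in this commit can be read as a small state machine. The following is a minimal sketch of that reading, not the o2net code: the state and event names (`CONN_IDLE`, `EV_NET_TIMEOUT`, and so on) and the function `conn_next()` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical connection states for one peer node. */
enum conn_state { CONN_IDLE, CONN_CONNECTING, CONN_CONNECTED };

/* Hypothetical events; the names are assumptions, not o2net identifiers. */
enum conn_event {
    EV_HB_UP,        /* disk heartbeat reports the node alive */
    EV_HB_DOWN,      /* disk heartbeat reports the node dead */
    EV_NET_TIMEOUT,  /* network idle timeout fired */
    EV_CONNECT_OK,   /* a connection attempt succeeded */
    EV_CONNECT_FAIL  /* a connection attempt failed */
};

/* Return the state that follows `s` when event `ev` arrives. */
static enum conn_state conn_next(enum conn_state s, enum conn_event ev)
{
    /* A heartbeat down event always ends the connection attempts. */
    if (ev == EV_HB_DOWN)
        return CONN_IDLE;

    switch (s) {
    case CONN_IDLE:
        return ev == EV_HB_UP ? CONN_CONNECTING : CONN_IDLE;
    case CONN_CONNECTING:
        /* Keep retrying until an attempt succeeds. */
        return ev == EV_CONNECT_OK ? CONN_CONNECTED : CONN_CONNECTING;
    case CONN_CONNECTED:
        /* On a net timeout, go back to connecting instead of giving up. */
        return ev == EV_NET_TIMEOUT ? CONN_CONNECTING : CONN_CONNECTED;
    }
    return s;
}
```

In this sketch the only way out of `CONN_CONNECTING` is a successful connection or a heartbeat down event, which mirrors "till we succeed or we get a disk hb down event".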
  4. 07 Feb, 2008 · 1 commit
    • ocfs2: Negotiate locking protocol versions. · d24fbcda
      Joel Becker committed
      Currently, when ocfs2 nodes connect via TCP, they advertise their
      compatibility level.  If the versions do not match, two nodes cannot speak
      to each other and they disconnect. As a result, this provides no forward or
      backwards compatibility.
      
      This patch implements a simple protocol negotiation at the dlm level by
      introducing a major/minor version number scheme for entities that
      communicate.  Specifically, o2dlm has a major/minor version for interaction
      with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
      interacting with the filesystem on other nodes.
      
      This will allow rolling upgrades of ocfs2 clusters when changes to the
      locking or network protocols can be done in a backwards compatible manner.
      In those cases, only the minor number is changed and the negotiated protocol
      minor is returned from dlm join. In the far less likely event that a
      required protocol change makes backwards compatibility impossible, we simply
      bump the major number.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
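A minimal sketch of major/minor negotiation follows. It assumes that a major mismatch is fatal and that the agreed minor is the lower of the two minors; the commit message only says that a negotiated minor is returned from dlm join, so the exact rule, the struct `proto_version`, and the function `proto_negotiate()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical major/minor pair; not the kernel's actual structure. */
struct proto_version {
    uint8_t major;
    uint8_t minor;
};

/*
 * Negotiate between the local version and the one a peer advertises.
 * Majors must match exactly; the agreed minor is the lower of the two
 * (an assumption of this sketch). Returns true and fills *agreed on
 * success, false if the two sides are incompatible.
 */
static bool proto_negotiate(struct proto_version local,
                            struct proto_version remote,
                            struct proto_version *agreed)
{
    assert(agreed != NULL);

    if (local.major != remote.major)
        return false;

    agreed->major = local.major;
    agreed->minor = local.minor < remote.minor ? local.minor : remote.minor;
    return true;
}
```

Under this rule a node at 1.2 and a node at 1.0 settle on 1.0, which is what makes a rolling upgrade possible, while 1.x and 2.x refuse to talk.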
  5. 26 Jan, 2008 · 2 commits
    • ocfs2: Remove data locks · c934a92d
      Mark Fasheh committed
      The meta lock now covers both meta data and data, so this just removes the
      now-redundant data lock.
      
      Combining the locks saves a round of lock mastery per inode and leaves one
      less lock to ping between nodes during read/write.
      
      We don't lose much: meta locks were always held before a data lock (and at
      the same level), so ordered writeout mode (the default) ensured that
      flushing for the meta data lock also pushed out data anyway.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Remove mount/unmount votes · 34d024f8
      Mark Fasheh committed
      The node maps that are set/unset by these votes are no longer relevant, thus
      we can remove the mount and umount votes. Since those are the last two
      remaining votes, we can also remove the entire vote infrastructure.
      
      The vote thread has been renamed to the downconvert thread, and the small
      amount of functionality related to managing it has been moved into
      fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
  6. 27 Apr, 2007 · 1 commit
    • ocfs2: Remove delete inode vote · 50008630
      Tiger Yang committed
      Ocfs2 currently does cluster-wide node messaging to check the open state of
      an inode during delete. This patch removes that mechanism in favor of an
      inode cluster lock which is taken at shared read when an inode is first read
      and dropped in clear_inode(). This allows a deleting node to test the
      liveness of an inode by attempting to take an exclusive lock.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
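The liveness test described here can be illustrated with a single-process model of a shared/exclusive lock. This is only a sketch under simplifying assumptions: the real mechanism is a cluster lock managed by the DLM, and the struct `open_lock` and all function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode cluster lock. */
struct open_lock {
    unsigned int readers;  /* holders at shared read level */
    bool exclusive;        /* true while held exclusively */
};

/* Taken when an inode is first read; fails only during an exclusive hold. */
static bool open_lock_read(struct open_lock *l)
{
    if (l->exclusive)
        return false;
    l->readers++;
    return true;
}

/* Dropped when the inode is cleared (clear_inode() in the commit text). */
static void open_unlock_read(struct open_lock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

/* Non-blocking attempt to take the lock exclusively. */
static bool open_lock_try_exclusive(struct open_lock *l)
{
    if (l->exclusive || l->readers > 0)
        return false;
    l->exclusive = true;
    return true;
}

/* A deleting node's test: the inode is in use if the try-lock fails. */
static bool inode_in_use(struct open_lock *l)
{
    if (!open_lock_try_exclusive(l))
        return true;
    l->exclusive = false;  /* release immediately; we only probed */
    return false;
}
```

The point of the design is that no messages need to be exchanged for the check itself: the lock state already encodes whether anyone still has the inode open.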
  7. 08 Feb, 2007 · 4 commits
    • ocfs2: introduce sc->sc_send_lock to protect outbound messages · 925037bc
      Zhen Wei committed
      When there is a lot of multithreaded I/O usage, two threads can collide
      while sending out a message to the other nodes. This is due to the lack of
      locking between threads while sending out the messages.
      
      When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
      kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
      itself by acquiring a lock at invocation by calling lock_sock().
      tcp_sendmsg() then loops over the buffers in the iovec, allocating
      associated sk_buff's and cache pages for use in the actual send. As it does
      so, it pushes the data out to tcp for actual transmission. However, if one
      of those allocations fails (because a large number of large sends is being
      processed, for example), it must wait for memory to become available. It
      does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
      eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
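      contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
      contains the call to release_sock(). Once the socket lock is released,
      another thread can enter tcp_sendmsg() on the same socket and interleave
      its data with the partially sent message.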
      
      The following patch adds a lock to the socket container in order to
      properly serialize outbound requests.
      
      From: Zhen Wei <zwei@novell.com>
      Acked-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
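The fix can be illustrated in userspace with a per-connection mutex held across the whole message. This is a sketch under stated assumptions, not the kernel patch: `struct sock_container` here is a hypothetical stand-in, the byte buffer `sc_wire` models the TCP stream, and a pthread mutex takes the place of whatever lock type the kernel code uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SC_WIRE_SIZE 256

/* Hypothetical userspace stand-in for o2net's socket container. */
struct sock_container {
    pthread_mutex_t sc_send_lock;   /* serializes whole-message sends */
    char sc_wire[SC_WIRE_SIZE];     /* models the outgoing byte stream */
    size_t sc_wire_len;
};

static void sc_init(struct sock_container *sc)
{
    assert(sc != NULL);
    pthread_mutex_init(&sc->sc_send_lock, NULL);
    sc->sc_wire_len = 0;
}

/*
 * Send header and payload as one unit. Holding sc_send_lock across both
 * copies means no other sender can place bytes between them, even if the
 * underlying transport gives up its own lock part-way through.
 * Returns 0 on success, -1 if the message does not fit.
 */
static int sc_send_message(struct sock_container *sc,
                           const char *hdr, const char *payload)
{
    size_t hlen = strlen(hdr);
    size_t plen = strlen(payload);
    int ret = -1;

    assert(sc != NULL);
    pthread_mutex_lock(&sc->sc_send_lock);
    if (hlen + plen <= SC_WIRE_SIZE - sc->sc_wire_len) {
        memcpy(sc->sc_wire + sc->sc_wire_len, hdr, hlen);
        sc->sc_wire_len += hlen;
        memcpy(sc->sc_wire + sc->sc_wire_len, payload, plen);
        sc->sc_wire_len += plen;
        ret = 0;
    }
    pthread_mutex_unlock(&sc->sc_send_lock);
    return ret;
}
```

The lock lives in the container rather than in the transport because the unit that must stay contiguous is the message, which only the caller knows about.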
    • ocfs2_dlm: disallow a domain join if node maps mismatch · 1faf2894
      Srinivas Eeda committed
      There is a small window where a joining node may not see the node(s) that
      just died but are still part of the domain. To fix this, we must disallow
      join requests if the joining node has a different node map.
      
      A new field, node_map, is added to dlm_query_join_request to send the
      joining node's node map along with the join request. On the receiving end,
      the nodes that are part of the cluster verify that the new node sees all
      the nodes that are still part of the cluster. They disallow the join if
      the maps mismatch.
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
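The check can be sketched with a one-word bitmap. This assumes at most 64 nodes and interprets "sees all the nodes that are still part of the cluster" as a subset test; the type `node_map_t` and both functions are hypothetical, and the kernel's bitmap representation may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64  /* assumption of this sketch */

/* One bit per node number. */
typedef uint64_t node_map_t;

/* Return `map` with the bit for `node` set. */
static node_map_t node_map_set(node_map_t map, unsigned int node)
{
    assert(node < MAX_NODES);
    return map | ((node_map_t)1 << node);
}

/*
 * A domain member allows the join only if every node it still considers
 * part of the domain also appears in the joiner's map.
 */
static bool join_allowed(node_map_t domain_map, node_map_t joiner_map)
{
    return (domain_map & ~joiner_map) == 0;
}
```

A joiner that missed a node which the domain still counts as a member fails the test and has to retry with an up-to-date map.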
    • ocfs2: Added post handler callable function in o2net message handler · d74c9803
      Kurt Hackel committed
      Currently o2net allows one handler function per message type. This
      patch adds the ability to call another function after the handler has
      returned its response to the other node.
      
      Handlers are now given the option of returning a context (in the form of a
      void **) which will be passed back into the post message handler function.
      Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
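The handler/post-handler flow can be sketched as follows. The typedefs, `struct msg_handler`, `msg_dispatch()`, and the `demo_*` callbacks are all hypothetical; only the idea of handing a context back through a `void **` and passing it to a second function after the reply comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types; the o2net signatures may differ. */
typedef int (*msg_handler_fn)(const void *data, size_t len, void **ret_ctx);
typedef void (*post_handler_fn)(int status, void *ctx);
typedef void (*reply_fn)(int status);

struct msg_handler {
    msg_handler_fn handler;  /* required */
    post_handler_fn post;    /* optional, may be NULL */
};

/* Run the handler, send its status back, then run the post handler. */
static int msg_dispatch(const struct msg_handler *h, const void *data,
                        size_t len, reply_fn send_reply)
{
    void *ctx = NULL;
    int status;

    assert(h != NULL && h->handler != NULL && send_reply != NULL);
    status = h->handler(data, len, &ctx);
    send_reply(status);
    if (h->post != NULL)
        h->post(status, ctx);
    return status;
}

/* Demo callbacks that record what happened, for illustration only. */
static int demo_replies;           /* number of replies "sent" */
static size_t demo_len_seen;       /* context storage set by the handler */
static int demo_post_after_reply;  /* 1 if post ran after the reply */
static size_t demo_post_len;       /* value post read from the context */

static int demo_handler(const void *data, size_t len, void **ret_ctx)
{
    (void)data;
    demo_len_seen = len;
    *ret_ctx = &demo_len_seen;  /* hand context to the post handler */
    return 0;
}

static void demo_reply(int status)
{
    (void)status;
    demo_replies++;
}

static void demo_post(int status, void *ctx)
{
    (void)status;
    demo_post_after_reply = (demo_replies == 1);
    demo_post_len = *(size_t *)ctx;
}
```

Deferring work to the post handler keeps it off the reply path, so the remote node is not kept waiting for cleanup that only matters locally.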
    • ocfs2_dlm: fix cluster-wide refcounting of lock resources · ba2bf218
      Kurt Hackel committed
      This was previously broken and migration of some locks had to be temporarily
      disabled. We use a new (and backward-incompatible) set of network messages
      to account for all references to a lock resource held across the cluster.
      Once these are all freed, the master node may free the lock resource
      memory as soon as its local references are dropped.
      Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
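The freeing rule can be sketched with a per-resource bitmap of remote reference holders plus a local count. This is an illustration under assumptions: the struct `lock_res`, its fields, and the 64-node limit are hypothetical, and the network messages that would set and clear the bits are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical master-side view of one lock resource. */
struct lock_res {
    uint64_t remote_refs;     /* one bit per node holding a reference */
    unsigned int local_refs;  /* references held on the master itself */
};

/* Record that `node` has taken a reference. */
static void res_set_ref(struct lock_res *r, unsigned int node)
{
    assert(node < 64);
    r->remote_refs |= UINT64_C(1) << node;
}

/* Record that `node` has dropped its reference. */
static void res_clear_ref(struct lock_res *r, unsigned int node)
{
    assert(node < 64);
    r->remote_refs &= ~(UINT64_C(1) << node);
}

/* The master may free the resource only when nobody references it. */
static bool res_can_free(const struct lock_res *r)
{
    return r->remote_refs == 0 && r->local_refs == 0;
}
```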
  8. 12 Dec, 2006 · 1 commit
    • [patch 3/3] OCFS2 Configurable timeouts - Protocol changes · 828ae6af
      Andrew Beekhof committed
      Modify the OCFS2 handshake to ensure essential timeouts are configured
      identically on all nodes.
      
      Only allow changes when there are no connected peers.
      
      Improves the logic in o2net_advance_rx(), which broke now that
      sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg).
      
      Included is the field for userspace-heartbeat timeout to avoid the need for
      further protocol changes.
      
      Uses a global spinlock to ensure the decisions to update configfs entries
      are made on the correct value.  The region covered by the spinlock when
      incrementing the counter is much larger as this is the more critical case.
      
      Small cleanup contributed by Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
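The two rules in this commit (timeouts must match at handshake, and may only change with no connected peers) can be sketched as below. The struct layout, the field names other than the idea of a userspace-heartbeat timeout, and both functions are assumptions for illustration, not the actual `struct o2net_handshake`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical handshake payload; field names are assumptions. */
struct handshake {
    uint64_t protocol_version;
    uint32_t hb_timeout_ms;       /* userspace-heartbeat timeout */
    uint32_t idle_timeout_ms;
    uint32_t keepalive_delay_ms;
    uint32_t reconnect_delay_ms;
};

/* Hypothetical local configuration plus a count of connected peers. */
struct net_config {
    struct handshake local;
    unsigned int connected_peers;
};

/* A connection is accepted only if every timeout is identical. */
static bool handshake_timeouts_match(const struct handshake *a,
                                     const struct handshake *b)
{
    return a->hb_timeout_ms == b->hb_timeout_ms &&
           a->idle_timeout_ms == b->idle_timeout_ms &&
           a->keepalive_delay_ms == b->keepalive_delay_ms &&
           a->reconnect_delay_ms == b->reconnect_delay_ms;
}

/* Refuse to change a timeout while any peer is connected. */
static int net_set_idle_timeout(struct net_config *cfg, uint32_t ms)
{
    if (cfg->connected_peers > 0)
        return -EBUSY;
    cfg->local.idle_timeout_ms = ms;
    return 0;
}
```

In a concurrent setting the peer count check and the update would have to happen under one lock, which is the role the commit message gives to its global spinlock.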
  9. 08 Dec, 2006 · 1 commit
  10. 22 Nov, 2006 · 1 commit
  11. 25 Sep, 2006 · 2 commits
    • ocfs2: Remove i_generation from inode lock names · 24c19ef4
      Mark Fasheh committed
      OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
      Typically, i_generation is encoded in the lock name so that a deleted inode
      and a new one in the same block don't share the same lvb.
      
      Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
      is potentially thrown away as soon as the meta data lock is taken - we
      cannot encode the lock name without first knowing i_generation, which
      requires a disk read.
      
      This patch encodes i_generation in the inode meta data lvb, and removes the
      value from the inode meta data lock name. This way, the read can be covered
      by a lock, and at the same time we can distinguish between an up to date and
      a stale LVB.
      
      This will help cold-cache stat(2) performance in particular.
      
      Since this patch changes the protocol version, we take the opportunity to do
      a minor re-organization of two of the LVB fields.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
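The stale-LVB check can be sketched as a generation comparison. The struct layouts and the function `refresh_from_lvb()` are hypothetical and far smaller than the real LVB; the sketch only shows how a generation stored in the LVB lets a node tell whether the cached metadata belongs to the inode it is looking at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-reduced lock value block contents. */
struct meta_lvb {
    uint32_t lvb_igeneration;  /* generation of the inode the LVB describes */
    uint64_t lvb_isize;
};

/* Hypothetical in-memory inode fields. */
struct inode_info {
    uint32_t i_generation;
    uint64_t i_size;
};

/*
 * Refresh the inode from the LVB if the LVB describes the same
 * generation. Returns false for a stale LVB, in which case the caller
 * would fall back to reading the inode from disk.
 */
static bool refresh_from_lvb(struct inode_info *inode,
                             const struct meta_lvb *lvb)
{
    if (lvb->lvb_igeneration != inode->i_generation)
        return false;
    inode->i_size = lvb->lvb_isize;
    return true;
}
```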
    • ocfs2: Hook rest of the file system into dentry locking API · 379dfe9d
      Mark Fasheh committed
      Actually replace the vote calls with the new dentry operations. Make any
      necessary adjustments to get the scheme to work.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
  12. 04 Jan, 2006 · 1 commit