1. 06 Feb 2007, 10 commits
  2. 16 Dec 2006, 1 commit
  3. 08 Dec 2006, 1 commit
  4. 07 Dec 2006, 1 commit
    • [DLM] Clean up lowcomms · ac33d071
      Committed by Patrick Caulfield
      This fixes up most of the things pointed out by akpm and Pavel Machek,
      with comments below explaining why some things have been left as they are:
      
      Andrew Morton wrote:
      >
      >> +static struct nodeinfo *nodeid2nodeinfo(int nodeid, gfp_t alloc)
      >> +{
      >> +	struct nodeinfo *ni;
      >> +	int r;
      >> +	int n;
      >> +
      >> +	down_read(&nodeinfo_lock);
      >
      > Given that this function can sleep, I wonder if `alloc' is useful.
      >
      > I see lots of callers passing in a literal "0" for `alloc'.  That's in fact
      > a secret (GFP_ATOMIC & ~__GFP_HIGH).  I doubt if that's what you really
      > meant.  Particularly as the code could at least have used __GFP_WAIT (aka
      > GFP_NOIO) which is much, much more reliable than "0".  In fact "0" is the
      > least reliable mode possible.
      >
      > IOW, this is all bollixed up.
      
      When 0 is passed into nodeid2nodeinfo the function does not try to allocate a
      new structure at all; it's an indication that the caller only wants the nodeinfo
      struct for that nodeid if one actually exists.
      I've tidied the function itself so this is more obvious (and tidier!).
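      Reworked, the lookup-only contract reads roughly like this (a sketch of the
      shape, not the committed code; only nodeinfo_idr, nodeinfo_lock and the
      signature come from the quoted fragment):
      
      static struct nodeinfo *nodeid2nodeinfo(int nodeid, gfp_t alloc)
      {
              struct nodeinfo *ni;
              int r, n;
      
              down_read(&nodeinfo_lock);
              ni = idr_find(&nodeinfo_idr, nodeid);
              up_read(&nodeinfo_lock);
      
              /* alloc == 0 means "look up only": return an existing entry or NULL */
              if (ni || !alloc)
                      return ni;
      
              down_write(&nodeinfo_lock);
              ni = idr_find(&nodeinfo_idr, nodeid);   /* lost a race? */
              if (ni)
                      goto out;
      
              r = idr_pre_get(&nodeinfo_idr, alloc);
              if (!r)
                      goto out;
      
              ni = kzalloc(sizeof(struct nodeinfo), alloc);
              if (!ni)
                      goto out;
      
              r = idr_get_new_above(&nodeinfo_idr, ni, nodeid, &n);
              if (r || n != nodeid) {
                      /* allocation failed, or the idr handed back an id other
                       * than the nodeid we asked for */
                      if (!r)
                              idr_remove(&nodeinfo_idr, n);
                      kfree(ni);
                      ni = NULL;
              }
      out:
              up_write(&nodeinfo_lock);
              return ni;
      }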
      
      >> +/* Data received from remote end */
      >> +static int receive_from_sock(void)
      >> +{
      >> +	int ret = 0;
      >> +	struct msghdr msg;
      >> +	struct kvec iov[2];
      >> +	unsigned len;
      >> +	int r;
      >> +	struct sctp_sndrcvinfo *sinfo;
      >> +	struct cmsghdr *cmsg;
      >> +	struct nodeinfo *ni;
      >> +
      >> +	/* These two are marginally too big for stack allocation, but this
      >> +	 * function is (currently) only called by dlm_recvd so static should be
      >> +	 * OK.
      >> +	 */
      >> +	static struct sockaddr_storage msgname;
      >> +	static char incmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
      >
      > whoa.  This is globally singly-threaded code??
      
      Yes, it is only ever run in the context of dlm_recvd.
      >>
      >> +static void initiate_association(int nodeid)
      >> +{
      >> +	struct sockaddr_storage rem_addr;
      >> +	static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
      >
      > Another static buffer to worry about.  Globally singly-threaded code?
      
      Yes. Only ever called by dlm_sendd.
      
      >> +
      >> +/* Send a message */
      >> +static int send_to_sock(struct nodeinfo *ni)
      >> +{
      >> +	int ret = 0;
      >> +	struct writequeue_entry *e;
      >> +	int len, offset;
      >> +	struct msghdr outmsg;
      >> +	static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
      >
      > Singly-threaded?
      
      Yep.
      
      >>
      >> +static void dealloc_nodeinfo(void)
      >> +{
      >> +	int i;
      >> +
      >> +	for (i=1; i<=max_nodeid; i++) {
      >> +		struct nodeinfo *ni = nodeid2nodeinfo(i, 0);
      >> +		if (ni) {
      >> +			idr_remove(&nodeinfo_idr, i);
      >
      > Didn't that need locking?
      
      No, it's only ever called at DLM shutdown after all the other threads
      have been stopped.
      
      >>
      >> +static int write_list_empty(void)
      >> +{
      >> +	int status;
      >> +
      >> +	spin_lock_bh(&write_nodes_lock);
      >> +	status = list_empty(&write_nodes);
      >> +	spin_unlock_bh(&write_nodes_lock);
      >> +
      >> +	return status;
      >> +}
      >
      > This function's return value is meaningless.  As soon as the lock gets
      > dropped, the return value can get out of sync with reality.
      >
      > Looking at the caller, this _might_ happen to be OK, but it's a nasty and
      > dangerous thing.  Really the locking should be moved into the caller.
      
      It's just an optimisation to allow the caller to schedule if there is no work
      to do. If something arrives immediately afterwards then it will get picked up
      when the process re-awakens (and it will be woken by that arrival).
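      The caller's loop looks roughly like this (an illustrative sketch of the send
      daemon, not the actual code; process_output_queue() is a made-up placeholder
      for the real send work):
      
      static int dlm_sendd(void *data)
      {
              while (!kthread_should_stop()) {
                      /* set the sleep state before checking, so a wake-up that
                       * races with the check is not lost */
                      set_current_state(TASK_INTERRUPTIBLE);
                      if (write_list_empty())
                              schedule();     /* woken when new work is queued */
                      set_current_state(TASK_RUNNING);
      
                      process_output_queue();
              }
              return 0;
      }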
      
      The 'accepting' atomic has gone completely; as Andrew pointed out, it didn't
      really achieve much anyway. I suspect it was a plaster over some other
      startup or shutdown bug, to be honest.
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      ac33d071
  5. 30 Nov 2006, 13 commits
    • [DLM] fix format warnings in rcom.c and recoverd.c · 57adf7ee
      Committed by Ryusuke Konishi
      This fixes the following gcc warnings generated on
      the architectures where uint64_t != unsigned long long (e.g. ppc64).
      
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
      fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
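      The fix is the usual one for this warning: cast the uint64_t argument to the
      type %llx actually expects. Illustratively (the variable name here is made up;
      the call sites listed above are the real ones):
      
              /* before: passing a raw uint64_t to %llx warns where uint64_t is
               * not unsigned long long (e.g. ppc64) */
              log_debug(ls, "dlm_recover %llx", seq);
      
              /* after: cast so the format string and the argument agree */
              log_debug(ls, "dlm_recover %llx", (unsigned long long)seq);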
      Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net>
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      57adf7ee
    • [DLM] don't accept replies to old recovery messages · 98f176fb
      Committed by David Teigland
      We often abort a recovery after sending a status request to a remote node.
      We want to ignore any potential status reply we get from the remote node.
      If we get one of these unwanted replies, we've often moved on to the next
      recovery message and incremented the message sequence counter, so the
      reply will be ignored due to the seq number.  In some cases, we've not
      moved on to the next message so the seq number of the reply we want to
      ignore is still correct, causing the reply to be accepted.  The next
      recovery message will then mistake this old reply as a new one.
      
      To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
      new reply.  We clear this flag if we abort recovery while waiting for a
      reply.  Before the flag is set again (to allow new replies) we know that
      any old replies will be rejected due to their sequence number.  We also
      initialize the recovery-message sequence number to a random value when a
      lockspace is first created.  This makes it clear when messages are being
      rejected from an old instance of a lockspace that has since been
      recreated.
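      Sketched with the names used above (the surrounding fields are assumptions
      about the lockspace structure, not necessarily the real layout):
      
              /* reply path: only accept a status reply while we are actually
               * waiting for one, and only for the current sequence number */
              if (!test_bit(RCOM_WAIT, &ls->ls_flags) ||
                  rc->rc_seq_reply != ls->ls_rcom_seq) {
                      log_debug(ls, "ignoring stale rcom reply");
                      return;
              }
              clear_bit(RCOM_WAIT, &ls->ls_flags);
      
              /* lockspace creation: start the recovery-message sequence at a
               * random value so replies from an older instance of a recreated
               * lockspace are obviously out of range */
              get_random_bytes(&ls->ls_rcom_seq, sizeof(ls->ls_rcom_seq));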
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      98f176fb
    • [DLM] fix size of STATUS_REPLY message · 1babdb45
      Committed by David Teigland
      When the not_ready routine sends a "fake" status reply with blank status
      flags, it needs to use the correct size for a normal STATUS_REPLY by
      including the size of the would-be config parameters.  We also fill in the
      non-existent config parameters with an invalid lvblen value so it's easier
      to notice if these invalid parameters are ever used.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      1babdb45
    • [DLM] fix add_requestqueue checking nodes list · 2896ee37
      Committed by David Teigland
      Requests that arrive after recovery has started are saved in the
      requestqueue and processed after recovery is done.  Some of these requests
      are purged during recovery if they are from nodes that have been removed.
      We move the purging of the requests (dlm_purge_requestqueue) to later in
      the recovery sequence which allows the routine saving requests
      (dlm_add_requestqueue) to avoid filtering out requests by nodeid since the
      same will be done by the purge.  The current code has add_requestqueue
      filtering by nodeid but doesn't hold any locks when accessing the list of
      current nodes.  This also means that we need to call the purge routine
      when the lockspace is being shut down since the add routine will not be
      rejecting requests itself any more.
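      The purge pass that this now relies on looks roughly like this (a sketch;
      dlm_is_removed() is the membership check referred to above, the entry and
      list names are assumptions):
      
      static void purge_requestqueue(struct dlm_ls *ls)
      {
              struct rq_entry *e, *safe;
      
              mutex_lock(&ls->ls_requestqueue_mutex);
              list_for_each_entry_safe(e, safe, &ls->ls_requestqueue, list) {
                      /* the sender was removed from the cluster during recovery,
                       * so its saved request must not be replayed */
                      if (dlm_is_removed(ls, e->nodeid)) {
                              list_del(&e->list);
                              kfree(e);
                      }
              }
              mutex_unlock(&ls->ls_requestqueue_mutex);
      }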
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2896ee37
    • [DLM] Fix DLM config · b98c95af
      Committed by Patrick Caulfield
      The attached patch fixes the DLM config so that it selects the chosen network
      transport. It should fix the bug where DLM can be left selected when NET gets
      unselected. This incorporates all the comments received about this patch.
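      The resulting Kconfig has roughly this shape (an approximation for
      illustration, not the exact file):
      
      config DLM
              tristate "Distributed Lock Manager (DLM)"
              depends on SYSFS
              select CONFIGFS_FS
              select IP_SCTP if DLM_SCTP
      
      choice
              prompt "Select DLM communications protocol"
              depends on DLM
              default DLM_TCP
      
      config DLM_TCP
              bool "TCP/IP"
      
      config DLM_SCTP
              bool "SCTP"
      
      endchoice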
      
      Cc: Adrian Bunk <bunk@stusta.de>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b98c95af
    • [DLM] clear sbflags on lock master · 6f90a8b1
      Committed by David Teigland
      RH BZ 211622
      
      The ALTMODE flag can be set in the lock master's copy of the lock but
      never cleared, so ALTMODE will also be returned in a subsequent conversion
      of the lock when it shouldn't be.  This results in lock_dlm incorrectly
      switching to the alternate lock mode when returning the result to gfs
      which then asserts when it sees the wrong lock state.  The fix is to
      propagate the cleared sbflags value to the master node when the lock is
      requested.  QA's d_rwrandirectlarge test triggers this bug very quickly.
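      In outline the fix looks like this (field and function names here are
      illustrative, not necessarily the real message layout):
      
      /* requesting node: carry the requester's current (cleared) sbflags in the
       * outgoing request/convert so a stale ALTMODE bit cannot survive in the
       * master's copy of the lock */
      static void send_args(struct dlm_lkb *lkb, struct dlm_message *ms)
      {
              ms->m_sbflags = lkb->lkb_sbflags;
      }
      
      /* master node: refresh its copy from the incoming message */
      static void receive_flags(struct dlm_lkb *lkb, struct dlm_message *ms)
      {
              lkb->lkb_sbflags = ms->m_sbflags;
      }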
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      6f90a8b1
    • [DLM] do full recover_locks barrier · 4b77f2c9
      Committed by David Teigland
      Red Hat BZ 211914
      
      The previous patch "[DLM] fix aborted recovery during
      node removal" was incomplete as discovered with further testing.  It set
      the bit for the RS_LOCKS barrier but did not then wait for the barrier.
      This is often ok, but sometimes it will cause yet another recovery hang.
      If the node that skips the barrier wait is a new node that also has the
      lowest nodeid, it misses the important step of collecting and reporting the
      barrier status from the other nodes (which is the job of the low nodeid in
      the barrier wait routine).
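      In outline, the removal path of the recovery sequence now does both halves
      of the barrier (a sketch of the relevant fragment; error handling mirrors
      the rest of the sequence):
      
              dlm_set_recover_status(ls, DLM_RS_LOCKS);
      
              /* previously missing: actually wait on the barrier, so the lowest
               * nodeid still collects and reports everyone's barrier status even
               * when it had no locks of its own to recover */
              error = dlm_recover_locks_wait(ls);
              if (error) {
                      log_debug(ls, "recover_locks_wait failed %d", error);
                      goto fail;
              }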
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      4b77f2c9
    • [DLM] fix stopping unstarted recovery · 2cdc98aa
      Committed by David Teigland
      Red Hat BZ 211914
      
      When many nodes are joining a lockspace simultaneously, the dlm gets a
      quick sequence of stop/start events, a pair for adding each node.
      dlm_controld in user space sends dlm_recoverd in the kernel each stop and
      start event.  dlm_controld will sometimes send the stop before
      dlm_recoverd has had a chance to take up the previously queued start.  The
      stop aborts the processing of the previous start by setting the
      RECOVERY_STOP flag.  dlm_recoverd is erroneously clearing this flag and
      ignoring the stop/abort if it happens to take up the start after the stop
      meant to abort it.  The fix is to check the sequence number that's
      incremented for each stop/start before clearing the flag.
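      A sketch of the check (the field and flag names follow the existing
      lockspace structure; treat the helper itself as illustrative):
      
      static int start_is_current(struct dlm_ls *ls, uint64_t start_seq)
      {
              int rv;
      
              spin_lock(&ls->ls_recover_lock);
              /* only clear the stop flag if no newer stop event has been queued
               * since the start event we are now servicing */
              rv = (start_seq == ls->ls_recover_seq);
              if (rv)
                      clear_bit(LSFL_RECOVERY_STOP, &ls->ls_flags);
              spin_unlock(&ls->ls_recover_lock);
      
              return rv;
      }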
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2cdc98aa
    • [DLM] fix aborted recovery during node removal · 91c0dc93
      Committed by David Teigland
      Red Hat BZ 211914
      
      With the new cluster infrastructure, dlm recovery for a node removal can
      be aborted and restarted for a node addition.  When this happens, the
      restarted recovery isn't aware that it's doing recovery for the earlier
      removal as well as the addition.  So, it then skips the recovery steps
      only required when nodes are removed.  This can result in locks not being
      purged for failed/removed nodes.  The fix is to check for removed nodes
      for which recovery has not been completed at the start of a new recovery
      sequence.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      91c0dc93
    • [DLM] fix requestqueue race · d4400156
      Committed by David Teigland
      Red Hat BZ 211914
      
      There's a race between dlm_recoverd (1) enabling locking and (2) clearing
      out the requestqueue, and dlm_recvd (1) checking if locking is enabled and
      (2) adding a message to the requestqueue.  An order of recoverd(1),
      recvd(1), recvd(2), recoverd(2) will result in a message being left on the
      requestqueue.  The fix is to have dlm_recvd check whether dlm_recoverd has
      enabled locking after taking the requestqueue mutex and, if it has, process
      the message instead of queueing it.
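      That re-check under the mutex looks roughly like this (a sketch; add_entry()
      is a made-up placeholder for saving the message):
      
      /* called by dlm_recvd for a message that arrives during recovery */
      int dlm_add_requestqueue(struct dlm_ls *ls, int nodeid, struct dlm_header *hd)
      {
              int queued = 1;
      
              mutex_lock(&ls->ls_requestqueue_mutex);
              if (dlm_locking_stopped(ls)) {
                      /* still in recovery: save the message for later replay */
                      add_entry(ls, nodeid, hd);
              } else {
                      /* dlm_recoverd enabled locking (and drained the queue)
                       * while we waited for the mutex; process it directly */
                      queued = 0;
              }
              mutex_unlock(&ls->ls_requestqueue_mutex);
      
              return queued;  /* 0 tells the caller to process the message now */
      }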
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      d4400156
    • [DLM] status messages ping-pong between unmounted nodes · 435618b7
      Committed by David Teigland
      Red Hat BZ 213682
      
      If two nodes leave the lockspace (while unmounting the fs in the case of
      gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY
      messages will then ping-pong between the nodes when neither of them can
      find the lockspace in question any longer.  We kill this by not sending
      another STATUS message when we get a STATUS_REPLY for an unknown
      lockspace.
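      In outline (a sketch of the receive path; the "not ready" helper name is an
      assumption):
      
              /* a recovery message arrived for a lockspace we no longer have */
              if (rc->rc_type == DLM_RCOM_STATUS_REPLY) {
                      /* it is already a reply; answering it again would bounce
                       * STATUS/STATUS_REPLY between two departed nodes forever */
                      return;
              }
              /* an original STATUS request still gets an answer, so the sender
               * can see that the lockspace is gone */
              dlm_send_ls_not_ready(nodeid, rc);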
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      435618b7
    • [DLM] res_recover_locks_count not reset when recover_locks is aborted · 52069809
      Committed by David Teigland
      Red Hat BZ 213684
      
      If a node sends an lkb to the new master (RCOM_LOCK message) during
      recovery and recovery is then aborted on both nodes before it gets a
      reply, the res_recover_locks_count needs to be reset to 0 so that when the
      subsequent recovery comes along and sends the lkb to the new master again,
      the assertion that checks the counter is zero doesn't trigger.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      52069809
    • [DLM] Add support for tcp communications · fdda387f
      Committed by Patrick Caulfield
      The following patch adds a TCP-based communications layer
      to the DLM which is compile-time selectable. The existing SCTP
      layer gives the advantage of allowing multihoming, whereas
      the TCP layer has been heavily tested in previous versions of
      the DLM and is known to be robust and therefore can be used as
      a baseline for performance testing.
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fdda387f
  6. 06 Nov 2006, 2 commits
  7. 20 Oct 2006, 1 commit
  8. 13 Oct 2006, 2 commits
  9. 10 Oct 2006, 1 commit
  10. 28 Sep 2006, 1 commit
    • [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs) · bba9dfd8
      Committed by Theodore Ts'o
      The following patches reduce the size of the VFS inode structure by 28 bytes
      on a UP x86.  (It would be more on an x86_64 system).  This is a 10% reduction
      in the inode size on a UP kernel that is configured in a production mode
      (i.e., with no spinlock or other debugging functions enabled; if you want to
      save memory taken up by in-core inodes, the first thing you should do is
      disable the debugging options; they are responsible for a huge amount of bloat
      in the VFS inode structure).
      
      This patch:
      
      The filesystem or device-specific pointer in the inode is inside a union,
      which is pretty pointless given that all 30+ users of this field have been
      using the void pointer.  Get rid of the union and rename it to i_private, with
      a comment to explain who is allowed to use the void pointer.  This is just a
      cleanup, but it allows us to reuse the union 'u' for something where
      the union will actually be used.
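      For a filesystem or driver the conversion is mechanical, along these lines
      (purely illustrative):
      
              /* before the series this read: priv = inode->u.generic_ip; */
              void *priv = inode->i_private;  /* fs/driver-private pointer */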
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      bba9dfd8
  11. 25 Sep 2006, 1 commit
  12. 09 Sep 2006, 1 commit
    • [DLM] confirm master for recovered waiting requests · fa9f0e49
      Committed by David Teigland
      Fixing the following scenario:
      - A request is on the waiters list waiting for a reply from a remote node.
      - The request is the first one on the resource, so first_lkid is set.
      - The remote node fails causing recovery.
      - During recovery the requesting node becomes master.
      - The request is now processed locally instead of being a remote operation.
      - At this point we need to call confirm_master() on the resource since
        we're certain we're now the master node.  This will clear first_lkid.
      - We weren't calling confirm_master(), so first_lkid was not being cleared
        causing subsequent requests on that resource to get stuck.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fa9f0e49
  13. 07 Sep 2006, 1 commit
  14. 01 Sep 2006, 1 commit
  15. 25 Aug 2006, 1 commit
    • [DLM] add new lockspace to list earlier · 5f88f1ea
      Committed by David Teigland
      When a new lockspace was being created, the recoverd thread was being
      started for it before the lockspace was added to the global list of
      lockspaces.  The new thread was looking up the lockspace in the global
      list and sometimes not finding it due to the race with the original thread
      adding it to the list.  We need to add the lockspace to the global list
      before starting the thread instead of after, and if the new thread can't
      find the lockspace for some reason, it should return an error.
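      The reordering looks roughly like this (a sketch; the list and lock names
      follow the lockspace code, the error handling is assumed):
      
              /* make the lockspace findable before its recoverd thread starts,
               * so the new thread's lookup cannot race with this insertion */
              spin_lock(&lslist_lock);
              list_add(&ls->ls_list, &lslist);
              spin_unlock(&lslist_lock);
      
              error = dlm_recoverd_start(ls);
              if (error) {
                      log_error(ls, "can't start dlm_recoverd %d", error);
                      /* undo the insertion before failing lockspace creation */
                      spin_lock(&lslist_lock);
                      list_del(&ls->ls_list);
                      spin_unlock(&lslist_lock);
                      goto out;
              }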
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      5f88f1ea
  16. 24 Aug 2006, 2 commits