1. 06 Feb 2007, 4 commits
    • [DLM] add version check · 9e971b71
      Committed by David Teigland
      Check if we receive a message from another lockspace member running a
      version of the dlm with an incompatible inter-node message protocol.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
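      For illustration, a minimal standalone C sketch of the kind of version check the commit above describes; the struct, field, and constant names here are hypothetical, not the actual dlm code:

        #include <errno.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Illustrative value; the real dlm constant may differ. */
        #define DLM_HEADER_MAJOR 0x00020000u

        struct dlm_header {
                uint32_t h_version;   /* protocol version used by the sender */
                /* ... remaining header fields omitted ... */
        };

        /* Reject a message whose major protocol version differs from ours. */
        static int check_version(const struct dlm_header *hd)
        {
                if ((hd->h_version & 0xffff0000u) != DLM_HEADER_MAJOR) {
                        fprintf(stderr, "dlm: incompatible version %#x, expected %#x\n",
                                (unsigned)hd->h_version, DLM_HEADER_MAJOR);
                        return -EINVAL;
                }
                return 0;
        }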
    • [DLM] fix old rcom messages · 38aa8b0c
      Committed by David Teigland
      A reply to a recovery message will often be received after the relevant
      recovery sequence has aborted and the next recovery sequence has begun.
      We need to ignore replies to these old messages from the previous
      recovery.  There's already a way to do this for synchronous recovery
      requests using the rc_id number, but not for async.
      
      Each recovery sequence already has a locally unique sequence number
      associated with it.  This patch adds a field to the rcom (recovery
      message) structure where this recovery sequence number can be placed,
      rc_seq.  When a node sends a reply to a recovery request, it copies the
      rc_seq number it received into rc_seq_reply.  When the first node receives
      the reply to its recovery message, it will check whether rc_seq_reply
      matches the current recovery sequence number, ls_recover_seq, and if not
      then it ignores the old reply.
      
      An old, inadequate approach to filtering out old replies (checking if the
      current stage of recovery has moved back to the start) has been removed
      from two spots.
      
      The protocol version number is changed to reflect the different rcom
      structures.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
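      A minimal sketch of the stale-reply filtering described above; rc_seq, rc_seq_reply, and ls_recover_seq are the names from the commit message, while the surrounding structures and helpers are simplified stand-ins:

        #include <stdint.h>
        #include <stdio.h>

        struct dlm_rcom {
                uint64_t rc_seq;         /* sender's recovery sequence number */
                uint64_t rc_seq_reply;   /* echoed back by the replying node */
                /* ... payload omitted ... */
        };

        struct dlm_ls {
                uint64_t ls_recover_seq;  /* current recovery sequence number */
        };

        /* Replying node: copy the sequence number it received into the reply. */
        static void fill_reply_seq(const struct dlm_rcom *request, struct dlm_rcom *reply)
        {
                reply->rc_seq_reply = request->rc_seq;
        }

        /* Requesting node: drop replies that belong to an earlier, aborted
         * recovery sequence instead of acting on them. */
        static int reply_is_stale(const struct dlm_ls *ls, const struct dlm_rcom *reply)
        {
                if (reply->rc_seq_reply != ls->ls_recover_seq) {
                        fprintf(stderr, "ignoring old rcom reply %llu, current seq %llu\n",
                                (unsigned long long)reply->rc_seq_reply,
                                (unsigned long long)ls->ls_recover_seq);
                        return 1;
                }
                return 0;
        }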
    • [DLM] fix resend rcom lock · dc200a88
      Committed by David Teigland
      There's a chance the new master of a resource hasn't learned it's the new
      master before another node sends it a lock during recovery.  The node
      sending the lock needs to resend it if this happens.
      
      - A sends a master lookup for resource R to C
      - B sends a master lookup for resource R to C
      - C receives A's lookup, assigns A to be master of R and
        sends a reply back to A
      - C receives B's lookup and sends a reply back to B saying
        that A is the master
      - B receives lookup reply from C and sends its lock for R to A
      - A receives lock from B, doesn't think it's the master of R
        and sends an error back to B
      - A receives lookup reply from C and becomes master of R
      - B gets error back from A and resends its lock back to A
        (this resending is what this patch does)
      - A receives lock from B, it now sees it's the master of R
        and takes the lock
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
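      A rough sketch of the resend step this patch adds (the step marked in the scenario above); the function names, error code, and types are illustrative only, not the actual dlm patch:

        #include <errno.h>
        #include <stdio.h>

        struct dlm_lkb { int lkb_id; };   /* simplified stand-in for a lock */

        /* Stub transport: in the kernel this would build and send a recovery
         * lock message to the presumed master node. */
        static int send_lock_to_master(struct dlm_lkb *lkb, int master_nodeid)
        {
                printf("sending lock %d to node %d\n", lkb->lkb_id, master_nodeid);
                return 0;
        }

        /* B's side: if the presumed master (A) answers with an error because it
         * does not yet know it is the master, resend the lock instead of failing. */
        static void handle_lock_reply(struct dlm_lkb *lkb, int error, int master_nodeid)
        {
                if (error == -EAGAIN) {
                        fprintf(stderr, "lkb %d: master not ready, resending\n", lkb->lkb_id);
                        send_lock_to_master(lkb, master_nodeid);
                        return;
                }
                /* no error: the master accepted the lock during recovery */
        }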
    • [GFS2] don't try to lockfs after shutdown · c3780511
      Committed by David Teigland
      If an fs has already been shut down, a lockfs callback should do nothing.
      An fs that's been shut down can't acquire locks or do anything with
      respect to the cluster.
      
      Also, remove FIXME comment in withdraw function.  The missing bits of the
      withdraw procedure are now all done by user space.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
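      A minimal sketch of the guard described above; the flag, structure, and function names here are hypothetical rather than the actual gfs2 code:

        #include <stdio.h>

        struct gfs2_sbd {
                int sd_shut_down;   /* set once the fs has been shut down / withdrawn */
        };

        /* lockfs callback: a shut-down fs can't take cluster locks, so do nothing. */
        static int gfs2_lockfs(struct gfs2_sbd *sdp)
        {
                if (sdp->sd_shut_down) {
                        fprintf(stderr, "fs already shut down, skipping lockfs\n");
                        return 0;
                }
                /* ... otherwise freeze the fs as usual ... */
                return 0;
        }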
  2. 04 Feb 2007, 2 commits
    • [PATCH] revert blockdev direct io back to 2.6.19 version · b2e895db
      Committed by Andrew Morton
      Andrew Vasquez is reporting as-iosched oopses and a 65% throughput
      slowdown due to the recent special-casing of direct-io against
      blockdevs.  We don't know why either of these things is occurring.
      
      The patch minimally reverts us to the 2.6.19 code for the 2.6.20
      release.
      
      Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
      Cc: Ken Chen <kenchen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] aio: fix buggy put_ioctx call in aio_complete - v2 · dee11c23
      Committed by Ken Chen
      An AIO bug was reported in which a sleeping function is called in softirq
      context:
      
      BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
      Call Trace:
           [<a000000100577b00>] __mutex_lock_slowpath+0x640/0x6c0
           [<a000000100577ba0>] mutex_lock+0x20/0x40
           [<a0000001000a25b0>] flush_workqueue+0xb0/0x1a0
           [<a00000010018c0c0>] __put_ioctx+0xc0/0x240
           [<a00000010018d470>] aio_complete+0x2f0/0x420
           [<a00000010019cc80>] finished_one_bio+0x200/0x2a0
           [<a00000010019d1c0>] dio_bio_complete+0x1c0/0x200
           [<a00000010019d260>] dio_bio_end_aio+0x60/0x80
           [<a00000010014acd0>] bio_endio+0x110/0x1c0
           [<a0000001002770e0>] __end_that_request_first+0x180/0xba0
           [<a000000100277b90>] end_that_request_chunk+0x30/0x60
           [<a0000002073c0c70>] scsi_end_request+0x50/0x300 [scsi_mod]
           [<a0000002073c1240>] scsi_io_completion+0x200/0x8a0 [scsi_mod]
           [<a0000002074729b0>] sd_rw_intr+0x330/0x860 [sd_mod]
           [<a0000002073b3ac0>] scsi_finish_command+0x100/0x1c0 [scsi_mod]
           [<a0000002073c2910>] scsi_softirq_done+0x230/0x300 [scsi_mod]
           [<a000000100277d20>] blk_done_softirq+0x160/0x1c0
           [<a000000100083e00>] __do_softirq+0x200/0x240
           [<a000000100083eb0>] do_softirq+0x70/0xc0
      
      See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2
      
      flush_workqueue() is not allowed to be called in softirq context.
      However, aio_complete(), called from an I/O interrupt, can potentially
      call put_ioctx with the last ref count on the ioctx, which triggers the
      bug.  It is simply incorrect to perform ioctx freeing from aio_complete().
      
      The bug is triggerable by a race between io_destroy() and aio_complete().
      A possible scenario:
      
      cpu0                               cpu1
      io_destroy                         aio_complete
        wait_for_all_aios {                __aio_put_req
           ...                                 ctx->reqs_active--;
           if (!ctx->reqs_active)
              return;
        }
        ...
        put_ioctx(ioctx)
      
                                           put_ioctx(ctx);
                                              __put_ioctx
                                                bam! Bug trigger!
      
      The real problem is that the check of ctx->reqs_active in
      wait_for_all_aios() is incorrect: access to reqs_active is not properly
      protected by a spin lock.
      
      This patch adds that protective spin lock and at the same time removes
      the duplicate ref counting for each kiocb, since reqs_active already
      serves as a ref count for each active ioctx.  This also ensures that the
      buggy call to flush_workqueue() in softirq context is eliminated.
      Signed-off-by: N"Ken Chen" <kenchen@google.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Suparna Bhattacharya <suparna@in.ibm.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: <stable@kernel.org>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
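      A condensed userspace analogue of the locking fix described above, with a pthread mutex and condition variable standing in for the kernel's ctx_lock and wait queue; it illustrates the idea (check reqs_active only under the lock) rather than the verbatim patch:

        #include <pthread.h>

        /* Simplified stand-in for the kernel's kioctx. */
        struct kioctx {
                pthread_mutex_t ctx_lock;   /* plays the role of ctx->ctx_lock */
                pthread_cond_t  wait;       /* signalled as requests complete */
                int reqs_active;            /* outstanding requests on this ioctx */
        };

        /* io_destroy path: wait until every outstanding request is done.  The
         * reqs_active check happens under ctx_lock, so it can no longer race
         * with the decrement on the completion path (the bug described above). */
        static void wait_for_all_aios(struct kioctx *ctx)
        {
                pthread_mutex_lock(&ctx->ctx_lock);
                while (ctx->reqs_active > 0)
                        pthread_cond_wait(&ctx->wait, &ctx->ctx_lock);
                pthread_mutex_unlock(&ctx->ctx_lock);
        }

        /* Completion path: decrement under the same lock and wake the waiter. */
        static void complete_one_request(struct kioctx *ctx)
        {
                pthread_mutex_lock(&ctx->ctx_lock);
                if (--ctx->reqs_active == 0)
                        pthread_cond_broadcast(&ctx->wait);
                pthread_mutex_unlock(&ctx->ctx_lock);
        }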
  3. 02 Feb 2007, 2 commits
  4. 31 Jan 2007, 9 commits
  5. 30 Jan 2007, 1 commit
  6. 27 Jan 2007, 15 commits
  7. 25 Jan 2007, 1 commit
  8. 23 Jan 2007, 3 commits
  9. 22 Jan 2007, 3 commits