1. 16 Jul, 2007 3 commits
  2. 10 Jul, 2007 5 commits
  3. 16 Jun, 2007 1 commit
    • block: always requeue !fs requests at the front · bc90ba09
      Committed by Tejun Heo
      SCSI marks internal commands with REQ_PREEMPT and pushes them at the front
      of the request queue using blk_execute_rq().  When entering suspended
      or frozen state, SCSI devices are quiesced using
      scsi_device_quiesce().  In quiesced state, only REQ_PREEMPT requests
      are processed.  This is how SCSI blocks other requests out while
      suspending and resuming.  As all internal commands are pushed at the
      front of the queue, this usually works.
      
      Unfortunately, this interacts badly with ordered requeueing.  To
      preserve request order on requeueing (due to busy device, active EH or
      other failures), requests are sorted according to ordered sequence on
      requeue if IO barrier is in progress.
      
      The following sequence deadlocks.
      
      1. IO barrier sequence issues.
      
      2. Suspend requested.  Queue is quiesced with part or all of IO
         barrier sequence at the front.
      
      3. During suspending or resuming, SCSI issues internal command which
         gets deferred and requeued for some reason.  As the command is
         issued after the IO barrier in #1, ordered requeueing code puts the
         request after IO barrier sequence.
      
      4. The device is ready to process requests again but still is in
         quiesced state and the first request of the queue isn't
         REQ_PREEMPT, so command processing is deadlocked -
         suspending/resuming waits for the issued request to complete while
         the request can't be processed till device is put back into
         running state by resuming.
      
      This can be fixed by always putting !fs requests at the front when
      requeueing.
      
      The following thread reports this deadlock.
      
        http://thread.gmane.org/gmane.linux.kernel/537473
      
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: David Greaves <david@dgreaves.com>
      Acked-by: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
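
The requeue rule described in this commit can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the kernel code: `toy_request`, `toy_queue`, `toy_requeue` and `toy_head_runnable_when_quiesced` are hypothetical names, the queue is a plain array, and the `non_fs_first` flag exists only to compare the old behaviour (everything sorted by ordered sequence) with the fixed behaviour (!fs requests always go to the front).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 16

/* Toy request; a stand-in for illustration, not the kernel's struct request. */
struct toy_request {
    int id;
    bool is_fs;       /* true: filesystem request, false: internal (!fs) command */
    int ordered_seq;  /* position in the ordered (barrier) sequence */
};

struct toy_queue {
    struct toy_request slots[QUEUE_MAX];
    size_t len;
};

/* Insert rq at index pos, shifting later entries back by one. */
static void toy_insert_at(struct toy_queue *q, size_t pos, struct toy_request rq)
{
    assert(q->len < QUEUE_MAX && pos <= q->len);
    for (size_t i = q->len; i > pos; i--)
        q->slots[i] = q->slots[i - 1];
    q->slots[pos] = rq;
    q->len++;
}

/*
 * Requeue rq. With non_fs_first == false every request is placed by its
 * ordered sequence (the behaviour that deadlocks). With non_fs_first == true
 * a !fs request skips the sort and goes straight to the front.
 */
static void toy_requeue(struct toy_queue *q, struct toy_request rq, bool non_fs_first)
{
    size_t pos = 0;

    if (rq.is_fs || !non_fs_first) {
        while (pos < q->len && q->slots[pos].ordered_seq <= rq.ordered_seq)
            pos++;
    }
    toy_insert_at(q, pos, rq);
}

/* In the quiesced state only an internal (!fs) request at the head can run. */
static bool toy_head_runnable_when_quiesced(const struct toy_queue *q)
{
    return q->len > 0 && !q->slots[0].is_fs;
}
```

In this model, requeueing two barrier requests followed by an internal command issued later leaves the internal command at the tail when `non_fs_first` is false, so a quiesced queue has nothing it is allowed to run. With `non_fs_first` set to true the internal command ends up at the head and can be processed.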
  4. 24 May, 2007 2 commits
  5. 16 May, 2007 1 commit
  6. 11 May, 2007 1 commit
    • When stacked block devices are in-use (e.g. md or dm), the recursive calls · d89d8796
      Committed by Neil Brown
      When stacked block devices are in use (e.g. md or dm), the recursive calls
      to generic_make_request can use up a lot of stack space, and we would
      rather they didn't.
      
      As generic_make_request is a void function, and as it is generally not
      expected that it will have any effect immediately, it is safe to delay any
      call to generic_make_request until there is sufficient stack space
      available.
      
      As ->bi_next is reserved for the driver to use, it can have no valid value
      when generic_make_request is called, and as __make_request implicitly
      assumes it will be NULL (ELEVATOR_BACK_MERGE fork of switch) we can be
      certain that all callers set it to NULL.  We can therefore safely use
      bi_next to link pending requests together, providing we clear it before
      making the real call.
      
      So, we choose to allow each thread to only be active in one
      generic_make_request at a time.  If a subsequent (recursive) call is made,
      the bio is linked into a per-thread list, and is handled when the active
      call completes.
      
      As the list of pending bios is per-thread, there are no locking issues to
      worry about.
      
      I say above that it is "safe to delay any call...".  There are, however,
      some behaviours of a make_request_fn which would make it unsafe.  These
      include any behaviour that assumes anything will have changed after a
      recursive call to generic_make_request.
      
      These could include:
       - waiting for that call to finish and call its bi_end_io function.
         md used to sometimes do this (marking the superblock dirty before
         completing a write) but doesn't any more.
       - inspecting the bio for fields that generic_make_request might
         change, such as bi_sector or bi_bdev.  It is hard to see a good
         reason for this, and I don't think anyone actually does it.
       - inspecting the queue to see if, e.g. it is 'full' yet.  Again, I
         think this is very unlikely to be useful, or to be done.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <dm-devel@redhat.com>
      
      Alasdair G Kergon <agk@redhat.com> said:
      
       I can see nothing wrong with this in principle.
      
       For device-mapper at the moment though it's essential that, while the bio
       mappings may now get delayed, they still get processed in exactly
       the same order as they were passed to generic_make_request().
      
       My main concern is whether the timing changes implicit in this patch
       will make the rare data-corrupting races in the existing snapshot code
       more likely. (I'm working on a fix for these races, but the unfinished
       patch is already several hundred lines long.)
      
       It would be helpful if some people on this mailing list would test
       this patch in various scenarios and report back.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
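
The per-thread pending list described in this commit can be sketched in user-space C. This is a toy model under stated assumptions, not the kernel implementation: `toy_bio`, `toy_submit`, `toy_make_request` and the thread-local variables are hypothetical names, and the counters exist only to make the effect observable. The model keeps the two properties the message relies on: `bi_next` is NULL on entry and is reused to link pending bios, and a recursive submission is only queued while another call is active on the same thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bio: "level" counts how many stacked devices still lie below it. */
struct toy_bio {
    int level;
    struct toy_bio *bi_next;  /* NULL on entry; reused to link pending bios */
};

/* Per-thread state (hypothetical names). pending_tail is NULL when idle. */
static _Thread_local struct toy_bio *pending_head;
static _Thread_local struct toy_bio **pending_tail;
static _Thread_local int depth, max_depth, handled;

static void toy_submit(struct toy_bio *bio);

/* Stand-in for a stacked driver's make_request_fn: remap, then resubmit. */
static void toy_make_request(struct toy_bio *bio)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    handled++;
    if (bio->level > 0) {
        bio->level--;
        toy_submit(bio);  /* recursive submission: only queued, not executed */
    }
    depth--;
}

/* Stand-in for generic_make_request with the per-thread pending list. */
static void toy_submit(struct toy_bio *bio)
{
    if (pending_tail) {
        /* A call is already active on this thread: append and return. */
        bio->bi_next = NULL;
        *pending_tail = bio;
        pending_tail = &bio->bi_next;
        return;
    }

    assert(bio->bi_next == NULL);
    pending_head = NULL;
    pending_tail = &pending_head;
    do {
        toy_make_request(bio);
        /* Pop the next pending bio, in the order it was submitted. */
        bio = pending_head;
        if (bio) {
            pending_head = bio->bi_next;
            if (!pending_head)
                pending_tail = &pending_head;
            bio->bi_next = NULL;  /* clear the link before the real call */
        }
    } while (bio);
    pending_tail = NULL;  /* this thread is idle again */
}
```

Submitting a bio with `level` set to 3 in this model runs the make-request step four times, yet the nesting depth never exceeds one, because each resubmission is deferred until the active call returns. Pending bios are handled first in, first out, which matches the ordering requirement raised for device-mapper in the quoted reply.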
  7. 10 May, 2007 4 commits
  8. 09 May, 2007 3 commits
  9. 08 May, 2007 2 commits
  10. 03 May, 2007 1 commit
  11. 30 Apr, 2007 17 commits