1. 28 May 2014, 3 commits
    • blk-mq: remove stale comment for blk_mq_complete_request() · 7738dac4
      Committed by Jens Axboe
      It works for both IPI and local completions as of commit
      95f09684.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: allow non-softirq completions · 95f09684
      Committed by Jens Axboe
      Right now we export two ways of completing a request:
      
      1) blk_mq_complete_request(). This uses an IPI (if needed) and
         completes through q->softirq_done_fn(). It also works with
         timeouts.
      
      2) blk_mq_end_io(). This completes inline, and ignores any timeout
         state of the request.
      
      Let blk_mq_complete_request() handle non-softirq_done_fn completions
      as well, by just completing inline. If a driver has enough completion
      ports to place completions correctly, it need not define a
      mq_ops->complete() and we can avoid an indirect function call by
      doing the completion inline.
      Signed-off-by: Jens Axboe <axboe@fb.com>
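The inline-vs-indirect completion choice described above can be sketched as follows. This is an illustrative toy, not the kernel API; the `toy_` names are invented for the sketch:

```c
#include <stddef.h>

/* Toy request and ops; the real structures carry far more state. */
struct toy_request {
    int done;
};

struct toy_mq_ops {
    void (*complete)(struct toy_request *rq); /* may be NULL */
};

/* Inline completion: no indirect call. */
static void toy_end_io(struct toy_request *rq)
{
    rq->done = 1;
}

/* If the driver supplied a ->complete handler, route through it
 * (in the kernel, possibly via softirq/IPI); otherwise complete
 * inline and skip the indirect function call entirely. */
void toy_complete_request(struct toy_request *rq, const struct toy_mq_ops *ops)
{
    if (ops && ops->complete)
        ops->complete(rq);
    else
        toy_end_io(rq);
}
```

A driver with enough completion ports to place completions on the right CPU simply leaves `complete` unset and takes the inline path.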
    • blk-mq: pass in suggested NUMA node to ->alloc_hctx() · f14bbe77
      Committed by Jens Axboe
      Drivers currently have to figure this out on their own, and they
      are missing the information to do it properly. The ones that did
      attempt it got it wrong.
      
      So just pass in the suggested node directly to the alloc
      function.
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 27 May 2014, 5 commits
  3. 24 May 2014, 2 commits
  4. 23 May 2014, 1 commit
    • blk-mq: split make request handler for multi and single queue · 07068d5b
      Committed by Jens Axboe
      We want slightly different behavior from each:
      
      - On single queue devices, we currently use the per-process plug
        for deferred IO and for merging.
      
      - On multi queue devices, we don't use the per-process plug, but
        we want to go straight to hardware for SYNC IO.
      
      Split blk_mq_make_request() into a blk_sq_make_request() for single
      queue devices, and retain blk_mq_make_request() for multi queue
      devices. Then we don't need multiple checks for q->nr_hw_queues
      in the request mapping.
      Signed-off-by: Jens Axboe <axboe@fb.com>
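The split can be sketched as picking the handler once at init time, which removes the per-IO `q->nr_hw_queues` check from the hot path. A minimal toy sketch with invented `toy_` names, not the kernel code:

```c
typedef void (*make_request_fn)(int sync_io);

struct toy_queue {
    unsigned int nr_hw_queues;
    make_request_fn make_request;
};

static void toy_sq_make_request(int sync_io)
{
    (void)sync_io;
    /* single queue: defer IO through the per-process plug for merging */
}

static void toy_mq_make_request(int sync_io)
{
    (void)sync_io;
    /* multi queue: no per-process plug; SYNC IO goes straight to hardware */
}

void toy_init_queue(struct toy_queue *q, unsigned int nr_hw_queues)
{
    q->nr_hw_queues = nr_hw_queues;
    /* decide once at queue init instead of checking on every request */
    q->make_request = (nr_hw_queues > 1) ? toy_mq_make_request
                                         : toy_sq_make_request;
}
```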
  5. 22 May 2014, 2 commits
  6. 21 May 2014, 3 commits
  7. 20 May 2014, 6 commits
  8. 19 May 2014, 1 commit
  9. 14 May 2014, 1 commit
    • blk-mq: improve support for shared tags maps · 0d2602ca
      Committed by Jens Axboe
      This adds support for active queue tracking, meaning that the
      blk-mq tagging maintains a count of active users of a tag set.
      This allows us to maintain a notion of fairness between users,
      so that we can distribute the tag depth evenly instead of
      starving some users while others run unfairly deep queues.
      
      If sharing of a tag set is detected, each hardware queue will
      track the depth of its own queue. And if this exceeds the total
      depth divided by the number of active queues, the user is actively
      throttled down.
      
      The active queue count is updated lazily to avoid bouncing that data
      between submitter and completer. Each hardware queue gets marked
      active when it allocates its first tag, and gets marked inactive
      when 1) the last tag is cleared, and 2) the queue timeout grace
      period has passed.
      Signed-off-by: Jens Axboe <axboe@fb.com>
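The throttling rule above reduces to a simple fair-share check: a hardware queue sharing a tag set may hold at most total depth divided by the number of active queues. A sketch with illustrative names, not the kernel function:

```c
#include <stdbool.h>

/* May this hardware queue allocate another tag? Throttle once its own
 * depth reaches total_depth / active_queues (only when sharing is
 * actually detected, i.e. more than one active queue). */
bool toy_may_queue(unsigned int my_depth,
                   unsigned int total_depth,
                   unsigned int active_queues)
{
    unsigned int fair_share;

    if (active_queues <= 1)
        return true; /* no sharing detected, no throttling */

    fair_share = total_depth / active_queues;
    return my_depth < fair_share;
}
```

With 128 tags shared by 4 active queues, each queue is throttled once it holds 32 tags.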
  10. 11 May 2014, 5 commits
  11. 10 May 2014, 3 commits
    • block: only calculate part_in_flight() once · 7276d02e
      Committed by Jens Axboe
      We first check if we have inflight IO, then retrieve that
      same number again. Usually this isn't that costly since the
      chance of having the data dirtied in between is small, but
      there's no reason for calling part_in_flight() twice.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: fix race in IO start accounting · cf4b50af
      Committed by Jens Axboe
      Commit c6d600c6 opened up a small race where we could attempt to
      account IO completion on a request, racing with IO start accounting.
      Fix this up by ensuring that we've accounted for IO start before
      inserting the request.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: use sparser tag layout for lower queue depth · 59d13bf5
      Committed by Jens Axboe
      For best performance, spreading tags over multiple cachelines
      makes the tagging more efficient on multicore systems. But since
      we have 8 * sizeof(unsigned long) tags per cacheline, we don't
      always get a nice spread.
      
      Attempt to spread the tags over at least 4 cachelines, using fewer
      bits per unsigned long if we have to. This improves tagging
      performance in setups with 32-128 tags. For higher depths, the
      spread is the same as before (BITS_PER_LONG tags per cacheline).
      Signed-off-by: Jens Axboe <axboe@fb.com>
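The layout choice can be sketched as shrinking the per-word bit count until the map spans at least four groups. This toy function is an illustration of the idea under assumed parameters (64-bit words, minimum of 8 bits per group), not the kernel's actual sizing code:

```c
/* Pick how many tag bits to place in each unsigned-long-sized group so
 * that a map of `depth` tags spans at least 4 groups; at higher depths
 * the full word is used, matching the old BITS_PER_LONG layout. */
unsigned int toy_bits_per_word(unsigned int depth)
{
    unsigned int bits = 64; /* assume BITS_PER_LONG == 64 */

    /* halve the per-group bit count until we get >= 4 groups,
     * but never go below a minimum group size */
    while (bits > 8 && depth / bits < 4)
        bits /= 2;

    return bits;
}
```

For depth 32 this yields 8 bits per group (4 sparse groups); for depth 512 it stays at the full 64 bits per word.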
  12. 09 May 2014, 3 commits
    • blk-mq: implement new and more efficient tagging scheme · 4bb659b1
      Committed by Jens Axboe
      blk-mq currently uses percpu_ida for tag allocation. But that only
      works well if the ratio between tag space and number of CPUs is
      sufficiently high. For most devices and systems, that is not the
       case. The end result is that we either only utilize the tag space
      partially, or we end up attempting to fully exhaust it and run
      into lots of lock contention with stealing between CPUs. This is
      not optimal.
      
      This new tagging scheme is a hybrid bitmap allocator. It uses
      two tricks to both be SMP friendly and allow full exhaustion
      of the space:
      
      1) We cache the last allocated (or freed) tag on a per blk-mq
         software context basis. This allows us to limit the space
         we have to search. The key element here is not caching it
         in the shared tag structure, otherwise we end up dirtying
         more shared cache lines on each allocate/free operation.
      
      2) The tag space is split into cache line sized groups, and
         each context will start off randomly in that space. Even up
         to full utilization of the space, this divides the tag users
         efficiently into cache line groups, avoiding dirtying the same
          one, both between allocators and between allocator and freer.
      
      This scheme shows drastically better behaviour on both small and
      large tag spaces. It has been tested extensively to show better
      performance for all the cases blk-mq cares about.
      Signed-off-by: Jens Axboe <axboe@fb.com>
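The two tricks above can be illustrated with a toy allocator: each software context keeps a private search hint (initialized to its own start offset, then updated to the last allocated or freed tag), so contexts search from different positions instead of all hammering the front of the shared map. Illustrative `toy_` names only; the real allocator is a lockless bitmap, not a bool array:

```c
#include <stdbool.h>

#define TOY_TAGS 64

struct toy_tagmap {
    bool used[TOY_TAGS];   /* shared tag space */
};

struct toy_ctx {
    unsigned int last_tag; /* per-context hint: starts at a per-context
                            * offset (trick 2), then caches the last
                            * allocated/freed tag (trick 1) */
};

/* Search from the cached hint, wrapping over the full space so the
 * tag space can still be completely exhausted. */
int toy_get_tag(struct toy_tagmap *map, struct toy_ctx *ctx)
{
    unsigned int i;

    for (i = 0; i < TOY_TAGS; i++) {
        unsigned int tag = (ctx->last_tag + i) % TOY_TAGS;
        if (!map->used[tag]) {
            map->used[tag] = true;
            ctx->last_tag = tag;
            return (int)tag;
        }
    }
    return -1; /* exhausted */
}

void toy_put_tag(struct toy_tagmap *map, struct toy_ctx *ctx, unsigned int tag)
{
    map->used[tag] = false;
    ctx->last_tag = tag; /* cache the freed tag for quick reuse */
}
```

Keeping the hint in the per-context structure, not the shared map, is the point: allocations dirty the context's own cache line rather than a shared one.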
    • blk-mq: initialize struct request fields individually · af76e555
      Committed by Christoph Hellwig
      This allows us to avoid a non-atomic memset over ->atomic_flags as well
      as killing lots of duplicate initializations.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: update a hotplug comment for grammar · 9fccfed8
      Committed by Jens Axboe
      Signed-off-by: Jens Axboe <axboe@fb.com>
  13. 08 May 2014, 1 commit
  14. 03 May 2014, 3 commits
  15. 01 May 2014, 1 commit