1. 27 May 2014 · 3 commits
  2. 24 May 2014 · 1 commit
  3. 23 May 2014 · 1 commit
    • blk-mq: split make request handler for multi and single queue · 07068d5b
      Committed by Jens Axboe
      We want slightly different behavior from them:
      
      - On single queue devices, we currently use the per-process plug
        for deferred IO and for merging.
      
      - On multi queue devices, we don't use the per-process plug, but
        we want to go straight to hardware for SYNC IO.
      
      Split blk_mq_make_request() into a blk_sq_make_request() for single
      queue devices, and retain blk_mq_make_request() for multi queue
      devices. Then we don't need multiple checks for q->nr_hw_queues
      in the request mapping.
      Signed-off-by: Jens Axboe <axboe@fb.com>
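A minimal userspace sketch of the split, assuming simplified stand-in types: the handler is chosen once from the hardware queue count, so the per-bio path no longer tests nr_hw_queues. The names sq_make_request(), mq_make_request(), queue_init(), submit_bio() and the counters are hypothetical (submit_bio() here is a local helper, not the kernel function of that name); the real blk_sq_make_request() and blk_mq_make_request() also handle merging, plugging and tag allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins; not the kernel structures. */
struct bio {
    bool sync;
};

struct queue;
typedef void (*make_request_fn)(struct queue *q, struct bio *bio);

struct queue {
    unsigned int nr_hw_queues;
    make_request_fn make_request; /* chosen once, at init time */
    unsigned int plugged;         /* bios deferred on the per-process plug */
    unsigned int queued;          /* bios placed on a software queue */
    unsigned int issued_directly; /* bios sent straight to "hardware" */
};

/* Single queue: defer all IO on the per-process plug so it can be merged. */
static void sq_make_request(struct queue *q, struct bio *bio)
{
    (void)bio;
    q->plugged++;
}

/* Multi queue: no per-process plug; SYNC IO goes straight to hardware. */
static void mq_make_request(struct queue *q, struct bio *bio)
{
    if (bio->sync)
        q->issued_directly++;
    else
        q->queued++;
}

static void queue_init(struct queue *q, unsigned int nr_hw_queues)
{
    assert(nr_hw_queues > 0);
    q->nr_hw_queues = nr_hw_queues;
    q->plugged = q->queued = q->issued_directly = 0;
    /* The nr_hw_queues check happens here once, not on every bio. */
    q->make_request = nr_hw_queues == 1 ? sq_make_request : mq_make_request;
}

static void submit_bio(struct queue *q, struct bio *bio)
{
    q->make_request(q, bio);
}
```

Selecting a function pointer at setup time keeps the hot path branch-free with respect to the queue topology, which is the point the commit message makes about avoiding repeated q->nr_hw_queues checks.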
  4. 22 May 2014 · 2 commits
  5. 21 May 2014 · 3 commits
  6. 20 May 2014 · 1 commit
    • blk-mq: switch ctx pending map to the sparser blk_align_bitmap · 1429d7c9
      Committed by Jens Axboe
      Each hardware queue has a bitmap of software queues with pending
      requests. When new IO is queued on a software queue, the bit is
      set, and when a hardware queue run picks that IO up, the bit is
      cleared. This causes a lot of cache line traffic. Switch this from
      the regular BITS_PER_LONG bitmap to a sparser layout, similar to
      what was done for blk-mq tagging.
      
      A 20% performance increase was observed for single-threaded IO, and
      about a 15% increase with multiple threads driving the same device.
      Signed-off-by: Jens Axboe <axboe@fb.com>
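A single-threaded sketch of a sparse, cache-line-aligned bitmap, assuming 64-byte cache lines and an illustrative 8 bits per word: each word gets its own cache line, so software queues whose bits live in different words never share a line. The kernel's blk_align_bitmap uses atomic bit operations, which this sketch omits; all names here are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define CACHELINE 64     /* assumption: 64-byte cache lines */
#define BITS_PER_WORD 8  /* assumption: only 8 bits used per cache line */

/* Each word sits in its own cache line. */
struct align_word {
    alignas(CACHELINE) unsigned long word;
};

struct sparse_bitmap {
    struct align_word *map;
    unsigned int nr_words;
    unsigned int nr_bits;
};

static bool sbm_init(struct sparse_bitmap *b, unsigned int nr_bits)
{
    assert(nr_bits > 0);
    b->nr_bits = nr_bits;
    b->nr_words = (nr_bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
    b->map = aligned_alloc(CACHELINE, b->nr_words * sizeof(*b->map));
    if (!b->map)
        return false;
    memset(b->map, 0, b->nr_words * sizeof(*b->map));
    return true;
}

static void sbm_free(struct sparse_bitmap *b)
{
    free(b->map);
    b->map = NULL;
}

/* Software queue `bit` now has pending requests. */
static void sbm_set(struct sparse_bitmap *b, unsigned int bit)
{
    assert(bit < b->nr_bits);
    b->map[bit / BITS_PER_WORD].word |= 1UL << (bit % BITS_PER_WORD);
}

static void sbm_clear(struct sparse_bitmap *b, unsigned int bit)
{
    assert(bit < b->nr_bits);
    b->map[bit / BITS_PER_WORD].word &= ~(1UL << (bit % BITS_PER_WORD));
}

static bool sbm_test(const struct sparse_bitmap *b, unsigned int bit)
{
    assert(bit < b->nr_bits);
    return b->map[bit / BITS_PER_WORD].word & (1UL << (bit % BITS_PER_WORD));
}

/* A hardware queue run: count and clear every pending bit, skipping
 * whole cache lines that have nothing pending. */
static unsigned int sbm_drain(struct sparse_bitmap *b)
{
    unsigned int n = 0;

    for (unsigned int w = 0; w < b->nr_words; w++) {
        unsigned long word = b->map[w].word;

        if (!word)
            continue;
        while (word) {
            word &= word - 1; /* clear the lowest set bit */
            n++;
        }
        b->map[w].word = 0;
    }
    return n;
}
```

The trade-off is memory: a 64-byte line now carries only a few useful bits, in exchange for fewer CPUs writing to the same line.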
  7. 14 May 2014 · 1 commit
    • blk-mq: improve support for shared tags maps · 0d2602ca
      Committed by Jens Axboe
      This adds support for active queue tracking, meaning that the
      blk-mq tagging maintains a count of active users of a tag set.
      This allows us to maintain a notion of fairness between users,
      so that we can distribute the tag depth evenly without starving
      some users while others build unfairly deep queues.
      
      If sharing of a tag set is detected, each hardware queue will
      track the depth of its own queue. If that depth exceeds the total
      depth divided by the number of active queues, the queue is actively
      throttled down.
      
      The active queue count is maintained lazily to avoid bouncing that
      data between submitter and completer. Each hardware queue is marked
      active when it allocates its first tag, and is marked inactive
      when 1) its last tag is cleared, and 2) the queue timeout grace
      period has passed.
      Signed-off-by: Jens Axboe <axboe@fb.com>
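A single-threaded sketch of the fairness rule described above, under a hypothetical model in which the tag set is always treated as shared (the sharing-detection step is skipped): a queue may take another tag only while it holds fewer than depth / active_queues. The type and function names are illustrative, the floor of one tag is an assumption of this sketch, and the timeout grace period before a queue goes inactive is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one tag set shared by several hardware queues. */
struct shared_tags {
    unsigned int depth;         /* total tags in the set */
    unsigned int used;          /* tags currently allocated */
    unsigned int active_queues; /* queues currently marked active */
};

struct hw_queue {
    struct shared_tags *tags;
    bool active;
    unsigned int nr_active;     /* tags this queue currently holds */
};

/* Fair share: total depth divided by the number of active queues. */
static bool may_queue(const struct hw_queue *h)
{
    unsigned int users = h->tags->active_queues;
    unsigned int share;

    if (users == 0)
        return true;
    share = h->tags->depth / users;
    if (share == 0)
        share = 1; /* assumption: never throttle a queue to zero tags */
    return h->nr_active < share;
}

static bool get_tag(struct hw_queue *h)
{
    /* Lazily mark the queue active on its first allocation. */
    if (!h->active) {
        h->active = true;
        h->tags->active_queues++;
    }
    if (!may_queue(h))
        return false; /* over its fair share: throttled */
    if (h->tags->used == h->tags->depth)
        return false; /* set exhausted */
    h->tags->used++;
    h->nr_active++;
    return true;
}

static void put_tag(struct hw_queue *h)
{
    assert(h->nr_active > 0);
    h->nr_active--;
    h->tags->used--;
}

/* Mark inactive once the last tag is gone. The commit also waits for
 * the queue timeout grace period; that timer is not modelled here. */
static void mark_idle(struct hw_queue *h)
{
    if (h->active && h->nr_active == 0) {
        h->active = false;
        h->tags->active_queues--;
    }
}
```

With a single active queue the share equals the full depth, so throttling only kicks in once a second user shows up.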
  8. 10 May 2014 · 1 commit
  9. 09 May 2014 · 3 commits
    • blk-mq: implement new and more efficient tagging scheme · 4bb659b1
      Committed by Jens Axboe
      blk-mq currently uses percpu_ida for tag allocation. But that only
      works well if the ratio between tag space and number of CPUs is
      sufficiently high. For most devices and systems, that is not the
      case. The end result is that we either utilize the tag space only
      partially, or we end up attempting to fully exhaust it and run
      into lots of lock contention from stealing between CPUs. This is
      not optimal.
      
      This new tagging scheme is a hybrid bitmap allocator. It uses
      two tricks to both be SMP friendly and allow full exhaustion
      of the space:
      
      1) We cache the last allocated (or freed) tag on a per blk-mq
         software context basis. This allows us to limit the space
         we have to search. The key element here is not caching it
         in the shared tag structure, otherwise we end up dirtying
         more shared cache lines on each allocate/free operation.
      
      2) The tag space is split into cache line sized groups, and
         each context starts off at a random position in that space.
         Even at full utilization of the space, this divides the tag
         users efficiently into cache line groups, so that neither two
         allocators nor an allocator and a freer dirty the same line.
      
      This scheme shows drastically better behaviour on both small and
      large tag spaces. It has been tested extensively and shows better
      performance for all the cases blk-mq cares about.
      Signed-off-by: Jens Axboe <axboe@fb.com>
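A single-threaded sketch of the two tricks listed above, assuming 8 tags per group as a stand-in for one cache line: the search hint (last allocated or freed tag) is kept per software context rather than in the shared map, each context starts at a random position, and the search wraps across all groups so the space can still be fully exhausted. The real allocator uses atomic bit operations and real cache-line sizing; every name here is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define TAGS_PER_GROUP 8 /* assumption: tags per "cache line" group */
#define MAX_GROUPS 16

/* Shared tag space: one word per group, standing in for one cache line. */
struct tag_map {
    unsigned long group[MAX_GROUPS];
    unsigned int nr_groups;
    unsigned int depth;
};

/* Per software context state. The hint lives here, not in the shared
 * map, so updating it never dirties a shared cache line. */
struct sw_ctx {
    unsigned int last_tag;
};

static void map_init(struct tag_map *m, unsigned int depth)
{
    assert(depth > 0 && depth <= MAX_GROUPS * TAGS_PER_GROUP);
    m->depth = depth;
    m->nr_groups = (depth + TAGS_PER_GROUP - 1) / TAGS_PER_GROUP;
    for (unsigned int g = 0; g < MAX_GROUPS; g++)
        m->group[g] = 0;
}

/* Each context starts at a random position in the tag space. */
static void ctx_init(struct sw_ctx *c, const struct tag_map *m)
{
    c->last_tag = (unsigned int)rand() % m->depth;
}

/* Number of valid tags in group g (the last group may be partial). */
static unsigned int group_bits(const struct tag_map *m, unsigned int g)
{
    unsigned int left = m->depth - g * TAGS_PER_GROUP;
    return left < TAGS_PER_GROUP ? left : TAGS_PER_GROUP;
}

static int find_free(unsigned long word, unsigned int nbits, unsigned int start)
{
    for (unsigned int i = 0; i < nbits; i++) {
        unsigned int bit = (start + i) % nbits;
        if (!(word & (1UL << bit)))
            return (int)bit;
    }
    return -1;
}

/* Search from the cached hint, then the following groups with
 * wrap-around. Returns -1 only when the whole space is in use. */
static int tag_get(struct tag_map *m, struct sw_ctx *c)
{
    unsigned int g0 = c->last_tag / TAGS_PER_GROUP;

    for (unsigned int i = 0; i < m->nr_groups; i++) {
        unsigned int g = (g0 + i) % m->nr_groups;
        unsigned int start = i == 0 ? c->last_tag % TAGS_PER_GROUP : 0;
        int bit = find_free(m->group[g], group_bits(m, g), start);

        if (bit >= 0) {
            m->group[g] |= 1UL << bit;
            c->last_tag = g * TAGS_PER_GROUP + (unsigned int)bit;
            return (int)c->last_tag;
        }
    }
    return -1;
}

static void tag_put(struct tag_map *m, struct sw_ctx *c, unsigned int tag)
{
    unsigned long mask = 1UL << (tag % TAGS_PER_GROUP);

    assert(tag < m->depth);
    assert(m->group[tag / TAGS_PER_GROUP] & mask);
    m->group[tag / TAGS_PER_GROUP] &= ~mask;
    c->last_tag = tag; /* cache the freed tag as the next search hint */
}
```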
    • blk-mq: initialize struct request fields individually · af76e555
      Committed by Christoph Hellwig
      This allows us to avoid a non-atomic memset over ->atomic_flags, as
      well as to eliminate lots of duplicate initializations.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: update a hotplug comment for grammar · 9fccfed8
      Committed by Jens Axboe
      Signed-off-by: Jens Axboe <axboe@fb.com>
  10. 08 May 2014 · 1 commit
  11. 03 May 2014 · 1 commit
  12. 01 May 2014 · 2 commits
  13. 30 April 2014 · 1 commit
    • blk-mq: fix waiting for reserved tags · 5810d903
      Committed by Jens Axboe
      blk_mq_wait_for_tags() is only able to wait for "normal" tags,
      not reserved tags. Pass in which kind of tag we should attempt to
      get, so that waiting for reserved tags will work.
      
      Reserved tags are used for internal commands, which are usually
      serialized. Hence no waiting generally takes place, but we should
      ensure that it actually works if users need that functionality.
      Signed-off-by: Jens Axboe <axboe@fb.com>
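A POSIX-threads sketch of the fix in miniature, assuming a hypothetical model with two independent tag pools behind one mutex and condition variable: the waiter passes which pool it needs, so a reserved-tag waiter checks the reserved pool instead of always looking at the normal one. The names wait_for_tag() and put_tag() are illustrative; this is not the kernel's blk_mq_wait_for_tags() implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: two independent tag pools behind one lock. */
struct tag_pool {
    unsigned int free;
};

struct tags {
    pthread_mutex_t lock;
    pthread_cond_t freed;
    struct tag_pool normal;
    struct tag_pool reserved; /* small pool for internal commands */
};

static struct tag_pool *pick_pool(struct tags *t, bool reserved)
{
    return reserved ? &t->reserved : &t->normal;
}

/* The caller says which pool it needs, so a reserved-tag waiter
 * checks the reserved pool rather than the normal one. */
static void wait_for_tag(struct tags *t, bool reserved)
{
    struct tag_pool *p;

    pthread_mutex_lock(&t->lock);
    p = pick_pool(t, reserved);
    while (p->free == 0)
        pthread_cond_wait(&t->freed, &t->lock);
    p->free--;
    pthread_mutex_unlock(&t->lock);
}

static void put_tag(struct tags *t, bool reserved)
{
    pthread_mutex_lock(&t->lock);
    pick_pool(t, reserved)->free++;
    pthread_cond_broadcast(&t->freed); /* waiters recheck their own pool */
    pthread_mutex_unlock(&t->lock);
}
```

Broadcasting on a single condition variable keeps the sketch short; since reserved tags are normally used for serialized internal commands, contention on this path should be rare anyway.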
  14. 25 April 2014 · 1 commit
    • blk-mq: respect rq_affinity · 38535201
      Committed by Christoph Hellwig
      The blk-mq code uses its own version of the I/O completion affinity
      tunables, which causes a few issues:
      
       - the rq_affinity sysfs file doesn't work for blk-mq devices, even
         though it is still present, thus breaking existing tuning setups.
       - the rq_affinity = 1 mode, which is the default for legacy
         request-based drivers, isn't implemented at all.
       - blk-mq drivers don't implement any completion affinity with the default
         flag settings.
      
      This patch removes the blk-mq ipi_redirect flag and sysfs file, as well
      as the internal BLK_MQ_F_SHOULD_IPI flag, and replaces them with code
      that respects the queue-wide rq_affinity flags and also implements the
      rq_affinity = 1 mode.
      
      This means I/O completion affinity can now only be tuned block-queue wide
      instead of per context, which seems more sensible to me anyway.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
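A sketch of the completion-CPU decision behind the rq_affinity modes, assuming the semantics documented for the queue sysfs file (0: no redirection, 1: complete within the submitting CPU's cache-sharing group, 2: force completion on the submitting CPU) and a made-up topology in which every four consecutive CPUs share a cache. Function names are hypothetical; CPU hotplug checks and the IPI itself are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define CPUS_PER_CACHE 4 /* assumption: CPUs 0-3 share a cache, 4-7 another */

/* Toy stand-in for a real cache-topology query. */
static bool share_cache(unsigned int a, unsigned int b)
{
    return a / CPUS_PER_CACHE == b / CPUS_PER_CACHE;
}

/* Returns the CPU on which the completion should run.
 * irq_cpu:    CPU that received the completion interrupt
 * submit_cpu: CPU that submitted the request */
static unsigned int completion_cpu(int rq_affinity, unsigned int irq_cpu,
                                   unsigned int submit_cpu)
{
    if (rq_affinity == 0)
        return irq_cpu;    /* no redirection at all */
    if (rq_affinity == 1 && share_cache(irq_cpu, submit_cpu))
        return irq_cpu;    /* already in the submitter's cache group */
    return submit_cpu;     /* redirect (IPI) to the submitting CPU */
}
```

Because the decision reads one queue-wide setting, the sketch also reflects the commit's point that affinity is now tuned per queue rather than per context.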
  15. 24 April 2014 · 3 commits
  16. 22 April 2014 · 4 commits
  17. 17 April 2014 · 8 commits
  18. 16 April 2014 · 3 commits
    • blk-mq: split out tag initialization, support shared tags · 24d2f903
      Committed by Christoph Hellwig
      Add a new blk_mq_tag_set structure that gets set up before we initialize
      the queue.  A single blk_mq_tag_set structure can be shared by multiple
      queues.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      
      Modular export of blk_mq_{alloc,free}_tagset added by me.
      Signed-off-by: Jens Axboe <axboe@fb.com>
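A simplified sketch of a tag set that is allocated before any queue and then shared by several queues, assuming a hypothetical model where the set only counts tags in use per hardware queue and tracks how many queues are attached. The names tag_set_alloc(), tag_set_free() and friends are illustrative and do not reproduce the signatures of the exported blk_mq_{alloc,free}_tagset functions mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified model; not the kernel's struct blk_mq_tag_set. */
struct tag_set {
    unsigned int nr_hw_queues;
    unsigned int queue_depth;
    unsigned int *in_use;  /* tags in use, per hardware queue index */
    unsigned int nr_users; /* request queues attached to this set */
};

struct request_queue {
    struct tag_set *set;
};

/* Set up before any queue exists. */
static int tag_set_alloc(struct tag_set *set, unsigned int nr_hw_queues,
                         unsigned int queue_depth)
{
    if (nr_hw_queues == 0 || queue_depth == 0)
        return -1;
    set->in_use = calloc(nr_hw_queues, sizeof(*set->in_use));
    if (!set->in_use)
        return -1;
    set->nr_hw_queues = nr_hw_queues;
    set->queue_depth = queue_depth;
    set->nr_users = 0;
    return 0;
}

static void queue_init(struct request_queue *q, struct tag_set *set)
{
    q->set = set;
    set->nr_users++;
}

/* Every queue attached to the set draws from the same per-hw-queue tags. */
static bool queue_get_tag(struct request_queue *q, unsigned int hw)
{
    struct tag_set *s = q->set;

    assert(hw < s->nr_hw_queues);
    if (s->in_use[hw] == s->queue_depth)
        return false;
    s->in_use[hw]++;
    return true;
}

static void queue_cleanup(struct request_queue *q)
{
    assert(q->set && q->set->nr_users > 0);
    q->set->nr_users--;
    q->set = NULL;
}

/* Only legal once every queue using the set has been torn down. */
static void tag_set_free(struct tag_set *set)
{
    assert(set->nr_users == 0);
    free(set->in_use);
    set->in_use = NULL;
}
```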
    • blk-mq: initialize request on allocation · ed44832d
      Committed by Christoph Hellwig
      If we want to share tag and request allocation between queues, we
      cannot initialize the request at init/free time, but need to
      initialize it at allocation time, as it might get used for different
      queues over its lifetime.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
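A sketch of initializing requests at allocation time, assuming a hypothetical fixed-size pool that several queues could share: the owning queue is only known when a request is handed out, so each field is set explicitly then (in the spirit of the individual-field initialization commit above), and the same request object can serve a different queue on its next use. All type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical, simplified types. */
struct request_queue {
    int id;
};

struct request {
    struct request_queue *q;
    unsigned int tag;
    unsigned int cmd_flags;
    bool in_use;
};

/* A request pool that several queues could share. */
struct request_pool {
    struct request rqs[POOL_SIZE];
};

/* Set each field explicitly at allocation time: the owning queue is
 * only known now, and may differ from the request's previous use. */
static void rq_init(struct request *rq, struct request_queue *q,
                    unsigned int tag, unsigned int cmd_flags)
{
    rq->q = q;
    rq->tag = tag;
    rq->cmd_flags = cmd_flags;
    rq->in_use = true;
}

static struct request *rq_alloc(struct request_pool *p,
                                struct request_queue *q,
                                unsigned int cmd_flags)
{
    for (unsigned int i = 0; i < POOL_SIZE; i++) {
        struct request *rq = &p->rqs[i];

        if (!rq->in_use) {
            rq_init(rq, q, i, cmd_flags);
            return rq;
        }
    }
    return NULL; /* pool exhausted */
}

static void rq_free(struct request *rq)
{
    rq->in_use = false; /* per-queue state is rewritten on the next alloc */
}
```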
    • blk-mq: add ->init_request and ->exit_request methods · e9b267d9
      Committed by Christoph Hellwig
      The current blk_mq_init_commands/blk_mq_free_commands interface has
      two problems:
      
       1) Because only the constructor is passed to blk_mq_init_commands,
          there is no easy way to clean up when a command's initialization
          fails.  The current code simply leaks the allocations done in the
          constructor.
      
       2) There is no good place to call blk_mq_free_commands: before
          blk_cleanup_queue there is no guarantee that all outstanding
          commands have completed, so we can't free them yet.  After
          blk_cleanup_queue the queue has usually been freed.  This can be
          worked around by grabbing an unconditional reference before calling
          blk_cleanup_queue and dropping it after blk_mq_free_commands is
          done, although that's not exactly pretty and driver writers are
          guaranteed to get it wrong sooner or later.
      
      Both issues are easily fixed by making the request constructor and
      destructor normal blk_mq_ops methods.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
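A sketch of why a paired constructor/destructor in the ops table fixes both problems, assuming a hypothetical ops structure whose signatures differ from the real blk_mq_ops: with exit_request available, the core can unwind partially constructed requests on failure (problem 1), and it can call the destructor itself once it knows all requests are complete (problem 2). The example driver callbacks drv_init()/drv_exit() are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct request {
    void *pdu; /* driver-private per-request data */
};

/* Hypothetical ops table with a request constructor/destructor pair. */
struct mq_ops {
    int (*init_request)(void *data, struct request *rq, unsigned int idx);
    void (*exit_request)(void *data, struct request *rq, unsigned int idx);
};

/* Core side: construct all requests; on failure, undo what succeeded. */
static int alloc_requests(const struct mq_ops *ops, void *data,
                          struct request *rqs, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        int ret = ops->init_request ? ops->init_request(data, &rqs[i], i) : 0;

        if (ret) {
            while (i--) {
                if (ops->exit_request)
                    ops->exit_request(data, &rqs[i], i);
            }
            return ret;
        }
    }
    return 0;
}

/* Core side: called once all requests are known to have completed,
 * so the driver no longer has to guess when freeing is safe. */
static void free_requests(const struct mq_ops *ops, void *data,
                          struct request *rqs, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        if (ops->exit_request)
            ops->exit_request(data, &rqs[i], i);
    }
}

/* Example driver (hypothetical): one buffer per request, with an
 * injectable failure at index fail_at. */
struct drv {
    unsigned int live;
    unsigned int fail_at;
};

static int drv_init(void *data, struct request *rq, unsigned int idx)
{
    struct drv *d = data;

    if (idx == d->fail_at)
        return -1;
    rq->pdu = malloc(64);
    if (!rq->pdu)
        return -1;
    d->live++;
    return 0;
}

static void drv_exit(void *data, struct request *rq, unsigned int idx)
{
    struct drv *d = data;

    (void)idx;
    free(rq->pdu);
    rq->pdu = NULL;
    d->live--;
}
```

In the failure case the destructor runs only for requests whose constructor succeeded, so nothing allocated by the driver is leaked.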