1. 08 11月, 2018 9 次提交
  2. 26 10月, 2018 1 次提交
  3. 16 10月, 2018 1 次提交
  4. 14 10月, 2018 4 次提交
  5. 09 10月, 2018 1 次提交
    • M
      blk-mq: complete req in softirq context in case of single queue · 36e76539
      Ming Lei 提交于
      Lot of controllers may have only one irq vector for completing IO
      request. And usually affinity of the only irq vector is all possible
      CPUs, however, on most of ARCH, there may be only one specific CPU
      for handling this interrupt.
      
      So if all IOs are completed in hardirq context, it is inevitable to
      degrade IO performance because of increased irq latency.
      
      This patch tries to address this issue by allowing to complete request
      in softirq context, like the legacy IO path.
      
      IOPS is observed as ~13%+ in the following randread test on raid0 over
      virtio-scsi.
      
      mdadm --create --verbose /dev/md0 --level=0 --chunk=1024 --raid-devices=8 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
      
      fio --time_based --name=benchmark --runtime=30 --filename=/dev/md0 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=32 --rw=randread --blocksize=4k
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: Zach Marano <zmarano@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      36e76539
  6. 28 9月, 2018 2 次提交
  7. 27 9月, 2018 2 次提交
  8. 21 8月, 2018 2 次提交
    • J
      blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter · f5bbbbe4
      Jianchao Wang 提交于
      For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to
      account the inflight requests. It will access the queue_hw_ctx and
      nr_hw_queues w/o any protection. When updating nr_hw_queues and
      blk_mq_in_flight/rw occur concurrently, panic comes up.
      
      Before update nr_hw_queues, the q will be frozen. So we could use
      q_usage_counter to avoid the race. percpu_ref_is_zero is used here
      so that we will not miss any in-flight request. The access to
      nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
      under rcu critical section, __blk_mq_update_nr_hw_queues could use
      synchronize_rcu to ensure the zeroed q_usage_counter to be globally
      visible.
      Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f5bbbbe4
    • J
      blk-mq: init hctx sched after update ctx and hctx mapping · d48ece20
      Jianchao Wang 提交于
      Currently, when update nr_hw_queues, IO scheduler's init_hctx will
      be invoked before the mapping between ctx and hctx is adapted
      correctly by blk_mq_map_swqueue. The IO scheduler init_hctx (kyber)
      may depend on this mapping and get wrong result and panic finally.
      A simply way to fix this is that switch the IO scheduler to 'none'
      before update the nr_hw_queues, and then switch it back after
      update nr_hw_queues. blk_mq_sched_init_/exit_hctx are removed due
      to nobody use them any more.
      Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d48ece20
  9. 09 8月, 2018 1 次提交
    • J
      blk-mq: count the hctx as active before allocating tag · d263ed99
      Jianchao Wang 提交于
      Currently, we count the hctx as active after allocate driver tag
      successfully. If a previously inactive hctx try to get tag first
      time, it may fails and need to wait. However, due to the stale tag
      ->active_queues, the other shared-tags users are still able to
      occupy all driver tags while there is someone waiting for tag.
      Consequently, even if the previously inactive hctx is waked up, it
      still may not be able to get a tag and could be starved.
      
      To fix it, we count the hctx as active before try to allocate driver
      tag, then when it is waiting the tag, the other shared-tag users
      will reserve budget for it.
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d263ed99
  10. 25 7月, 2018 1 次提交
  11. 23 7月, 2018 1 次提交
    • M
      blk-mq: fail the request in case issue failure · 8824f622
      Ming Lei 提交于
      Inside blk_mq_try_issue_list_directly(), if the request is issued as
      failed, we shouldn't try to do it again, otherwise the warning in
      blk_mq_start_request() will be triggered. This change is aligned to
      behaviour of other ways of request issue & dispatch.
      
      Fixes: 6ce3dd6e ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: LKP <lkp@01.org>
      Reported-by: Nkernel test robot <rong.a.chen@intel.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8824f622
  12. 18 7月, 2018 1 次提交
    • M
      blk-mq: issue directly if hw queue isn't busy in case of 'none' · 6ce3dd6e
      Ming Lei 提交于
      In case of 'none' io scheduler, when hw queue isn't busy, it isn't
      necessary to enqueue request to sw queue and dequeue it from
      sw queue because request may be submitted to hw queue asap without
      extra cost, meantime there shouldn't be much request in sw queue,
      and we don't need to worry about effect on IO merge.
      
      There are still some single hw queue SCSI HBAs(HPSA, megaraid_sas, ...)
      which may connect high performance devices, so 'none' is often required
      for obtaining good performance.
      
      This patch improves IOPS and decreases CPU unilization on megaraid_sas,
      per Kashyap's test.
      
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Reported-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Tested-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6ce3dd6e
  13. 09 7月, 2018 10 次提交
  14. 29 6月, 2018 1 次提交
    • J
      blk-mq: don't queue more if we get a busy return · 1f57f8d4
      Jens Axboe 提交于
      Some devices have different queue limits depending on the type of IO. A
      classic case is SATA NCQ, where some commands can queue, but others
      cannot. If we have NCQ commands inflight and encounter a non-queueable
      command, the driver returns busy. Currently we attempt to dispatch more
      from the scheduler, if we were able to queue some commands. But for the
      case where we ended up stopping due to BUSY, we should not attempt to
      retrieve more from the scheduler. If we do, we can get into a situation
      where we attempt to queue a non-queueable command, get BUSY, then
      successfully retrieve more commands from that scheduler and queue those.
      This can repeat forever, starving the non-queuable command indefinitely.
      
      Fix this by NOT attempting to pull more commands from the scheduler, if
      we get a BUSY return. This should also be more optimal in terms of
      letting requests stay in the scheduler for as long as possible, if we
      get a BUSY due to the regular out-of-tags condition.
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1f57f8d4
  15. 24 6月, 2018 1 次提交
  16. 14 6月, 2018 1 次提交
  17. 13 6月, 2018 1 次提交
    • K
      treewide: kzalloc_node() -> kcalloc_node() · 590b5b7d
      Kees Cook 提交于
      The kzalloc_node() function has a 2-factor argument form, kcalloc_node(). This
      patch replaces cases of:
      
              kzalloc_node(a * b, gfp, node)
      
      with:
              kcalloc_node(a * b, gfp, node)
      
      as well as handling cases of:
      
              kzalloc_node(a * b * c, gfp, node)
      
      with:
      
              kzalloc_node(array3_size(a, b, c), gfp, node)
      
      as it's slightly less ugly than:
      
              kcalloc_node(array_size(a, b), c, gfp, node)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc_node(4 * 1024, gfp, node)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc_node(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc_node(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc_node(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc_node
      + kcalloc_node
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc_node(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc_node(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc_node(C1 * C2 * C3, ...)
      |
        kzalloc_node(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc_node(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc_node(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc_node(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc_node(sizeof(THING) * C2, ...)
      |
        kzalloc_node(sizeof(TYPE) * C2, ...)
      |
        kzalloc_node(C1 * C2 * C3, ...)
      |
        kzalloc_node(C1 * C2, ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      590b5b7d