1. 10 Aug 2017 (5 commits)
  2. 02 Aug 2017 (1 commit)
  3. 01 Aug 2017 (1 commit)
    • blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled · b7a71e66
      Committed by Jens Axboe
      We recently had a bug in the IPR SCSI driver, where it would end up
      making the SCSI mid layer run the mq hardware queue with interrupts
      disabled. This isn't legal, since the software queue locking relies on
      the locks never being grabbed from interrupt context. Additionally,
      drivers that set BLK_MQ_F_BLOCKING may schedule from this context.
      
      Add a WARN_ON_ONCE() to catch bad users up front; a minimal sketch of
      such a guard follows this entry.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b7a71e66
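      A minimal sketch of the kind of guard described above, in kernel-style C;
      the placement and the surrounding code are assumptions for illustration,
      not the literal patch:

      static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
      {
              /*
               * The software queue locks must never be taken from interrupt
               * context, and BLK_MQ_F_BLOCKING drivers may sleep here, so
               * catch callers that run the queue with interrupts disabled
               * or from interrupt context early.
               */
              WARN_ON_ONCE(in_interrupt());

              /* ... dispatch requests from the software queues ... */
      }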
  4. 29 Jul 2017 (3 commits)
  5. 25 Jul 2017 (1 commit)
  6. 24 Jul 2017 (1 commit)
  7. 12 Jul 2017 (2 commits)
    • bfq: dispatch request to prevent queue stalling after the request completion · 3f7cb4f4
      Committed by Hou Tao
      There are mq devices (e.g., virtio-blk, nbd and loopback) which don't
      invoke blk_mq_run_hw_queues() after the completion of a request. If bfq
      is enabled on these devices and the slice_idle attribute or the
      strict_guarantees attribute is set to zero, it is possible that, after
      a request completes, the remaining requests of the busy bfq queue will
      stall in the bfq scheduler until a new request arrives.
      
      To fix this scheduler latency problem, check whether or not all issued
      requests have completed, and dispatch more requests to the driver if
      none are in flight; a minimal sketch of the check follows this entry.
      
      The problem can be reproduced by running the following script on a
      virtio-blk device with nr_hw_queues set to 1:
      
      #!/bin/sh
      
      dev=vdb
      # mount point for dev
      mp=/tmp/mnt
      cd $mp
      
      job=strict.job
      cat <<EOF > $job
      [global]
      direct=1
      bs=4k
      size=256M
      rw=write
      ioengine=libaio
      iodepth=128
      runtime=5
      time_based
      
      [1]
      filename=1.data
      
      [2]
      new_group
      filename=2.data
      EOF
      
      echo bfq > /sys/block/$dev/queue/scheduler
      echo 1 > /sys/block/$dev/queue/iosched/strict_guarantees
      fio $job
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f7cb4f4
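      A minimal sketch of the completion-side check described above;
      bfqd->rq_in_driver, bfqd->queued and bfq_schedule_dispatch() are taken
      from the bfq-iosched context, and the function name is illustrative
      rather than the literal patch:

      static void bfq_completion_kick_sketch(struct bfq_data *bfqd)
      {
              bfqd->rq_in_driver--;

              /*
               * If nothing is left in the driver but the scheduler still
               * holds requests, re-run the dispatch path so the queue does
               * not stall until the next request arrival.
               */
              if (bfqd->rq_in_driver == 0 && bfqd->queued > 0)
                      bfq_schedule_dispatch(bfqd);
      }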
    • bfq: fix typos in comments about B-WF2Q+ algorithm · 38c91407
      Committed by Hou Tao
      The start time of an eligible entity should be less than or equal to
      the current virtual time, and an entity in the idle tree has a finish
      time greater than the current virtual time; the two conditions are
      sketched after this entry.
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      38c91407
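      The corrected statements boil down to two comparisons against the
      scheduler's current virtual time; a minimal sketch that ignores the
      wraparound-safe comparison helpers the real bfq code uses:

      /* Eligible: virtual start time is at or before the current virtual time V. */
      static inline bool entity_is_eligible(u64 start, u64 vtime)
      {
              return start <= vtime;
      }

      /* Idle tree: virtual finish time is strictly after the current virtual time V. */
      static inline bool entity_fits_idle_tree(u64 finish, u64 vtime)
      {
              return finish > vtime;
      }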
  8. 11 Jul 2017 (1 commit)
    • block: call bio_uninit in bio_endio · b222dd2f
      Committed by Shaohua Li
      bio_free isn't a good place to free cgroup info. There are a lot of
      cases where a bio is allocated in a special way (for example, on the
      stack) and bio_put, and hence bio_free, is never called on it, so we
      are leaking memory. This patch moves the free to bio_endio, which
      should be called anyway; a minimal sketch of the change follows this
      entry. The bio_uninit call in bio_free is kept, in case bio_endio is
      never called on the bio.
      
      This assumes ->bi_end_io() doesn't access cgroup info, which seems true
      in my audit.
      
      This, along with Christoph's integrity patch, should fix the memory
      leak issue.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b222dd2f
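      A minimal sketch of the shape of the change, with the chaining and
      remapping handling of the real bio_endio() elided:

      void bio_endio(struct bio *bio)
      {
              /* ... bio chaining / remapping handling elided ... */

              bio_uninit(bio);                /* release cgroup and similar per-bio state */

              if (bio->bi_end_io)
                      bio->bi_end_io(bio);    /* assumed not to touch cgroup info */
      }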
  9. 06 Jul 2017 (1 commit)
    • block: Fix __blkdev_issue_zeroout loop · 615d22a5
      Committed by Damien Le Moal
      The BIO issuing loop in __blkdev_issue_zeroout() allocates BIOs with a
      maximum number of bvecs (pages) equal to
      
      min(nr_sects, (sector_t)BIO_MAX_PAGES)
      
      This works, since the requested number of bvecs will always be limited
      to the absolute maximum supported (BIO_MAX_PAGES), but it is
      inefficient, as too many bvec entries may be requested due to the
      different units used in the min() operation (number of sectors vs.
      number of pages).
      To fix this, introduce the helper __blkdev_sectors_to_bio_pages() to
      correctly calculate the number of bvecs for zeroout BIOs as the issuing
      loop progresses. The calculation is done using consistent units and
      makes sure that the number of pages returned is at least 1 (for cases
      where the number of sectors is less than the number of sectors in a
      page); a sketch of such a helper follows this entry.
      
      Also remove a trailing space after the bit shift in the internal loop
      min() call.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      615d22a5
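      A sketch of a helper with the described behaviour (the in-tree version
      may differ in detail): convert 512-byte sectors to pages, round up so
      the result is at least one page, and cap it at BIO_MAX_PAGES:

      static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects)
      {
              /* Round sectors up to whole pages so small zeroouts still get one bvec. */
              sector_t bytes = (nr_sects << 9) + PAGE_SIZE - 1;

              return min(bytes >> PAGE_SHIFT, (sector_t)BIO_MAX_PAGES);
      }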
  10. 05 Jul 2017 (1 commit)
  11. 04 Jul 2017 (9 commits)
  12. 30 Jun 2017 (2 commits)
  13. 29 Jun 2017 (4 commits)
    • blk-mq: map all HWQ also in hyperthreaded system · fe631457
      Committed by Max Gurtovoy
      This patch performs sequential mapping between CPUs and queues. In case
      the system has more CPUs than HWQs, there are still CPUs left to map to
      HWQs; in a hyperthreaded system, map the unmapped CPUs and their
      siblings to the same HWQ (a simplified model of this mapping follows
      this entry). This actually fixes a bug where unmapped HWQs were found
      on a system with 2 sockets, 18 cores per socket and 2 threads per core
      (72 CPUs total) running NVMEoF (which opens up to a maximum of 64 HWQs).
      
      Performance results running fio (72 jobs, iodepth 128) using null_blk;
      each cell below shows IOPS with the patch / without the patch:
      
      bs     read submit_queues=72   write submit_queues=72   read submit_queues=24   write submit_queues=24
      -----  ----------------------  -----------------------  ----------------------  -----------------------
      512    4890.4K/4723.5K         4524.7K/4324.2K          4280.2K/4264.3K         3902.4K/3909.5K
      1k     4910.1K/4715.2K         4535.8K/4309.6K          4296.7K/4269.1K         3906.8K/3914.9K
      2k     4906.3K/4739.7K         4526.7K/4330.6K          4301.1K/4262.4K         3890.8K/3900.1K
      4k     4918.6K/4730.7K         4556.1K/4343.6K          4297.6K/4264.5K         3886.9K/3893.9K
      8k     4906.4K/4748.9K         4550.9K/4346.7K          4283.2K/4268.8K         3863.4K/3858.2K
      16k    4903.8K/4782.6K         4501.5K/4233.9K          4292.3K/4282.3K         3773.1K/3773.5K
      32k    4885.8K/4782.4K         4365.9K/4184.2K          4307.5K/4289.4K         3780.3K/3687.3K
      64k    4822.5K/4762.7K         2752.8K/2675.1K          4308.8K/4312.3K         2651.5K/2655.7K
      128k   2388.5K/2313.8K         1391.9K/1375.7K          2142.8K/2152.2K         1395.5K/1374.2K
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fe631457
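      The mapping idea can be modelled in plain user-space C; the CPU count,
      queue count and sibling layout below are made-up parameters for
      illustration, not the kernel topology API:

      #include <stdio.h>

      int main(void)
      {
              enum { NR_CORES = 4, NR_CPUS = 2 * NR_CORES, NR_QUEUES = 3 };
              int map[NR_CPUS];

              /* Assume CPU c and CPU c + NR_CORES are hyperthread siblings. */
              for (int core = 0; core < NR_CORES; core++) {
                      int q = core % NR_QUEUES;       /* sequential spread over HWQs */

                      map[core] = q;                  /* first thread of the core */
                      map[core + NR_CORES] = q;       /* its sibling shares the same HWQ */
              }

              for (int cpu = 0; cpu < NR_CPUS; cpu++)
                      printf("cpu %d -> hwq %d\n", cpu, map[cpu]);
              return 0;
      }

      Mapping core by core, rather than CPU by CPU, is what keeps a core and
      its hyperthread sibling on the same hardware queue while still leaving
      no CPU unmapped.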
    • block: provide bio_uninit() for freeing integrity/task associations · 9ae3b3f5
      Committed by Jens Axboe
      Wen reports significant memory leaks with DIF and O_DIRECT:
      
      "With an nvme device + T10 enabled, on a system with 256GB we started
      logging /proc/meminfo & /proc/slabinfo every minute, and in an hour it
      increased by 15968128 kB, or ~15+GB. Approximately 256 MB / minute
      leaking.
      
      /proc/meminfo | grep SUnreclaim...
      
      SUnreclaim:      6752128 kB
      SUnreclaim:      6874880 kB
      SUnreclaim:      7238080 kB
      ....
      SUnreclaim:     22307264 kB
      SUnreclaim:     22485888 kB
      SUnreclaim:     22720256 kB
      
      When testcases with T10 enabled call into __blkdev_direct_IO_simple,
      the code doesn't free the memory allocated by bio_integrity_alloc. The
      patch fixes the issue. HTX has been run for 60+ hours without failure."
      
      Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
      doesn't go through the regular bio free. This means that any ancillary
      data allocated with the bio through the stack is not freed. Hence, we
      can leak the integrity data associated with the bio, if the device is
      using DIF/DIX.
      
      Fix this by providing a bio_uninit() and exporting it, so that we can
      use it to free this data; a usage sketch follows this entry. Note that
      this is a minimal fix for this issue. Any current user of bios that are
      allocated outside of bio_alloc_bioset() suffers from this issue, most
      notably some drivers. We will fix those in a more comprehensive patch
      for 4.13. This also means that the commit marked as being fixed by this
      one isn't the real culprit; it's just the most obvious one out there.
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9ae3b3f5
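      A usage sketch of the exported helper for an on-stack bio; the function
      and its argument are illustrative (4.12-era bio_init() signature), not
      the actual __blkdev_direct_IO_simple():

      static void stack_bio_example(struct page *page)
      {
              struct bio_vec bvec;
              struct bio bio;

              bio_init(&bio, &bvec, 1);       /* on-stack bio, never goes through bio_put()/bio_free() */

              /* ... point the bio at the device, add the page, submit and wait ... */

              bio_uninit(&bio);               /* free the integrity/task associations */
      }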
    • blk-mq: Create hctx for each present CPU · 4b855ad3
      Committed by Christoph Hellwig
      Currently we only create an hctx for online CPUs, which can lead to a
      lot of churn due to frequent soft offline / online operations. Instead,
      allocate one for each present CPU to avoid this and dramatically
      simplify the code; a minimal sketch of the direction follows this entry.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4b855ad3
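      A minimal sketch of the direction of the change, using the standard
      for_each_present_cpu() iterator; the function name is illustrative and
      the per-CPU allocation is elided, so this is not the real blk-mq
      initialization path:

      static int init_ctx_for_present_cpus_sketch(void)
      {
              unsigned int cpu;

              /*
               * Iterate over present CPUs, not just online ones, so later
               * soft offline / online events do not force hctx churn.
               */
              for_each_present_cpu(cpu) {
                      /* ... allocate and initialize the per-CPU context for @cpu ... */
              }
              return 0;
      }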
    • blk-mq: Include all present CPUs in the default queue mapping · 5f042e7c
      Committed by Christoph Hellwig
      This way we get a nice distribution that is independent of the current
      CPU online / offline state.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Link: http://lkml.kernel.org/r/20170626102058.10200-2-hch@lst.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5f042e7c
  14. 28 Jun 2017 (8 commits)