1. 29 Aug 2017, 3 commits
  2. 26 Aug 2017, 2 commits
  3. 24 Aug 2017, 8 commits
    • compat_hdio_ioctl: Fix a declaration · 6a934bb8
      Authored by Bart Van Assche
      This patch prevents sparse from reporting the following warning messages:
      
      block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
      block/compat_ioctl.c:85:11:    expected unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:85:11:    got void [noderef] <asn:1>*
      block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
      block/compat_ioctl.c:91:21:    expected void const volatile [noderef] <asn:1>*<noident>
      block/compat_ioctl.c:91:21:    got unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:87:53: warning: dereference of noderef expression
      block/compat_ioctl.c:91:21: warning: dereference of noderef expression
      
      Fixes: commit d597580d ("generic ...copy_..._user primitives")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6a934bb8
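      The warnings above are the classic symptom of annotating the pointer
      variable itself instead of the data it points to. A minimal, hypothetical
      sketch of the kind of declaration fix involved (illustrative only, not the
      exact patch hunk; the _sketch name is made up):

      #include <linux/blkdev.h>
      #include <linux/compat.h>
      #include <linux/uaccess.h>

      /* Sketch only: shows the __user placement that sparse expects. */
      static int compat_hdio_ioctl_sketch(struct block_device *bdev, fmode_t mode,
                                          unsigned int cmd, unsigned long arg)
      {
              unsigned long __user *p;        /* fixed: p points to __user memory */
              /* before: "unsigned long *__user p;" marked p itself as living in
               * user space, which produces the "different address spaces" and
               * "dereference of noderef expression" warnings quoted above */
              unsigned long kval;
              int error;

              p = compat_alloc_user_space(sizeof(*p));
              error = __blkdev_driver_ioctl(bdev, mode, cmd, (unsigned long)p);
              if (error)
                      return error;

              if (get_user(kval, p) ||
                  put_user(kval, (compat_ulong_t __user *)compat_ptr(arg)))
                      return -EFAULT;
              return 0;
      }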
    • block: remove blk_free_devt in add_partition · 47570848
      Authored by weiping zhang
      put_device(pdev) eventually calls pdev->type->release, and blk_free_devt()
      is already called in part_release(), so remove the redundant call.
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      47570848
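      A hedged sketch of the release chain this relies on (the _sketch functions
      are simplified stand-ins for part_release() and the add_partition() error
      path; blk_free_devt() and put_device() are the real interfaces, with
      blk_free_devt() declared in the block layer's private blk.h):

      #include <linux/device.h>
      #include <linux/genhd.h>

      static void part_release_sketch(struct device *pdev)
      {
              blk_free_devt(pdev->devt);      /* the devt is already freed here */
              /* ... free the hd_struct itself ... */
      }

      static int add_partition_error_path_sketch(struct device *pdev)
      {
              int err = device_add(pdev);

              if (err) {
                      /* blk_free_devt(pdev->devt);  <- the removed, redundant call */
                      put_device(pdev);       /* last ref dropped: pdev->type->release
                                               * ends up in part_release(), which
                                               * frees the devt anyway */
                      return err;
              }
              return 0;
      }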
    • bio-integrity: Fix regression if profile verify_fn is NULL · 97e05463
      Authored by Milan Broz
      The dm-integrity target registers an integrity profile that has both the
      generate_fn and verify_fn callbacks set to NULL.
      
      This is used if dm-integrity is stacked under a dm-crypt device
      for authenticated encryption (integrity payload contains authentication
      tag and IV seed).
      
      In this case the verification is done through dm-crypt's own crypto API
      processing; the integrity profile is only the holder of this data.
      (The memory is owned by dm-crypt as well.)
      
      After the commit (and previous changes)
        Commit 7c20f116
        Author: Christoph Hellwig <hch@lst.de>
        Date:   Mon Jul 3 16:58:43 2017 -0600
      
          bio-integrity: stop abusing bi_end_io
      
      we get this crash:
      
      : BUG: unable to handle kernel NULL pointer dereference at   (null)
      : IP:   (null)
      : *pde = 00000000
      ...
      :
      : Workqueue: kintegrityd bio_integrity_verify_fn
      : task: f48ae180 task.stack: f4b5c000
      : EIP:   (null)
      : EFLAGS: 00210286 CPU: 0
      : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
      : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
      :  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
      : Call Trace:
      :  ? bio_integrity_process+0xe3/0x1e0
      :  bio_integrity_verify_fn+0xea/0x150
      :  process_one_work+0x1c7/0x5c0
      :  worker_thread+0x39/0x380
      :  kthread+0xd6/0x110
      :  ? process_one_work+0x5c0/0x5c0
      :  ? kthread_worker_fn+0x100/0x100
      :  ? kthread_worker_fn+0x100/0x100
      :  ret_from_fork+0x19/0x24
      : Code:  Bad EIP value.
      : EIP:   (null) SS:ESP: 0068:f4b5dea4
      : CR2: 0000000000000000
      
      The patch simply skips the whole verify workqueue if verify_fn is set to NULL.
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      [hch: trivial whitespace fix]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      97e05463
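      The shape of the fix, as a hedged sketch (simplified from the end_io path
      in bio-integrity.c; the exact in-tree condition may differ): only queue
      the kintegrityd verify work when the profile actually provides a verify_fn.

      #include <linux/bio.h>
      #include <linux/blkdev.h>
      #include <linux/workqueue.h>

      static bool bio_integrity_endio_sketch(struct bio *bio)
      {
              struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
              struct bio_integrity_payload *bip = bio_integrity(bio);

              if (bio_op(bio) == REQ_OP_READ && !bio->bi_status &&
                  (bip->bip_flags & BIP_BLOCK_INTEGRITY) && bi->profile->verify_fn) {
                      /* verify_fn != NULL: defer verification to kintegrityd */
                      INIT_WORK(&bip->bip_work, bio_integrity_verify_fn);
                      queue_work(kintegrityd_wq, &bip->bip_work);
                      return false;
              }

              /* verify_fn == NULL (e.g. dm-integrity under dm-crypt): skip it */
              bio_integrity_free(bio);
              return true;
      }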
    • block, bfq: fix error handle in bfq_init · 37dcd657
      Authored by weiping zhang
      If elv_register() fails, bfq_pool should be freed.
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      37dcd657
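      A hedged sketch of the corrected error handling in a module init of this
      shape (the label and the surrounding steps are illustrative; elv_register(),
      KMEM_CACHE() and kmem_cache_destroy() are the real interfaces):

      #include <linux/elevator.h>
      #include <linux/slab.h>

      static int __init bfq_init_sketch(void)
      {
              int ret;

              bfq_pool = KMEM_CACHE(bfq_queue, 0);
              if (!bfq_pool)
                      return -ENOMEM;

              ret = elv_register(&iosched_bfq_mq);
              if (ret)
                      goto err_free_pool;

              return 0;

      err_free_pool:
              kmem_cache_destroy(bfq_pool);   /* previously leaked when
                                               * elv_register() failed */
              return ret;
      }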
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Authored by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74d46992
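      In concrete terms, the bio gains a gendisk pointer plus a partition index
      in place of bi_bdev. A hedged sketch of the resulting fields and of what
      the helper that sets them boils down to (the _sketch names are illustrative;
      the field and helper names follow the mainline change):

      #include <linux/genhd.h>

      struct bio_sketch {
              struct gendisk  *bi_disk;       /* one gendisk always exists per device */
              u8              bi_partno;      /* consumed by the partition remapping
                                               * in generic_make_request()            */
              /* ... rest of struct bio ... */
      };

      /* roughly what the bio_set_dev() helper expands to */
      #define bio_set_dev_sketch(bio, bdev)                   \
      do {                                                    \
              (bio)->bi_disk   = (bdev)->bd_disk;             \
              (bio)->bi_partno = (bdev)->bd_partno;           \
      } while (0)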
    • block: add a __disk_get_part helper · 807d4af2
      Authored by Christoph Hellwig
      This helper allows looking up a partition under RCU protection without
      grabbing a reference to it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      807d4af2
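      A hedged sketch of what such a helper looks like (close to the mainline
      version, but treat it as illustrative): the caller must hold rcu_read_lock()
      and must not use the result after dropping it, since no reference is taken.

      #include <linux/genhd.h>
      #include <linux/rcupdate.h>

      static struct hd_struct *__disk_get_part_sketch(struct gendisk *disk, int partno)
      {
              struct disk_part_tbl *ptbl = rcu_dereference(disk->part_tbl);

              if (unlikely(partno < 0 || partno >= ptbl->len))
                      return NULL;
              return rcu_dereference(ptbl->part[partno]);
      }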
    • block: Warn if blk_queue_rq_timed_out() is called for a blk-mq queue · 130d733a
      Authored by Bart Van Assche
      The timeout handler set by blk_queue_rq_timed_out() is only used
      in single queue mode. Calling this function for blk-mq drivers is
      wrong. Hence issue a warning if this function is called by a blk-mq
      driver.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      130d733a
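      A hedged sketch of the check (the _sketch wrapper is illustrative;
      q->mq_ops being non-NULL is how the block layer tells a blk-mq queue
      apart from a legacy single-queue one):

      #include <linux/blkdev.h>

      static void blk_queue_rq_timed_out_sketch(struct request_queue *q,
                                                rq_timed_out_fn *fn)
      {
              WARN_ON_ONCE(q->mq_ops);        /* blk-mq drivers must not call this */
              q->rq_timed_out_fn = fn;        /* only used on the legacy path */
      }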
  4. 18 Aug 2017, 6 commits
  5. 11 Aug 2017, 3 commits
    • cfq: Give a chance for arming slice idle timer in case of group_idle · b3193bc0
      Authored by Ritesh Harjani
      In the scenario below, blkio cgroups do not work as per their assigned
      weights:
      1. When the underlying device is non-rotational, with a single HW queue
      of depth >= CFQ_HW_QUEUE_MIN.
      2. When the use case is forming two blkio cgroups, cg1 (weight 1000)
      and cg2 (weight 100), and two processes (file1 and file2) doing sync
      IO in their respective blkio cgroups.

      For the above use case, the result of fio (without this patch):
      file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan  1 19:41:49 1970
        write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
      <...>
      file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan  1 19:41:49 1970
        write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
      <...>
      // Both processes get equal BW even though they belong to different
      // cgroups with weights of 1000 (cg1) and 100 (cg2)
      
      In the above case (for non-rotational NCQ devices), as soon as the
      request from cg1 completes, and even though cg1 is provided with the
      higher set_slice=10, the CFQ algorithm expires this group when the
      driver tries to fetch the next request, without providing any idle
      time or weight priority, and schedules another cfq group (in this
      case cg2). Thus both cfq groups (cg1 and cg2) keep alternating for
      the disk time, and cgroup weight based scheduling is lost.
      
      The patch below gives the CFQ algorithm (cfq_arm_slice_timer) a chance
      to arm the slice timer when group_idle is enabled.
      If group_idle is not required either (including for non-rotational
      NCQ drives), group_idle must be explicitly set to 0 via sysfs for
      such cases.
      
      With this patch, the result of fio (for the above use case):
      file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan  1 00:06:08 1970
        write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
      <..>
      file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan  1 00:06:08 1970
        write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
      <..>
      // Here each process's BW matches its respective cgroup weight.
      Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b3193bc0
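      A hedged sketch of the kind of change this describes inside
      cfq_arm_slice_timer() (illustrative, not the literal hunk; struct cfq_data
      and its fields are internal to cfq-iosched.c): the early bail-out for
      non-rotational NCQ devices no longer fires when group_idle is in use, so
      the idle timer can still be armed and group weights are honoured.

      #include <linux/blkdev.h>
      #include <linux/hrtimer.h>

      static void cfq_arm_slice_timer_sketch(struct cfq_data *cfqd)
      {
              u64 sl;

              /* new: do not skip idling outright if group_idle is enabled */
              if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag &&
                  !cfqd->cfq_group_idle)
                      return;

              /* ... choose between slice_idle and group_idle, then arm ... */
              sl = cfqd->cfq_slice_idle ?: cfqd->cfq_group_idle;
              hrtimer_start(&cfqd->idle_slice_timer, ns_to_ktime(sl),
                            HRTIMER_MODE_REL);
      }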
    • block, bfq: boost throughput with flash-based non-queueing devices · edaf9428
      Authored by Paolo Valente
      When a queue associated with a process remains empty, there are cases
      where throughput gets boosted if the device is idled to await the
      arrival of a new I/O request for that queue. Currently, BFQ assumes
      that one of these cases is when the device has no internal queueing
      (regardless of the properties of the I/O being served). Unfortunately,
      this condition has proved to be too general. So, this commit refines it
      as "the device has no internal queueing and is rotational".
      
      This refinement provides a significant throughput boost with random
      I/O, on flash-based storage without internal queueing. For example, on
      a HiKey board, throughput increases by up to 125%, growing, e.g., from
      6.9MB/s to 15.6MB/s with two or three random readers in parallel.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      edaf9428
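      A hedged, heavily simplified sketch of the refined condition (the _sketch
      function and variable names are illustrative; the real decision in BFQ
      folds in several more factors):

      #include <linux/blkdev.h>

      static bool idling_boosts_thr_sketch(struct bfq_data *bfqd)
      {
              bool has_internal_queueing = bfqd->hw_tag;
              bool rotational = !blk_queue_nonrot(bfqd->queue);

              /* old, too general: return !has_internal_queueing;          */
              /* new: also require a rotational device, so flash without
               * internal queueing is no longer idled on random I/O        */
              return !has_internal_queueing && rotational;
      }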
    • block,bfq: refactor device-idling logic · d5be3fef
      Authored by Paolo Valente
      The logic that decides whether to idle the device is scattered across
      three functions. Almost all of the logic is in the function
      bfq_bfqq_may_idle, but (1) part of the decision is made in
      bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
      switch off idling regardless of the output of bfq_bfqq_may_idle. In
      addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
      their decisions as a function of parameters that are used, for similar
      purposes, also in bfq_bfqq_may_idle. This commit addresses these
      issues by moving all the logic into bfq_bfqq_may_idle.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d5be3fef
  6. 10 Aug 2017, 7 commits
  7. 02 Aug 2017, 1 commit
  8. 01 Aug 2017, 1 commit
    • blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled · b7a71e66
      Authored by Jens Axboe
      We recently had a bug in the IPR SCSI driver, where it would end up
      making the SCSI mid layer run the mq hardware queue with interrupts
      disabled. This isn't legal, since the software queue locking relies
      on never being grabbed from interrupt context. Additionally, drivers
      that set BLK_MQ_F_BLOCKING may schedule from this context.
      
      Add a WARN_ON_ONCE() to catch bad users up front.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b7a71e66
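      A hedged sketch of where the check sits (the _sketch wrapper is
      illustrative and the actual dispatch logic is elided):

      #include <linux/blk-mq.h>
      #include <linux/interrupt.h>

      static void __blk_mq_run_hw_queue_sketch(struct blk_mq_hw_ctx *hctx)
      {
              /*
               * The software-queue locking assumes it is never taken from
               * interrupt context, and BLK_MQ_F_BLOCKING drivers may sleep
               * here, so catch bad callers up front.
               */
              WARN_ON_ONCE(in_interrupt());

              /* ... flush the software queues / scheduler into the driver ... */
      }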
  9. 29 Jul 2017, 3 commits
  10. 25 Jul 2017, 1 commit
  11. 24 Jul 2017, 1 commit
  12. 12 Jul 2017, 2 commits
    • bfq: dispatch request to prevent queue stalling after the request completion · 3f7cb4f4
      Authored by Hou Tao
      There are mq devices (e.g., virtio-blk, nbd and loopback) which don't
      invoke blk_mq_run_hw_queues() after the completion of a request.
      If bfq is enabled on these devices and the slice_idle attribute or
      the strict_guarantees attribute is set to zero, it is possible that,
      after a request completes, the remaining requests of a busy bfq queue
      will stall in the bfq scheduler until a new request arrives.
      
      To fix the scheduler latency problem, we need to check whether or not
      all issued requests have completed, and dispatch more requests to the
      driver if there is no request in the driver.
      
      The problem can be reproduced by running the following script
      on a virtio-blk device with nr_hw_queues set to 1:
      
      #!/bin/sh
      
      dev=vdb
      # mount point for dev
      mp=/tmp/mnt
      cd $mp
      
      job=strict.job
      cat <<EOF > $job
      [global]
      direct=1
      bs=4k
      size=256M
      rw=write
      ioengine=libaio
      iodepth=128
      runtime=5
      time_based
      
      [1]
      filename=1.data
      
      [2]
      new_group
      filename=2.data
      EOF
      
      echo bfq > /sys/block/$dev/queue/scheduler
      echo 1 > /sys/block/$dev/queue/iosched/strict_guarantees
      fio $job
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f7cb4f4
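      A hedged sketch of the mechanism described above (not the literal hunk;
      the _sketch function and the exact fields used are illustrative): after a
      completion, if nothing is left in the driver but bfq still holds queued
      requests, re-run the hardware queues so devices that never call
      blk_mq_run_hw_queues() on their own cannot stall.

      #include <linux/blk-mq.h>

      static void bfq_completed_request_sketch(struct bfq_data *bfqd,
                                               struct request_queue *q)
      {
              bfqd->rq_in_driver--;

              if (bfqd->rq_in_driver == 0 && bfqd->queued > 0)
                      blk_mq_run_hw_queues(q, true);  /* async re-run */
      }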
    • bfq: fix typos in comments about B-WF2Q+ algorithm · 38c91407
      Authored by Hou Tao
      The start time of an eligible entity should be less than or equal to
      the current virtual time, and an entity in the idle tree has a finish
      time greater than the current virtual time.
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      38c91407
  13. 11 Jul 2017, 1 commit
    • block: call bio_uninit in bio_endio · b222dd2f
      Authored by Shaohua Li
      bio_free isn't a good place to free cgroup info. There are a lot of
      cases where a bio is allocated in a special way (for example, on the
      stack) and bio_put, and hence bio_free, is never called on it, so we
      leak memory. This patch moves the free to bio_endio, which should be
      called anyway. The bio_uninit call in bio_free is kept, in case
      bio_endio is never called on the bio.
      
      This assumes ->bi_end_io() doesn't access cgroup info, which seems true
      in my audit.
      
      This along with Christoph's integrity patch should fix the memory leak
      issue.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b222dd2f
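      A hedged sketch of the new shape of bio_endio() (chaining, integrity and
      tracing are elided; the _sketch name is illustrative, bio_uninit() is the
      real helper):

      #include <linux/bio.h>

      static void bio_endio_sketch(struct bio *bio)
      {
              /* ... bio_remaining_done() / integrity / chain handling ... */

              bio_uninit(bio);        /* frees the cgroup association even for
                                       * bios that never reach bio_put() and
                                       * therefore never hit bio_free()       */
              if (bio->bi_end_io)
                      bio->bi_end_io(bio);
      }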
  14. 06 Jul 2017, 1 commit
    • block: Fix __blkdev_issue_zeroout loop · 615d22a5
      Authored by Damien Le Moal
      The BIO issuing loop in __blkdev_issue_zeroout() is allocating BIOs
      with a maximum number of bvec (pages) equal to
      
      min(nr_sects, (sector_t)BIO_MAX_PAGES)
      
      This works since the requested number of bvecs will always be limited
      to the absolute maximum number supported (BIO_MAX_PAGES), but it is
      inefficient because too many bvec entries may be requested due to the
      different units used in the min() operation (number of sectors vs
      number of pages).
      To fix this, introduce the helper __blkdev_sectors_to_bio_pages() to
      correctly calculate the number of bvecs for zeroout BIOs as the issuing
      loop progresses. The calculation is done using consistent units and
      makes sure that the number of pages returned is at least 1 (for cases
      where the number of sectors is less than the number of sectors in
      a page).
      
      Also remove a trailing space after the bit shift in the internal loop
      min() call.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      615d22a5
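      A hedged sketch of the unit-consistent helper described above (close to
      the mainline version, but treat it as illustrative):

      #include <linux/bio.h>
      #include <linux/kernel.h>

      static unsigned int __blkdev_sectors_to_bio_pages_sketch(sector_t nr_sects)
      {
              /* convert 512-byte sectors to pages, rounding up so that a
               * partial page still requests one bvec */
              sector_t pages = DIV_ROUND_UP_SECTOR_T(nr_sects, PAGE_SIZE / 512);

              return min(pages, (sector_t)BIO_MAX_PAGES);
      }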