1. 28 6月, 2017 1 次提交
  2. 09 4月, 2017 1 次提交
  3. 09 2月, 2017 1 次提交
  4. 02 2月, 2017 1 次提交
  5. 13 12月, 2016 1 次提交
  6. 01 12月, 2016 1 次提交
  7. 11 11月, 2016 1 次提交
    • J
      block: hook up writeback throttling · 87760e5e
      Jens Axboe 提交于
      Enable throttling of buffered writeback to make it a lot
      more smooth, and has way less impact on other system activity.
      Background writeback should be, by definition, background
      activity. The fact that we flush huge bundles of it at the time
      means that it potentially has heavy impacts on foreground workloads,
      which isn't ideal. We can't easily limit the sizes of writes that
      we do, since that would impact file system layout in the presence
      of delayed allocation. So just throttle back buffered writeback,
      unless someone is waiting for it.
      
      The algorithm for when to throttle takes its inspiration in the
      CoDel networking scheduling algorithm. Like CoDel, blk-wb monitors
      the minimum latencies of requests over a window of time. In that
      window of time, if the minimum latency of any request exceeds a
      given target, then a scale count is incremented and the queue depth
      is shrunk. The next monitoring window is shrunk accordingly. Unlike
      CoDel, if we hit a window that exhibits good behavior, then we
      simply increment the scale count and re-calculate the limits for that
      scale value. This prevents us from oscillating between a
      close-to-ideal value and max all the time, instead remaining in the
      windows where we get good behavior.
      
      Unlike CoDel, blk-wb allows the scale count to to negative. This
      happens if we primarily have writes going on. Unlike positive
      scale counts, this doesn't change the size of the monitoring window.
      When the heavy writers finish, blk-bw quickly snaps back to it's
      stable state of a zero scale count.
      
      The patch registers a sysfs entry, 'wb_lat_usec'. This sets the latency
      target to me met. It defaults to 2 msec for non-rotational storage, and
      75 msec for rotational storage. Setting this value to '0' disables
      blk-wb. Generally, a user would not have to touch this setting.
      
      We don't enable WBT on devices that are managed with CFQ, and have
      a non-root block cgroup attached. If we have a proportional share setup
      on this particular disk, then the wbt throttling will interfere with
      that. We don't have a strong need for wbt for that case, since we will
      rely on CFQ doing that for us.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      87760e5e
  8. 06 11月, 2016 1 次提交
    • J
      block: add code to track actual device queue depth · d278d4a8
      Jens Axboe 提交于
      For blk-mq, ->nr_requests does track queue depth, at least at init
      time. But for the older queue paths, it's simply a soft setting.
      On top of that, it's generally larger than the hardware setting
      on purpose, to allow backup of requests for merging.
      
      Fill a hole in struct request with a 'queue_depth' member, that
      drivers can call to more closely inform the block layer of the
      real queue depth.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      d278d4a8
  9. 19 10月, 2016 2 次提交
  10. 14 4月, 2016 1 次提交
  11. 13 4月, 2016 2 次提交
  12. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  13. 12 2月, 2016 1 次提交
  14. 11 2月, 2016 1 次提交
  15. 26 11月, 2015 1 次提交
    • M
      block/sd: Fix device-imposed transfer length limits · ca369d51
      Martin K. Petersen 提交于
      Commit 4f258a46 ("sd: Fix maximum I/O size for BLOCK_PC requests")
      had the unfortunate side-effect of removing an implicit clamp to
      BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
      code. This caused problems for some SMR drives.
      
      Debugging this issue revealed a few problems with the existing
      infrastructure since the block layer didn't know how to deal with
      device-imposed limits, only limits set by the I/O controller.
      
       - Introduce a new queue limit, max_dev_sectors, which is used by the
         ULD to signal the maximum sectors for a REQ_TYPE_FS request.
      
       - Ensure that max_dev_sectors is correctly stacked and taken into
         account when overriding max_sectors through sysfs.
      
       - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
         values for later processing.
      
       - In sd_revalidate() set the queue's max_dev_sectors based on the
         MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
         is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
         field size.
      
       - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
         VPD--if reported and sane--to signal the preferred device transfer
         size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.
      
       - blk_limits_max_hw_sectors() is no longer used and can be removed.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: sweeneygj@gmx.com
      Tested-by: NArzeets <anatol.pomozov@gmail.com>
      Tested-by: NDavid Eisner <david.eisner@oriel.oxon.org>
      Tested-by: NMario Kicherer <dev@kicherer.org>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      ca369d51
  16. 20 8月, 2015 1 次提交
  17. 19 8月, 2015 1 次提交
    • J
      Revert "block: remove artifical max_hw_sectors cap" · 30e2bc08
      Jeff Moyer 提交于
      This reverts commit 34b48db6.
      That commit caused performance regressions for streaming I/O
      workloads on a number of different storage devices, from
      SATA disks to external RAID arrays.  It also managed to
      trip up some buggy firmware in at least one drive, causing
      data corruption.
      
      The next patch will bump the default max_sectors_kb value to
      1280, which will accommodate a 10-data-disk stripe write
      with chunk size 128k.  In the testing I've done using iozone,
      fio, and aio-stress, a value of 1280 does not show a big
      performance difference from 512.  This will hopefully still
      help the software RAID setup that Christoph saw the original
      performance gains with while still not regressing other
      storage configurations.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      30e2bc08
  18. 14 8月, 2015 1 次提交
    • K
      block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet 提交于
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: NDongsu Park <dpark@posteo.net>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8ae12666
  19. 13 8月, 2015 1 次提交
  20. 17 7月, 2015 1 次提交
    • J
      block: make /sys/block/<dev>/queue/discard_max_bytes writeable · 0034af03
      Jens Axboe 提交于
      Lots of devices support huge discard sizes these days. Depending
      on how the device handles them internally, huge discards can
      introduce massive latencies (hundreds of msec) on the device side.
      
      We have a sysfs file, discard_max_bytes, that advertises the max
      hardware supported discard size. Make this writeable, and split
      the settings into a soft and hard limit. This can be set from
      'discard_granularity' and up to the hardware limit.
      
      Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
      set limit.
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      0034af03
  21. 31 3月, 2015 1 次提交
    • M
      block: fix blk_stack_limits() regression due to lcm() change · e9637415
      Mike Snitzer 提交于
      Linux 3.19 commit 69c953c8 ("lib/lcm.c: lcm(n,0)=lcm(0,n) is 0, not n")
      caused blk_stack_limits() to not properly stack queue_limits for stacked
      devices (e.g. DM).
      
      Fix this regression by establishing lcm_not_zero() and switching
      blk_stack_limits() over to using it.
      
      DM uses blk_set_stacking_limits() to establish the initial top-level
      queue_limits that are then built up based on underlying devices' limits
      using blk_stack_limits().  In the case of optimal_io_size (io_opt)
      blk_set_stacking_limits() establishes a default value of 0.  With commit
      69c953c8, lcm(0, n) is no longer n, which compromises proper stacking of
      the underlying devices' io_opt.
      
      Test:
      $ modprobe scsi_debug dev_size_mb=10 num_tgts=1 opt_blks=1536
      $ cat /sys/block/sde/queue/optimal_io_size
      786432
      $ dmsetup create node --table "0 100 linear /dev/sde 0"
      
      Before this fix:
      $ cat /sys/block/dm-5/queue/optimal_io_size
      0
      
      After this fix:
      $ cat /sys/block/dm-5/queue/optimal_io_size
      786432
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.19+
      Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e9637415
  22. 22 10月, 2014 1 次提交
    • C
      block: remove artifical max_hw_sectors cap · 34b48db6
      Christoph Hellwig 提交于
      Set max_sectors to the value the drivers provides as hardware limit by
      default.  Linux had proper I/O throttling for a long time and doesn't
      rely on a artifically small maximum I/O size anymore.  By not limiting
      the I/O size by default we remove an annoying tuning step required for
      most Linux installation.
      
      Note that both the user, and if absolutely required the driver can still
      impose a limit for FS requests below max_hw_sectors_kb.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      34b48db6
  23. 09 10月, 2014 1 次提交
  24. 11 6月, 2014 1 次提交
  25. 06 6月, 2014 1 次提交
    • J
      block: add notion of a chunk size for request merging · 762380ad
      Jens Axboe 提交于
      Some drivers have different limits on what size a request should
      optimally be, depending on the offset of the request. Similar to
      dividing a device into chunks. Add a setting that allows the driver
      to inform the block layer of such a chunk size. The block layer will
      then prevent merging across the chunks.
      
      This is needed to optimally support NVMe with a non-zero stripe size.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      762380ad
  26. 09 1月, 2014 1 次提交
    • K
      bcache/md: Use raid stripe size · c78afc62
      Kent Overstreet 提交于
      Now that we've got code for raid5/6 stripe awareness, bcache just needs
      to know about the stripes and when writing partial stripes is expensive
      - we probably don't want to enable this optimization for raid1 or 10,
      even though they have stripes. So add a flag to queue_limits.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      c78afc62
  27. 09 11月, 2013 1 次提交
  28. 31 10月, 2013 1 次提交
  29. 15 12月, 2012 1 次提交
  30. 20 9月, 2012 1 次提交
  31. 01 8月, 2012 1 次提交
  32. 11 1月, 2012 1 次提交
    • M
      block: Introduce blk_set_stacking_limits function · b1bd055d
      Martin K. Petersen 提交于
      Stacking driver queue limits are typically bounded exclusively by the
      capabilities of the low level devices, not by the stacking driver
      itself.
      
      This patch introduces blk_set_stacking_limits() which has more liberal
      metrics than the default queue limits function. This allows us to
      inherit topology parameters from bottom devices without manually
      tweaking the default limits in each driver prior to calling the stacking
      function.
      
      Since there is now a clear distinction between stacking and low-level
      devices, blk_set_default_limits() has been modified to carry the more
      conservative values that we used to manually set in
      blk_queue_make_request().
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b1bd055d
  33. 18 5月, 2011 1 次提交
  34. 07 5月, 2011 1 次提交
  35. 18 4月, 2011 1 次提交
  36. 12 4月, 2011 1 次提交
  37. 10 3月, 2011 1 次提交
  38. 03 3月, 2011 1 次提交
    • V
      block: Initialize ->queue_lock to internal lock at queue allocation time · c94a96ac
      Vivek Goyal 提交于
      There does not seem to be a clear convention whether q->queue_lock is
      initialized or not when blk_cleanup_queue() is called. In the past it
      was not necessary but now blk_throtl_exit() takes up queue lock by
      default and needs queue lock to be available.
      
      In fact elevator_exit() code also has similar requirement just that it
      is less stringent in the sense that elevator_exit() is called only if
      elevator is initialized.
      
      Two problems have been noticed because of ambiguity about spin lock
      status.
      
            - If a driver calls blk_alloc_queue() and then soon calls
              blk_cleanup_queue() almost immediately, (because some other
      	driver structure allocation failed or some other error happened)
      	then blk_throtl_exit() will run into issues as queue lock is not
      	initialized. Loop driver ran into this issue recently and I
      	noticed error paths in md driver too. Similar error paths should
      	exist in other drivers too.
      
            - If some driver provided external spin lock and zapped the lock
              before blk_cleanup_queue(), then it can lead to issues.
      
      So this patch initializes the default queue lock at queue allocation time.
      
      block throttling code is one of the users of queue lock and it is
      initialized at the queue allocation time, so it makes sense to
      initialize ->queue_lock also to internal lock. A driver can overide that
      lock later. This will take care of the issue where a driver does not have
      to worry about initializing the queue lock to default before calling
      blk_cleanup_queue()
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      c94a96ac