1. 04 Nov 2016, 1 commit
    • block: immediately dispatch big size request · 50d24c34
      Committed by Shaohua Li
      Currently the block plug holds up to 16 non-mergeable requests. This
      makes sense if the request size is small, e.g. to reduce lock
      contention. But if the request size is big enough, lock contention is
      not a concern; holding such a request makes no sense and only lowers
      disk utilization.
      
      In practice, this improves throughput by 10% for my raid5 sequential
      write workload.
      
      The threshold (128k) is arbitrary right now, but it keeps lock
      contention small. This could probably be made more intelligent, e.g.
      by checking the average size of the held requests, but since this is
      mainly for sequential IO, that is probably not worthwhile.
      
      V2: check the last request instead of the first request, so the plug
      is flushed as soon as there is one big request in it.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
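      A minimal sketch of the check this describes, in kernel-style C (the
      constant name BLK_PLUG_FLUSH_SIZE and the surrounding plugging logic
      are illustrative assumptions based on the text above, not a verbatim
      copy of the patch):
      
        /* Illustrative sketch only, not the actual patch. */
        #define BLK_MAX_REQUEST_COUNT  16              /* plug holds at most 16 requests */
        #define BLK_PLUG_FLUSH_SIZE    (128 * 1024)    /* arbitrary "big request" cutoff */
        
        static void plug_or_flush(struct blk_plug *plug, struct request *rq,
                                  unsigned int request_count, struct request *last)
        {
                /*
                 * Flush when the plug is full, or as soon as the most recently
                 * added request is big: big requests gain nothing from plugging,
                 * and holding them back only lowers disk utilization (V2: check
                 * the last request, not the first).
                 */
                if (request_count >= BLK_MAX_REQUEST_COUNT ||
                    (last && blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE))
                        blk_flush_plug_list(plug, false);
        
                list_add_tail(&rq->queuelist, &plug->list);
        }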
  2. 03 Nov 2016, 1 commit
    • blk-mq: Introduce blk_mq_quiesce_queue() · 6a83e74d
      Committed by Bart Van Assche
      blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
      have finished. This function does *not* wait until all outstanding
      requests have completed (i.e. until request.end_io() has been invoked).
      The algorithm used by blk_mq_quiesce_queue() is as follows:
      * Hold either an RCU read lock or an SRCU read lock around
        .queue_rq() calls. The former is used if .queue_rq() does not
        block and the latter if .queue_rq() may block.
      * blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues()
        followed by synchronize_srcu() or synchronize_rcu(). The latter
        call waits for .queue_rq() invocations that started before
        blk_mq_quiesce_queue() was called.
      * The blk_mq_hctx_stopped() checks that control whether or not
        .queue_rq() will be called are performed with the (S)RCU read lock
        held. This is necessary to avoid races with
        blk_mq_quiesce_queue().
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
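      A sketch of the algorithm described above, in kernel-style C (the
      queue_rq_srcu member name is an assumption; BLK_MQ_F_BLOCKING is taken
      to mark hctxs whose .queue_rq() may block):
      
        /* Illustrative sketch of the quiesce algorithm described above. */
        void blk_mq_quiesce_queue(struct request_queue *q)
        {
                struct blk_mq_hw_ctx *hctx;
                unsigned int i;
                bool rcu = false;
        
                /* Stop the hw queues so no new .queue_rq() calls are started. */
                blk_mq_stop_hw_queues(q);
        
                queue_for_each_hw_ctx(q, hctx, i) {
                        if (hctx->flags & BLK_MQ_F_BLOCKING)
                                /* .queue_rq() may block: wait via SRCU. */
                                synchronize_srcu(&hctx->queue_rq_srcu);
                        else
                                rcu = true;
                }
                /* One synchronize_rcu() covers every non-blocking hctx. */
                if (rcu)
                        synchronize_rcu();
        }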
  3. 28 Oct 2016, 2 commits
    • block: better op and flags encoding · ef295ecf
      Committed by Christoph Hellwig
      Now that the common flags no longer need to overflow the range of a
      32-bit type, we can encode them the same way for both the bio and
      request fields. This additionally allows us to place the operation
      first (making some room for more ops while we're at it) and to stop
      shifting the operation values around.
      
      In addition, this allows passing around only one value in the block
      layer instead of two (and eventually also in the file systems, but we
      can do that later), and thus cleans up a lot of code.
      
      Last but not least, this allows decreasing the size of the cmd_flags
      field in struct request to 32 bits. Various functions passing this
      value could also be updated, but I'd like to avoid the churn for now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
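      A sketch of the unified encoding in C: the operation sits in the low
      bits of a single 32-bit value and the flags live above it (the exact
      number of op bits shown is an assumption):
      
        /* Illustrative sketch of op + flags packed into one 32-bit value. */
        #define REQ_OP_BITS  8                          /* room for more ops */
        #define REQ_OP_MASK  ((1U << REQ_OP_BITS) - 1)
        
        /* Extract the operation from a combined opf value: no shifting needed. */
        static inline unsigned int op_from_opf(unsigned int opf)
        {
                return opf & REQ_OP_MASK;
        }
        
        /* Combine an operation with flags defined above bit REQ_OP_BITS. */
        static inline unsigned int opf_from_op(unsigned int op, unsigned int flags)
        {
                return op | flags;
        }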
    • block: split out request-only flags into a new namespace · e8064021
      Committed by Christoph Hellwig
      A lot of the REQ_* flags are only used on struct requests, and only of
      use to the block layer and a few drivers that dig into struct request
      internals.
      
      This patch adds a new req_flags_t rq_flags field to struct request
      for them, and thus dramatically shrinks the number of common request
      flags. It also removes the unfortunate situation where we have to fit
      the fields from the same enum into 32 bits for struct bio and 64 bits
      for struct request.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
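      A sketch of the split: request-only flags move into their own
      req_flags_t type and RQF_* namespace, so they no longer share an enum
      with the bio flags (the specific flags and the struct layout shown are
      illustrative):
      
        /* Illustrative sketch of a request-only flag namespace. */
        typedef unsigned int req_flags_t;
        
        #define RQF_SORTED   ((req_flags_t)(1U << 0))  /* elevator knows about this rq */
        #define RQF_STARTED  ((req_flags_t)(1U << 1))  /* driver has started this rq   */
        
        struct request_sketch {
                unsigned int cmd_flags;  /* op + common REQ_* flags, now 32 bits */
                req_flags_t  rq_flags;   /* block-layer internal RQF_* flags     */
                /* ... */
        };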
  4. 19 Oct 2016, 3 commits
  5. 15 Sep 2016, 1 commit
    • blk-mq: introduce blk_mq_delay_kick_requeue_list() · 2849450a
      Committed by Mike Snitzer
      blk_mq_delay_kick_requeue_list() provides the ability to kick the
      q->requeue_list after a specified time.  To do this the request_queue's
      'requeue_work' member was changed to a delayed_work.
      
      blk_mq_delay_kick_requeue_list() allows DM to defer processing
      requeued requests when it does not make sense to requeue them
      immediately (e.g. when all paths in a DM multipath have failed).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
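      A sketch of the mechanism: requeue_work becomes a delayed_work and the
      new helper simply schedules it with a delay (the use of
      kblockd_schedule_delayed_work() here reflects the description, not the
      verbatim patch):
      
        /* Illustrative sketch of delayed requeue-list kicking. */
        void blk_mq_delay_kick_requeue_list(struct request_queue *q,
                                            unsigned long msecs)
        {
                /* requeue_work is now a struct delayed_work on the queue. */
                kblockd_schedule_delayed_work(&q->requeue_work,
                                              msecs_to_jiffies(msecs));
        }
        
        /* The immediate kick is then just a zero-delay schedule. */
        void blk_mq_kick_requeue_list(struct request_queue *q)
        {
                kblockd_schedule_delayed_work(&q->requeue_work, 0);
        }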
  6. 29 Aug 2016, 1 commit
  7. 16 Aug 2016, 1 commit
  8. 08 Aug 2016, 1 commit
  9. 05 Aug 2016, 2 commits
  10. 21 Jul 2016, 5 commits
  11. 13 Jul 2016, 1 commit
    • pmem: kill __pmem address space · 7a9eb206
      Committed by Dan Williams
      The __pmem address space was meant to annotate codepaths that touch
      persistent memory and need to coordinate a call to wmb_pmem().  Now that
      wmb_pmem() is gone, there is little need to keep this annotation.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  12. 28 Jun 2016, 1 commit
    • block: Convert fifo_time from ulong to u64 · 9828c2c6
      Committed by Jan Kara
      Currently rq->fifo_time is an unsigned long, but CFQ stores a
      nanosecond timestamp in it, which would overflow on 32-bit archs.
      Convert it to u64 to avoid the overflow. Since rq->fifo_time is
      unioned with struct call_single_data, this does not change the size
      of struct request in any way.
      
      We have to slightly fix up block/deadline-iosched.c so that the
      comparison happens in the right types.
      
      Fixes: 9a7f38c4
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
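      A sketch of the layout and the fixed-up comparison (the field
      placement follows the message; the deadline check shown is an
      assumption about what "comparison in the right types" looks like):
      
        /* Illustrative sketch: fifo_time widened to u64, unioned with csd. */
        struct request_sketch {
                union {
                        struct call_single_data csd;
                        u64 fifo_time;  /* u64 so ns stamps fit on 32-bit archs */
                };
                /* ... */
        };
        
        /* deadline-iosched compares in jiffies, so cast to the right type. */
        static bool deadline_rq_expired(const struct request_sketch *rq)
        {
                return time_after_eq(jiffies, (unsigned long)rq->fifo_time);
        }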
  13. 09 Jun 2016, 2 commits
  14. 08 Jun 2016, 6 commits
  15. 19 May 2016, 1 commit
  16. 17 May 2016, 3 commits
    • block: Update blkdev_dax_capable() for consistency · a8078b1f
      Committed by Toshi Kani
      blkdev_dax_capable() is similar to bdev_dax_supported(), but needs
      to remain a separate interface for checking the dax capability of
      a raw block device.
      
      Rename and relocate blkdev_dax_capable() so the two are maintained
      consistently, and call bdev_direct_access() for the dax capability
      check.
      
      There is no change in behavior.
      
      Link: https://lkml.org/lkml/2016/5/9/950
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
    • block: Add bdev_dax_supported() for dax mount checks · 2d96afc8
      Committed by Toshi Kani
      DAX imposes additional requirements on a device. Add
      bdev_dax_supported(), which performs all the precondition checks
      necessary for a filesystem to mount the device with the dax option.
      
      Also add a new check to verify that a partition is 4KB-aligned.
      When a partition is unaligned, any dax read/write access fails,
      except for metadata updates.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
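      A sketch of the alignment check: with 512-byte sectors, a partition
      whose start or length is not a multiple of PAGE_SIZE/512 sectors
      cannot support dax (the helper and field names are assumptions in the
      spirit of the description):
      
        /* Illustrative sketch of the 4KB partition-alignment check for dax. */
        static bool partition_dax_aligned(const struct hd_struct *part)
        {
                unsigned int sectors_per_page = PAGE_SIZE / 512;
        
                /* Both the start offset and the length must be page-aligned,
                 * otherwise dax read/write access would fail. */
                return !(part->start_sect % sectors_per_page) &&
                       !(part->nr_sects % sectors_per_page);
        }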
    • block: Add vfs_msg() interface · 2af3a815
      Committed by Toshi Kani
      In preparation for moving DAX capability checks from filesystem code
      to the block layer, add a VFS message interface that aligns with the
      filesystems' message format.
      
      For instance, a vfs_msg() message followed by XFS messages in case
      of a dax mount error may look like:
      
        VFS (pmem0p1): error: unaligned partition for dax
        XFS (pmem0p1): DAX unsupported by block device. Turning off DAX.
        XFS (pmem0p1): Mounting V5 Filesystem
         :
      
      vfs_msg() is largely based on ext4_msg().
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
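      A sketch of a vfs_msg() in the ext4_msg() mold, built on the kernel's
      %pV / va_format mechanism (the exact prototype is an assumption based
      on the sample output above):
      
        /* Illustrative sketch of an ext4_msg()-style vfs_msg(). */
        static void vfs_msg(struct super_block *sb, const char *prefix,
                            const char *fmt, ...)
        {
                struct va_format vaf;
                va_list args;
        
                va_start(args, fmt);
                vaf.fmt = fmt;
                vaf.va = &args;
                /* Matches the sample: "VFS (pmem0p1): error: ..." */
                printk("%sVFS (%s): %pV\n", prefix, sb->s_id, &vaf);
                va_end(args);
        }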
  17. 02 May 2016, 1 commit
  18. 14 Apr 2016, 1 commit
  19. 13 Apr 2016, 3 commits
  20. 05 Apr 2016, 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to
      implement the page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Switching globally to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
      much breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below. For some reason, coccinelle doesn't patch header
      files; I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach; I'll
      fix them manually in a separate patch. Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
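      For reference, a semantic patch like the one above is typically
      applied with an invocation along the lines of
      spatch --sp-file pagecache.cocci --in-place --dir .
      (the exact flags and file name here are illustrative, not taken from
      the commit).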
  21. 05 Mar 2016, 1 commit
  22. 04 Mar 2016, 1 commit