1. 19 Jun, 2017: 1 commit
  2. 09 Jun, 2017: 2 commits
    • block: switch bios to blk_status_t · 4e4cbee9
      Christoph Hellwig committed
      Replace bi_error with a new bi_status to allow for a clear conversion.
      Note that device mapper overloaded bi_error with a private value, which
      we'll have to keep around at least for now and thus propagate to a
      proper blk_status_t value.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: introduce new block status code type · 2a842aca
      Christoph Hellwig committed
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error, a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run: those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is very limited and closely corresponds
      to the previous overloaded errno values, but there is some low-hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
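A minimal userspace sketch of the conversion helpers described above, assuming a reduced set of status codes. The helper names mirror the kernel's errno_to_blk_status() and blk_status_to_errno(), but the numeric values, the table and the error handling here are illustrative, not copied from the patch. Sparse's __bitwise type checking cannot be reproduced with a plain C compiler, so only the mapping is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Sketch only: in the kernel this typedef carries sparse's __bitwise
 * annotation so that mixing it with plain ints is flagged by sparse.
 * A normal compiler treats it as an ordinary small integer.
 */
typedef unsigned char blk_status_t;

/* Illustrative subset of status codes (values chosen for this sketch). */
#define BLK_STS_OK      ((blk_status_t)0)
#define BLK_STS_NOTSUPP ((blk_status_t)1)
#define BLK_STS_TIMEOUT ((blk_status_t)2)
#define BLK_STS_NOSPC   ((blk_status_t)3)
#define BLK_STS_IOERR   ((blk_status_t)10)

/* One row per status: the block status and its (negative) errno twin. */
static const struct {
    blk_status_t status;
    int errno_val;
} status_table[] = {
    { BLK_STS_OK,      0 },
    { BLK_STS_NOTSUPP, -EOPNOTSUPP },
    { BLK_STS_TIMEOUT, -ETIMEDOUT },
    { BLK_STS_NOSPC,   -ENOSPC },
    { BLK_STS_IOERR,   -EIO },
};

#define TABLE_SIZE (sizeof(status_table) / sizeof(status_table[0]))

/* errno -> status; any errno without a table row becomes a generic I/O error. */
static blk_status_t errno_to_blk_status(int err)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (status_table[i].errno_val == err)
            return status_table[i].status;
    return BLK_STS_IOERR;
}

/* status -> errno, for callers that still expect errno values. */
static int blk_status_to_errno(blk_status_t status)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (status_table[i].status == status)
            return status_table[i].errno_val;
    assert(!"unknown blk_status_t value");
    return -EIO;
}
```

In this sketch the round trip is lossless only for errnos that have a table row; anything else is reported back as -EIO.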
  3. 02 Jun, 2017: 1 commit
  4. 09 May, 2017: 1 commit
    • block, dax: move "select DAX" from BLOCK to FS_DAX · ef510424
      Dan Williams committed
      For configurations that do not enable DAX filesystems or drivers, do not
      require the DAX core to be built.
      
      Given that the 'direct_access' method has been removed from
      'block_device_operations', we can also go ahead and move the
      block-related dax helper functions from fs/block_dev.c to
      drivers/dax/super.c. This keeps DAX details out of the block layer and
      lets the DAX core be built as a module in the FS_DAX=n case.
      
      Filesystems need to include dax.h to call bdev_dax_supported().
      
      Cc: linux-xfs@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  5. 04 May, 2017: 2 commits
  6. 28 Apr, 2017: 1 commit
  7. 26 Apr, 2017: 1 commit
  8. 21 Apr, 2017: 6 commits
  9. 20 Apr, 2017: 4 commits
  10. 19 Apr, 2017: 1 commit
    • block, bfq: add full hierarchical scheduling and cgroups support · e21b7a0b
      Arianna Avanzini committed
      Add complete support for full hierarchical scheduling, with a cgroups
      interface. Full hierarchical scheduling is implemented through the
      'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues
      associated with processes, and groups are represented in general by
      entities. Given the bfq_queues associated with the processes belonging
      to a given group, the entities representing these queues are children of
      the entity representing the group. At higher levels, if a group, say
      G, contains other groups, then the entity representing G is the parent
      entity of the entities representing the groups in G.
      
      Hierarchical scheduling is performed as follows: if the timestamps of
      a leaf entity (i.e., of a bfq_queue) change, and such a change lets
      the entity become the next-to-serve entity for its parent entity, then
      the timestamps of the parent entity are recomputed as a function of
      the budget of its new next-to-serve leaf entity. If the parent entity
      belongs, in its turn, to a group, and its new timestamps let it become
      the next-to-serve for its parent entity, then the timestamps of the
      latter parent entity are recomputed as well, and so on. When a new
      bfq_queue must be set in service, the reverse path is followed: the
      next-to-serve highest-level entity is chosen, then its next-to-serve
      child entity, and so on, until the next-to-serve leaf entity is
      reached, and the bfq_queue that this entity represents is set in
      service.
      
      Writeback is accounted for on a per-group basis, i.e., for each group,
      the async I/O requests of the processes of the group are enqueued in a
      distinct bfq_queue, and the entity associated with this queue is a
      child of the entity associated with the group.
      
      Weights can be assigned explicitly to groups and processes through the
      cgroups interface, unlike what happens for single processes when the
      cgroups interface is not used (as explained in the description of the
      previous patch). In particular, since each node has a full scheduler,
      each group can be assigned its own weight.
      Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
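The upward timestamp recomputation and the downward selection path described above can be modeled with a toy entity tree. This is a sketch under a simplifying assumption: each parent picks the ready child with the smallest virtual finish time (finish = start + budget / weight), which stands in for BFQ's real B-WF2Q+ eligibility rules. struct entity, update_upwards() and select_next() are hypothetical names, not the kernel's bfq_entity code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/*
 * Hypothetical toy entity: a leaf models a bfq_queue, an inner node a group.
 * Not the kernel's struct bfq_entity; field names are illustrative.
 */
struct entity {
    double weight;        /* relative share, e.g. set through cgroups */
    double budget;        /* leaf: own budget; group: budget of next-to-serve leaf */
    double start;         /* virtual start time (set by the caller for leaves) */
    double finish;        /* virtual finish time: start + budget / weight */
    struct entity *parent;
    struct entity *children[MAX_CHILDREN];
    size_t nchildren;
    struct entity *next;  /* cached next-to-serve child; NULL for leaves */
};

static void add_child(struct entity *parent, struct entity *child)
{
    assert(parent->nchildren < MAX_CHILDREN);
    child->parent = parent;
    parent->children[parent->nchildren++] = child;
}

/* Downward path: follow next-to-serve pointers until a leaf is reached. */
static struct entity *select_next(struct entity *e)
{
    while (e->next)
        e = e->next;
    return e;
}

/* A group that has not picked a next-to-serve child yet has nothing to offer. */
static int is_ready(const struct entity *e)
{
    return e->nchildren == 0 || e->next != NULL;
}

/*
 * Upward path: after a leaf's start/budget change, recompute its finish time
 * and let each ancestor re-pick its next-to-serve child, stopping as soon as
 * an ancestor's choice is unaffected by the change below it.
 */
static void update_upwards(struct entity *leaf)
{
    struct entity *e = leaf;

    e->finish = e->start + e->budget / e->weight;
    for (struct entity *p = e->parent; p; e = p, p = p->parent) {
        struct entity *best = NULL;

        for (size_t i = 0; i < p->nchildren; i++) {
            struct entity *c = p->children[i];
            if (is_ready(c) && (!best || c->finish < best->finish))
                best = c;
        }
        if (best == p->next && best != e)
            break;  /* same choice, and not the entity that just changed */
        p->next = best;
        p->budget = select_next(best)->budget;
        p->finish = p->start + p->budget / p->weight;
    }
}
```

For example, with group g1 (weight 1) holding leaves of budget 8 and 4, and group g2 (weight 3) holding one leaf of budget 6 and weight 2, g2 gets finish time 2 against g1's 4, so the root descends into g2 even though its leaf has the larger budget.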
  11. 15 Apr, 2017: 1 commit
    • block: fix bio_will_gap() for first bvec with offset · 5a8d75a1
      Ming Lei committed
      Commit 729204ef ("block: relax check on sg gap") allows us to merge
      bios if both are physically contiguous.  This lets a huge number of
      small bios be merged (during mkfs, for example); mkfs.ntfs running
      time can drop to ~1/10.
      
      But if one rq starts with a non-aligned buffer (the 1st bvec's bv_offset
      is non-zero) and we allow the merge, it is quite difficult to respect
      the sg gap limit, especially the max segment size, or we risk having an
      unaligned virtual boundary.  This patch tries to avoid the issue by
      disallowing a merge if the req starts with an unaligned buffer.
      
      Also add comments to explain why the merged segment can't end at an
      unaligned virt boundary.
      
      Fixes: 729204ef ("block: relax check on sg gap")
      Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Rewrote parts of the commit message and comments.
      Signed-off-by: Jens Axboe <axboe@fb.com>
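A hypothetical model of the merge-gap decision after this fix, assuming a toy struct bvec_model and a will_gap() helper rather than the kernel's bio_will_gap() and bio_vec. It encodes three rules taken from the message: a request whose first buffer has a non-zero offset refuses merges, physically contiguous buffers may merge, and otherwise both sides of the join must sit on the virtual boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy bio_vec: physical address of the data, offset in page, length. */
struct bvec_model {
    uint64_t phys;
    unsigned int offset;
    unsigned int len;
};

/*
 * Sketch of the check. mask is the device's virtual boundary mask
 * (e.g. 4095 for a 4 KiB boundary, 0 for "no boundary"). Returns true
 * if appending next to the request would create a gap, i.e. no merge.
 */
static bool will_gap(const struct bvec_model *prev_first,
                     const struct bvec_model *prev_last,
                     const struct bvec_model *next_first,
                     uint64_t mask)
{
    if (mask == 0)
        return false;  /* no virtual boundary constraint at all */

    /* The fix: never merge onto a request that starts with an unaligned buffer. */
    if (prev_first->offset != 0)
        return true;

    /* The relaxed rule: physically contiguous buffers can be merged. */
    if (prev_last->phys + prev_last->len == next_first->phys)
        return false;

    /* Otherwise prev must end, and next must start, on the virtual boundary. */
    return ((prev_last->offset + prev_last->len) & mask) != 0 ||
           (next_first->offset & mask) != 0;
}
```

With a 4 KiB boundary, a 2 KiB buffer followed by an unrelated page is a gap, while the same buffer followed by the physically adjacent 1 KiB is not.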
  12. 09 Apr, 2017: 3 commits
  13. 08 Apr, 2017: 1 commit
  14. 06 Apr, 2017: 2 commits
  15. 29 Mar, 2017: 1 commit
  16. 22 Mar, 2017: 1 commit
    • blk-stat: convert to callback-based statistics reporting · 34dbad5d
      Omar Sandoval committed
      Currently, statistics are gathered in ~0.13s windows, and users grab the
      statistics whenever they need them. This is not ideal for either of the
      two in-tree users:
      
      1. Writeback throttling wants its own dynamically sized window of
         statistics. Since the blk-stats statistics are reset after every
         window and the wbt windows don't line up with the blk-stats windows,
         wbt doesn't see every I/O.
      2. Polling currently grabs the statistics on every I/O. Again, depending
         on how the window lines up, we may miss some I/Os. It's also
         unnecessary overhead to get the statistics on every I/O; the hybrid
         polling heuristic would be just as happy with the statistics from the
         previous full window.
      
      This reworks the blk-stats infrastructure to be callback-based: users
      register a callback that they want called at a given time with all of
      the statistics from the window during which the callback was active.
      Users can dynamically bucketize the statistics. wbt and polling both
      currently use read vs. write, but polling can be extended to further
      subdivide based on request size.
      
      The callbacks are kept on an RCU list, and each callback has percpu
      stats buffers. There will only be a few users, so the overhead on the
      I/O completion side is low. The stats flushing is also simplified
      considerably: since the timer function is responsible for clearing the
      statistics, we don't have to worry about stale statistics.
      
      wbt is a trivial conversion. After the conversion, the windowing problem
      mentioned above is fixed.
      
      For polling, we register an extra callback that caches the previous
      window's statistics in the struct request_queue for the hybrid polling
      heuristic to use.
      
      Since we no longer have a single stats buffer for the request queue,
      this also removes the sysfs and debugfs stats entries. To replace those,
      we add a debugfs entry for the poll statistics.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
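A single-threaded sketch of the callback model described above, assuming hypothetical names (struct stat_cb, stat_register(), stat_add(), stat_window_end()) and leaving out the RCU list, the per-CPU buffers and the real timer. Each registered callback buckets completed I/Os its own way; when its window ends, the callback sees that window's statistics, which are then cleared. The example timer function caches the last full window, loosely in the spirit of the polling callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUCKETS   4
#define MAX_CALLBACKS 4

struct io_stat {
    uint64_t min, max, sum, nr;
};

/* Hypothetical view of a completed I/O. */
struct sample {
    int is_write;
    uint64_t latency_ns;
};

struct stat_cb {
    int (*bucket_fn)(const struct sample *s);  /* negative: ignore sample */
    int nr_buckets;
    struct io_stat stats[MAX_BUCKETS];         /* current window only */
    void (*timer_fn)(struct stat_cb *cb);      /* called when window ends */
    void *data;                                /* private data for timer_fn */
};

/* Plain array standing in for the kernel's RCU-protected callback list. */
static struct stat_cb *callbacks[MAX_CALLBACKS];
static size_t nr_callbacks;

static void stat_reset(struct io_stat *st)
{
    st->min = UINT64_MAX;
    st->max = st->sum = st->nr = 0;
}

static void stat_register(struct stat_cb *cb)
{
    assert(nr_callbacks < MAX_CALLBACKS && cb->nr_buckets <= MAX_BUCKETS);
    for (int i = 0; i < cb->nr_buckets; i++)
        stat_reset(&cb->stats[i]);
    callbacks[nr_callbacks++] = cb;
}

/* Completion side: feed one sample to every registered callback. */
static void stat_add(const struct sample *s)
{
    for (size_t i = 0; i < nr_callbacks; i++) {
        struct stat_cb *cb = callbacks[i];
        int b = cb->bucket_fn(s);

        if (b < 0 || b >= cb->nr_buckets)
            continue;
        struct io_stat *st = &cb->stats[b];
        if (s->latency_ns < st->min)
            st->min = s->latency_ns;
        if (s->latency_ns > st->max)
            st->max = s->latency_ns;
        st->sum += s->latency_ns;
        st->nr++;
    }
}

/* Timer side: hand the finished window to the user, then start a fresh one. */
static void stat_window_end(struct stat_cb *cb)
{
    cb->timer_fn(cb);
    for (int i = 0; i < cb->nr_buckets; i++)
        stat_reset(&cb->stats[i]);
}

/* Example bucketing: read vs. write. */
static int bucket_by_dir(const struct sample *s)
{
    return s->is_write ? 1 : 0;
}

/* Example timer_fn: copy the finished window into the array at cb->data. */
static void cache_last_window(struct stat_cb *cb)
{
    struct io_stat *cache = cb->data;

    for (int i = 0; i < cb->nr_buckets; i++)
        cache[i] = cb->stats[i];
}
```

Because the timer path clears the buckets right after reporting them, a reader of the cached copy always sees one complete window rather than a partially filled one.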
  17. 09 Mar, 2017: 1 commit
  18. 02 Mar, 2017: 1 commit
  19. 09 Feb, 2017: 1 commit
  20. 03 Feb, 2017: 1 commit
  21. 02 Feb, 2017: 4 commits
  22. 01 Feb, 2017: 3 commits