1. 04 5月, 2017 1 次提交
    • O
      blk-mq: untangle debugfs and sysfs · 9c1051aa
      Omar Sandoval 提交于
      Originally, I tied debugfs registration/unregistration together with
      sysfs. There's no reason to do this, and it's getting in the way of
      letting schedulers define their own debugfs attributes. Instead, tie the
      debugfs registration to the lifetime of the structures themselves.
      
      The saner lifetimes mean we can also get rid of the extra mq directory
      and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
      just nvme0n1/hctx0/tags.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9c1051aa
  2. 28 4月, 2017 1 次提交
  3. 21 4月, 2017 4 次提交
  4. 20 4月, 2017 4 次提交
  5. 19 4月, 2017 1 次提交
    • A
      block, bfq: add full hierarchical scheduling and cgroups support · e21b7a0b
      Arianna Avanzini 提交于
      Add complete support for full hierarchical scheduling, with a cgroups
      interface. Full hierarchical scheduling is implemented through the
      'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues
      associated with processes, and groups are represented in general by
      entities. Given the bfq_queues associated with the processes belonging
      to a given group, the entities representing these queues are sons of
      the entity representing the group. At higher levels, if a group, say
      G, contains other groups, then the entity representing G is the parent
      entity of the entities representing the groups in G.
      
      Hierarchical scheduling is performed as follows: if the timestamps of
      a leaf entity (i.e., of a bfq_queue) change, and such a change lets
      the entity become the next-to-serve entity for its parent entity, then
      the timestamps of the parent entity are recomputed as a function of
      the budget of its new next-to-serve leaf entity. If the parent entity
      belongs, in its turn, to a group, and its new timestamps let it become
      the next-to-serve for its parent entity, then the timestamps of the
      latter parent entity are recomputed as well, and so on. When a new
      bfq_queue must be set in service, the reverse path is followed: the
      next-to-serve highest-level entity is chosen, then its next-to-serve
      child entity, and so on, until the next-to-serve leaf entity is
      reached, and the bfq_queue that this entity represents is set in
      service.
      
      Writeback is accounted for on a per-group basis, i.e., for each group,
      the async I/O requests of the processes of the group are enqueued in a
      distinct bfq_queue, and the entity associated with this queue is a
      child of the entity associated with the group.
      
      Weights can be assigned explicitly to groups and processes through the
      cgroups interface, differently from what happens, for single
      processes, if the cgroups interface is not used (as explained in the
      description of the previous patch). In particular, since each node has
      a full scheduler, each group can be assigned its own weight.
      Signed-off-by: NFabio Checconi <fchecconi@gmail.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e21b7a0b
  6. 15 4月, 2017 1 次提交
    • M
      block: fix bio_will_gap() for first bvec with offset · 5a8d75a1
      Ming Lei 提交于
      Commit 729204ef("block: relax check on sg gap") allows us to merge
      bios, if both are physically contiguous.  This change can merge a huge
      number of small bios, through mkfs for example, mkfs.ntfs running time
      can be decreased to ~1/10.
      
      But if one rq starts with a non-aligned buffer (the 1st bvec's bv_offset
      is non-zero) and if we allow the merge, it is quite difficult to respect
      sg gap limit, especially the max segment size, or we risk having an
      unaligned virtual boundary.  This patch tries to avoid the issue by
      disallowing a merge, if the req starts with an unaligned buffer.
      
      Also add comments to explain why the merged segment can't end in
      unaligned virt boundary.
      
      Fixes: 729204ef ("block: relax check on sg gap")
      Tested-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      
      Rewrote parts of the commit message and comments.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      5a8d75a1
  7. 09 4月, 2017 3 次提交
  8. 08 4月, 2017 1 次提交
  9. 06 4月, 2017 2 次提交
  10. 29 3月, 2017 1 次提交
  11. 22 3月, 2017 1 次提交
    • O
      blk-stat: convert to callback-based statistics reporting · 34dbad5d
      Omar Sandoval 提交于
      Currently, statistics are gathered in ~0.13s windows, and users grab the
      statistics whenever they need them. This is not ideal for both in-tree
      users:
      
      1. Writeback throttling wants its own dynamically sized window of
         statistics. Since the blk-stats statistics are reset after every
         window and the wbt windows don't line up with the blk-stats windows,
         wbt doesn't see every I/O.
      2. Polling currently grabs the statistics on every I/O. Again, depending
         on how the window lines up, we may miss some I/Os. It's also
         unnecessary overhead to get the statistics on every I/O; the hybrid
         polling heuristic would be just as happy with the statistics from the
         previous full window.
      
      This reworks the blk-stats infrastructure to be callback-based: users
      register a callback that they want called at a given time with all of
      the statistics from the window during which the callback was active.
      Users can dynamically bucketize the statistics. wbt and polling both
      currently use read vs. write, but polling can be extended to further
      subdivide based on request size.
      
      The callbacks are kept on an RCU list, and each callback has percpu
      stats buffers. There will only be a few users, so the overhead on the
      I/O completion side is low. The stats flushing is also simplified
      considerably: since the timer function is responsible for clearing the
      statistics, we don't have to worry about stale statistics.
      
      wbt is a trivial conversion. After the conversion, the windowing problem
      mentioned above is fixed.
      
      For polling, we register an extra callback that caches the previous
      window's statistics in the struct request_queue for the hybrid polling
      heuristic to use.
      
      Since we no longer have a single stats buffer for the request queue,
      this also removes the sysfs and debugfs stats entries. To replace those,
      we add a debugfs entry for the poll statistics.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      34dbad5d
  12. 09 3月, 2017 1 次提交
  13. 02 3月, 2017 1 次提交
  14. 09 2月, 2017 1 次提交
  15. 03 2月, 2017 1 次提交
  16. 02 2月, 2017 4 次提交
  17. 01 2月, 2017 3 次提交
  18. 28 1月, 2017 3 次提交
  19. 27 1月, 2017 2 次提交
  20. 18 1月, 2017 1 次提交
  21. 14 1月, 2017 1 次提交
  22. 12 1月, 2017 2 次提交