1. 27 4月, 2017 8 次提交
  2. 24 4月, 2017 1 次提交
  3. 22 4月, 2017 2 次提交
    • I
      block: get rid of blk_integrity_revalidate() · 19b7ccf8
      Ilya Dryomov 提交于
      Commit 25520d55 ("block: Inline blk_integrity in struct gendisk")
      introduced blk_integrity_revalidate(), which seems to assume ownership
      of the stable pages flag and unilaterally clears it if no blk_integrity
      profile is registered:
      
          if (bi->profile)
                  disk->queue->backing_dev_info->capabilities |=
                          BDI_CAP_STABLE_WRITES;
          else
                  disk->queue->backing_dev_info->capabilities &=
                          ~BDI_CAP_STABLE_WRITES;
      
      It's called from revalidate_disk() and rescan_partitions(), making it
      impossible to enable stable pages for drivers that support partitions
      and don't use blk_integrity: while the call in revalidate_disk() can be
      trivially worked around (see zram, which doesn't support partitions and
      hence gets away with zram_revalidate_disk()), rescan_partitions() can
      be triggered from userspace at any time.  This breaks rbd, where the
      ceph messenger is responsible for generating/verifying CRCs.
      
      Since blk_integrity_{un,}register() "must" be used for (un)registering
      the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
      setting there.  This way drivers that call blk_integrity_register() and
      use integrity infrastructure won't interfere with drivers that don't
      but still want stable pages.
      
      Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk")
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.4+, needs backporting
      Tested-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      19b7ccf8
    • B
      blk-mq: Fix preempt count imbalance · abc25a69
      Bart Van Assche 提交于
      Avoid that the following kernel bug gets triggered:
      
      BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:349
      in_atomic(): 1, irqs_disabled(): 0, pid: 8019, name: find
      CPU: 10 PID: 8019 Comm: find Tainted: G        W I     4.11.0-rc4-dbg+ #2
      Call Trace:
       dump_stack+0x68/0x93
       ___might_sleep+0x16e/0x230
       __might_sleep+0x4a/0x80
       __ext4_get_inode_loc+0x1e0/0x4e0
       ext4_iget+0x70/0xbc0
       ext4_iget_normal+0x2f/0x40
       ext4_lookup+0xb6/0x1f0
       lookup_slow+0x104/0x1e0
       walk_component+0x19a/0x330
       path_lookupat+0x4b/0x100
       filename_lookup+0x9a/0x110
       user_path_at_empty+0x36/0x40
       vfs_statx+0x67/0xc0
       SYSC_newfstatat+0x20/0x40
       SyS_newfstatat+0xe/0x10
       entry_SYSCALL_64_fastpath+0x18/0xad
      
      This happens since the big if/else in blk_mq_make_request() doesn't
      have final else section that also drops the ctx. Add that.
      
      Fixes: b00c53e8 ("blk-mq: fix schedule-while-atomic with scheduler attached")
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Omar Sandoval <osandov@fb.com>
      
      Added a bit more to the commit log.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      abc25a69
  4. 21 4月, 2017 12 次提交
  5. 20 4月, 2017 9 次提交
  6. 19 4月, 2017 8 次提交
    • J
      block: Make writeback throttling defaults consistent for SQ devices · 8330cdb0
      Jan Kara 提交于
      When CFQ is used as an elevator, it disables writeback throttling
      because they don't play well together. Later when a different elevator
      is chosen for the device, writeback throttling doesn't get enabled
      again as it should. Make sure CFQ enables writeback throttling (if it
      should be enabled by default) when we switch from it to another IO
      scheduler.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8330cdb0
    • P
      block, bfq: split bfq-iosched.c into multiple source files · ea25da48
      Paolo Valente 提交于
      The BFQ I/O scheduler features an optimal fair-queuing
      (proportional-share) scheduling algorithm, enriched with several
      mechanisms to boost throughput and reduce latency for interactive and
      real-time applications. This makes BFQ a large and complex piece of
      code. This commit addresses this issue by splitting BFQ into three
      main, independent components, and by moving each component into a
      separate source file:
      1. Main algorithm: handles the interaction with the kernel, and
      decides which requests to dispatch; it uses the following two further
      components to achieve its goals.
      2. Scheduling engine (Hierarchical B-WF2Q+ scheduling algorithm):
      computes the schedule, using weights and budgets provided by the above
      component.
      3. cgroups support: handles group operations (creation, destruction,
      move, ...).
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ea25da48
    • P
      block, bfq: remove all get and put of I/O contexts · 6fa3e8d3
      Paolo Valente 提交于
      When a bfq queue is set in service and when it is merged, a reference
      to the I/O context associated with the queue is taken. This reference
      is then released when the queue is deselected from service or
      split. More precisely, the release of the reference is postponed to
      when the scheduler lock is released, to avoid nesting between the
      scheduler and the I/O-context lock. In fact, such nesting would lead
      to deadlocks, because of other code paths that take the same locks in
      the opposite order. This postponing of I/O-context releases does
      complicate code.
      
      This commit addresses these issue by modifying involved operations in
      such a way to not need to get the above I/O-context references any
      more. Then it also removes any get and release of these references.
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      6fa3e8d3
    • A
      block, bfq: handle bursts of queue activations · e1b2324d
      Arianna Avanzini 提交于
      Many popular I/O-intensive services or applications spawn or
      reactivate many parallel threads/processes during short time
      intervals. Examples are systemd during boot or git grep.  These
      services or applications benefit mostly from a high throughput: the
      quicker the I/O generated by their processes is cumulatively served,
      the sooner the target job of these services or applications gets
      completed. As a consequence, it is almost always counterproductive to
      weight-raise any of the queues associated to the processes of these
      services or applications: in most cases it would just lower the
      throughput, mainly because weight-raising also implies device idling.
      
      To address this issue, an I/O scheduler needs, first, to detect which
      queues are associated with these services or applications. In this
      respect, we have that, from the I/O-scheduler standpoint, these
      services or applications cause bursts of activations, i.e.,
      activations of different queues occurring shortly after each
      other. However, a shorter burst of activations may be caused also by
      the start of an application that does not consist in a lot of parallel
      I/O-bound threads (see the comments on the function bfq_handle_burst
      for details).
      
      In view of these facts, this commit introduces:
      1) an heuristic to detect (only) bursts of queue activations caused by
         services or applications consisting in many parallel I/O-bound
         threads;
      2) the prevention of device idling and weight-raising for the queues
         belonging to these bursts.
      Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e1b2324d
    • P
      block, bfq: boost the throughput with random I/O on NCQ-capable HDDs · e01eff01
      Paolo Valente 提交于
      This patch is basically the counterpart, for NCQ-capable rotational
      devices, of the previous patch. Exactly as the previous patch does on
      flash-based devices and for any workload, this patch disables device
      idling on rotational devices, but only for random I/O. In fact, only
      with these queues disabling idling boosts the throughput on
      NCQ-capable rotational devices. To not break service guarantees,
      idling is disabled for NCQ-enabled rotational devices only when the
      same symmetry conditions considered in the previous patches hold.
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e01eff01
    • P
      block, bfq: boost the throughput on NCQ-capable flash-based devices · bf2b79e7
      Paolo Valente 提交于
      This patch boosts the throughput on NCQ-capable flash-based devices,
      while still preserving latency guarantees for interactive and soft
      real-time applications. The throughput is boosted by just not idling
      the device when the in-service queue remains empty, even if the queue
      is sync and has a non-null idle window. This helps to keep the drive's
      internal queue full, which is necessary to achieve maximum
      performance. This solution to boost the throughput is a port of
      commits a68bbddb and f7d7b7a7 for CFQ.
      
      As already highlighted in a previous patch, allowing the device to
      prefetch and internally reorder requests trivially causes loss of
      control on the request service order, and hence on service guarantees.
      Fortunately, as discussed in detail in the comments on the function
      bfq_bfqq_may_idle(), if every process has to receive the same
      fraction of the throughput, then the service order enforced by the
      internal scheduler of a flash-based device is relatively close to that
      enforced by BFQ. In particular, it is close enough to let service
      guarantees be substantially preserved.
      
      Things change in an asymmetric scenario, i.e., if not every process
      has to receive the same fraction of the throughput. In this case, to
      guarantee the desired throughput distribution, the device must be
      prevented from prefetching requests. This is exactly what this patch
      does in asymmetric scenarios.
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      bf2b79e7
    • A
      block, bfq: reduce idling only in symmetric scenarios · 1de0c4cd
      Arianna Avanzini 提交于
      A seeky queue (i..e, a queue containing random requests) is assigned a
      very small device-idling slice, for throughput issues. Unfortunately,
      given the process associated with a seeky queue, this behavior causes
      the following problem: if the process, say P, performs sync I/O and
      has a higher weight than some other processes doing I/O and associated
      with non-seeky queues, then BFQ may fail to guarantee to P its
      reserved share of the throughput. The reason is that idling is key
      for providing service guarantees to processes doing sync I/O [1].
      
      This commit addresses this issue by allowing the device-idling slice
      to be reduced for a seeky queue only if the scenario happens to be
      symmetric, i.e., if all the queues are to receive the same share of
      the throughput.
      
      [1] P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
          Scheduler", Proceedings of the First Workshop on Mobile System
          Technologies (MST-2015), May 2015.
          http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdfSigned-off-by: NArianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NRiccardo Pizzetti <riccardo.pizzetti@gmail.com>
      Signed-off-by: NSamuele Zecchini <samuele.zecchini92@gmail.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1de0c4cd
    • A
      block, bfq: add Early Queue Merge (EQM) · 36eca894
      Arianna Avanzini 提交于
      A set of processes may happen to perform interleaved reads, i.e.,
      read requests whose union would give rise to a sequential read pattern.
      There are two typical cases: first, processes reading fixed-size chunks
      of data at a fixed distance from each other; second, processes reading
      variable-size chunks at variable distances. The latter case occurs for
      example with QEMU, which splits the I/O generated by a guest into
      multiple chunks, and lets these chunks be served by a pool of I/O
      threads, iteratively assigning the next chunk of I/O to the first
      available thread. CFQ denotes as 'cooperating' a set of processes that
      are doing interleaved I/O, and when it detects cooperating processes,
      it merges their queues to obtain a sequential I/O pattern from the union
      of their I/O requests, and hence boost the throughput.
      
      Unfortunately, in the following frequent case, the mechanism
      implemented in CFQ for detecting cooperating processes and merging
      their queues is not responsive enough to handle also the fluctuating
      I/O pattern of the second type of processes. Suppose that one process
      of the second type issues a request close to the next request to serve
      of another process of the same type. At that time the two processes
      would be considered as cooperating. But, if the request issued by the
      first process is to be merged with some other already-queued request,
      then, from the moment at which this request arrives, to the moment
      when CFQ controls whether the two processes are cooperating, the two
      processes are likely to be already doing I/O in distant zones of the
      disk surface or device memory.
      
      CFQ uses however preemption to get a sequential read pattern out of
      the read requests performed by the second type of processes too.  As a
      consequence, CFQ uses two different mechanisms to achieve the same
      goal: boosting the throughput with interleaved I/O.
      
      This patch introduces Early Queue Merge (EQM), a unified mechanism to
      get a sequential read pattern with both types of processes. The main
      idea is to immediately check whether a newly-arrived request lets some
      pair of processes become cooperating, both in the case of actual
      request insertion and, to be responsive with the second type of
      processes, in the case of request merge. Both types of processes are
      then handled by just merging their queues.
      Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NMauro Andreolini <mauro.andreolini@unimore.it>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      36eca894