1. 21 Oct, 2021 7 commits
  2. 20 Oct, 2021 19 commits
  3. 19 Oct, 2021 14 commits
    • J
      block: attempt direct issue of plug list · dc5fc361
      Committed by Jens Axboe
      If we have just one queue type in the plug list, then we can extend our
      direct issue to cover a full plug list as well. This allows sending a
      batch of requests for direct issue, which is more efficient than
      issuing them one at a time.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      dc5fc361
    • J
      block: change plugging to use a singly linked list · bc490f81
      Committed by Jens Axboe
      Use a singly linked list for the blk_plug. This saves 8 bytes in the
      blk_plug struct, and makes for faster list manipulations than doubly
      linked lists. As we don't use the doubly linked lists for anything,
      singly linked is just fine.
      
      This yields a bump in default (merging enabled) performance from 7.0
      to 7.1M IOPS, and ~7.5M IOPS with merging disabled.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bc490f81
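The list change described above can be illustrated with a minimal sketch. The names `struct req`, `struct plug`, `plug_add()` and `plug_pop()` are hypothetical stand-ins, not the kernel's definitions. On a 64-bit build a doubly linked list head holds two pointers (16 bytes) while a single head pointer holds one (8 bytes), which matches the 8-byte saving the commit message mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block request; not the kernel's struct request. */
struct req {
    int tag;
    struct req *next;   /* single forward link, no prev pointer */
};

/* Hypothetical stand-in for the plug: one head pointer instead of two. */
struct plug {
    struct req *head;
    unsigned int count;
};

/* Push at the head: two pointer stores, no neighbour fix-ups. */
static void plug_add(struct plug *p, struct req *rq)
{
    assert(rq != NULL);
    rq->next = p->head;
    p->head = rq;
    p->count++;
}

/* Pop from the head; returns NULL when the plug is empty. */
static struct req *plug_pop(struct plug *p)
{
    struct req *rq = p->head;

    if (rq) {
        p->head = rq->next;
        rq->next = NULL;
        p->count--;
    }
    return rq;
}
```

Pushing at the head means requests come back out in reverse insertion order, so a caller that cares about ordering would need to account for that.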
    • A
      blk-wbt: prevent NULL pointer dereference in wb_timer_fn · 480d42dc
      Committed by Andrea Righi
      The timer callback used to evaluate if the latency is exceeded can be
      executed after the corresponding disk has been released, causing the
      following NULL pointer dereference:
      
      [ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098
      [ 119.987617] #PF: supervisor read access in kernel mode
      [ 119.987971] #PF: error_code(0x0000) - not-present page
      [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0
      [ 119.988697] Oops: 0000 [#1] SMP NOPTI
      [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi
      [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0
      [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
      [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246
      [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780
      [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000
      [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500
      [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00
      [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000
      [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0
      [ 119.995906] Call Trace:
      [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30
      [ 119.996505] blk_stat_timer_fn+0x138/0x140
      [ 119.996830] call_timer_fn+0x2b/0x100
      [ 119.997136] __run_timers.part.0+0x1d1/0x240
      [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20
      [ 119.997826] ? ktime_get+0x3e/0xa0
      [ 119.998110] ? native_apic_msr_write+0x2c/0x30
      [ 119.998456] ? lapic_next_event+0x20/0x30
      [ 119.998779] ? clockevents_program_event+0x94/0xf0
      [ 119.999150] run_timer_softirq+0x2a/0x50
      [ 119.999465] __do_softirq+0xcb/0x26f
      [ 119.999764] irq_exit_rcu+0x8c/0xb0
      [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90
      [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
      [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      In this case simply return from the timer callback (no action
      required) to prevent the NULL pointer dereference.
      
      BugLink: https://bugs.launchpad.net/bugs/1947557
      Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/
      Fixes: 34dbad5d ("blk-stat: convert to callback-based statistics reporting")
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      480d42dc
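The early-return approach described in the commit above can be sketched as follows. This is not the actual patch: `struct disk`, `struct queue`, `struct wb_state` and `wb_timer_sketch()` are hypothetical names, and the sketch only assumes that the callback reaches the disk through a pointer that becomes NULL once the disk is released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct disk { int inflight; };
struct queue { struct disk *disk; };      /* NULL once the disk is released */
struct wb_state {
    struct queue *q;
    int evaluations;                      /* counts callbacks that did real work */
};

/*
 * Timer callback sketch: if the disk is already gone there is nothing to
 * evaluate, so return before touching anything reached through it.
 * Returns 1 if latency was evaluated, 0 if the callback bailed out early.
 */
static int wb_timer_sketch(struct wb_state *wb)
{
    if (!wb->q->disk)
        return 0;

    wb->evaluations++;
    return 1;
}
```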
    • J
      block: align blkdev_dio inlined bio to a cacheline · 6155631a
      Committed by Jens Axboe
      We get all sorts of unreliable and funky results since the bio is
      designed to align on a cacheline, which it does not when inlined like
      this.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6155631a
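To make the alignment issue above concrete, here is a small C11 sketch. The struct names and the 64-byte cacheline size are assumptions for illustration; the kernel uses its own alignment annotations rather than `alignas`. A member embedded after a few small fields lands at an arbitrary offset unless its alignment is requested explicitly.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64   /* assumed cacheline size; varies by CPU */

/* Hypothetical stand-in for an inlined bio. */
struct bio_like { char payload[120]; };

/* Embedded member follows the preceding fields at whatever offset results. */
struct dio_unaligned {
    void *iocb;
    size_t size;
    int flags;
    struct bio_like bio;
};

/* Same layout, but the embedded member is forced onto a cacheline boundary. */
struct dio_aligned {
    void *iocb;
    size_t size;
    int flags;
    alignas(CACHELINE) struct bio_like bio;
};

/* Compile-time check that the aligned variant really starts on a boundary. */
static_assert(offsetof(struct dio_aligned, bio) % CACHELINE == 0,
              "inlined member must start on a cacheline boundary");
```

The cost is padding: the aligned struct is larger, and any allocation holding it must itself be cacheline aligned for the member alignment to hold at run time.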
    • J
      block: move blk_mq_tag_to_rq() inline · e028f167
      Committed by Jens Axboe
      This is in the fast path of driver issue or completion, and it's a single
      array index operation. Move it inline to avoid a function call for it.
      
      This does mean making struct blk_mq_tags block layer public, but there's
      not really much in there.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e028f167
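The "single array index operation" mentioned above might look roughly like the sketch below. `struct tag_set` and `tag_to_rq()` are hypothetical simplifications, not the kernel's `struct blk_mq_tags` or `blk_mq_tag_to_rq()`. Defining the helper as `static inline` in a header lets the compiler fold the lookup into the caller, which is why the struct layout has to be visible there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct req { unsigned int tag; };

struct tag_set {
    unsigned int nr_tags;   /* number of valid slots in rqs[] */
    struct req **rqs;       /* tag -> request lookup table */
};

/*
 * Bounds-checked array index. Marked static inline so that a header can
 * carry the definition and callers avoid a function call.
 */
static inline struct req *tag_to_rq(const struct tag_set *tags, unsigned int tag)
{
    if (tag < tags->nr_tags)
        return tags->rqs[tag];
    return NULL;
}
```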
    • J
      block: get rid of plug list sorting · df87eb0f
      Committed by Jens Axboe
      Even if we have multiple queues in the plug list, the chance that
      they are heavily interspersed is minimal. Don't bother spending CPU
      cycles sorting the list.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      df87eb0f
    • J
      block: return whether or not to unplug through boolean · 87c037d1
      Committed by Jens Axboe
      Instead of returning the same queue request through a request pointer,
      use a boolean to accomplish the same.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      87c037d1
    • C
      block: don't call blk_status_to_errno in blk_update_request · 8a7d267b
      Committed by Christoph Hellwig
      We only need to call it to resolve the blk_status_t -> errno mapping for
      tracing, so move the conversion into the tracepoints that are not called
      at all when tracing isn't enabled.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8a7d267b
    • J
      block: move bdev_read_only() into the header · db9a02ba
      Committed by Jens Axboe
      This is called for every write in the fast path, so move it inline next
      to get_disk_ro(), which it calls internally.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      db9a02ba
    • J
      block: fix too broad elevator check in blk_mq_free_request() · e0d78afe
      Committed by Jens Axboe
      We added RQF_ELV to tell whether there's an IO scheduler attached, and
      RQF_ELVPRIV tells us whether there's an IO scheduler with private data
      attached. Don't check RQF_ELV in blk_mq_free_request(); what we care
      about here is just whether we have scheduler private data attached.
      
      This fixes a boot crash.
      
      Fixes: 2ff0682d ("block: store elevator state in request")
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e0d78afe
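The distinction between the two flags described above can be sketched as follows. The flag values, `struct req` and `needs_sched_cleanup()` are hypothetical; only the idea that the free path should test the narrower "private data attached" flag comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the kernel defines its own RQF_* bits. */
#define RQF_ELV_SKETCH      (1u << 0)   /* an IO scheduler is attached */
#define RQF_ELVPRIV_SKETCH  (1u << 1)   /* ...and it holds private data */

/* Hypothetical, simplified request. */
struct req { unsigned int flags; };

/*
 * Free-path sketch: scheduler cleanup is only needed when private data is
 * attached. Testing the broader "scheduler attached" flag here would run
 * the cleanup for requests that never had private data set up.
 */
static bool needs_sched_cleanup(const struct req *rq)
{
    return (rq->flags & RQF_ELVPRIV_SKETCH) != 0;
}
```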
    • J
      nvme: wire up completion batching for the IRQ path · 4f502245
      Committed by Jens Axboe
      Trivial to do now, just need our own io_comp_batch on the stack and pass
      that in to the usual command completion handling.
      
      I pondered making this dependent on how many entries we had to process,
      but even for a single entry there's no discernible difference in
      performance or latency. Running a sync workload over io_uring:
      
      t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
      
      yields the below performance before the patch:
      
      IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      
      and the following after:
      
      IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      
      which definitely isn't slower, about the same if you factor in a bit of
      variance. For peak performance workloads, benchmarking shows a 2%
      improvement.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4f502245
    • J
      io_uring: utilize the io batching infrastructure for more efficient polled IO · b688f11e
      Committed by Jens Axboe
      Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack
      supports it, we can handle high rates of polled IO more efficiently.
      
      This raises the single core efficiency on my system from ~6.1M IOPS to
      ~6.6M IOPS running a random read workload at depth 128 on two gen2
      Optane drives.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b688f11e
    • J
      nvme: add support for batched completion of polled IO · c234a653
      Committed by Jens Axboe
      Take advantage of struct io_comp_batch, if passed in to the nvme poll
      handler. If it's set, rather than complete each request individually
      inline, store them in the io_comp_batch list. We only do so for requests
      that will complete successfully; anything else will be completed inline
      as before.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c234a653
    • J
      block: add support for blk_mq_end_request_batch() · f794f335
      Committed by Jens Axboe
      Instead of calling blk_mq_end_request() on a single request, add a helper
      that takes the new struct io_comp_batch and completes any request stored
      in there.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f794f335
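The batching pattern that the commits above build up (collect successful completions in a batch, complete everything else inline, then finish the batch in one pass) can be sketched in plain C. All names here (`struct req`, `struct comp_batch`, `poll_complete()`, `end_batch()`) are hypothetical and only mirror the roles of `struct io_comp_batch` and `blk_mq_end_request_batch()` as the commit messages describe them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request: status 0 means success. */
struct req {
    int status;
    int completed;
    struct req *next;
};

/* Hypothetical stand-in for a completion batch: a singly linked list. */
struct comp_batch { struct req *list; };

/* Per-request completion, the slow path used when batching does not apply. */
static void complete_one(struct req *rq)
{
    rq->completed = 1;
}

/*
 * Poll-handler sketch: if a batch was passed in and the request succeeded,
 * defer it onto the batch list; otherwise complete it inline as before.
 */
static void poll_complete(struct comp_batch *iob, struct req *rq)
{
    if (iob && rq->status == 0) {
        rq->next = iob->list;
        iob->list = rq;
        return;
    }
    complete_one(rq);
}

/* Complete every request stored in the batch; returns how many were ended. */
static unsigned int end_batch(struct comp_batch *iob)
{
    unsigned int n = 0;
    struct req *rq;

    while ((rq = iob->list) != NULL) {
        iob->list = rq->next;
        rq->next = NULL;
        complete_one(rq);
        n++;
    }
    return n;
}
```

A caller would keep the batch on its stack, pass it to the poll handler for each reaped entry, and call `end_batch()` once at the end. In this toy version the batch saves nothing; the benefit in a real implementation would come from amortizing shared per-completion work across the batch.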