1. 26 Jun 2022, 1 commit
  2. 22 Mar 2022, 1 commit
  3. 08 Feb 2022, 2 commits
  4. 28 Jan 2022, 1 commit
  5. 14 Jan 2022, 1 commit
  6. 26 Oct 2021, 1 commit
      sbitmap: silence data race warning · 9f8b93a7
      Authored by Jens Axboe
      KCSAN complains about the sbitmap hint update:
      
      ==================================================================
      BUG: KCSAN: data-race in sbitmap_queue_clear / sbitmap_queue_clear
      
      write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 1:
       sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
       blk_mq_put_tag+0x82/0x90
       __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
       blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
       __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
       blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
       lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
       blk_complete_reqs block/blk-mq.c:584 [inline]
       blk_done_softirq+0x69/0x90 block/blk-mq.c:589
       __do_softirq+0x12c/0x26e kernel/softirq.c:558
       run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
       smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
       kthread+0x262/0x280 kernel/kthread.c:319
       ret_from_fork+0x1f/0x30
      
      write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 0:
       sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
       blk_mq_put_tag+0x82/0x90
       __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
       blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
       __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
       blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
       lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
       blk_complete_reqs block/blk-mq.c:584 [inline]
       blk_done_softirq+0x69/0x90 block/blk-mq.c:589
       __do_softirq+0x12c/0x26e kernel/softirq.c:558
       run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
       smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
       kthread+0x262/0x280 kernel/kthread.c:319
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000035 -> 0x00000044
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.15.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      ==================================================================
      
      which is a data race, but not an important one. This is just updating the
      percpu alloc hint, and the reader of that hint doesn't ever require it to
      be valid.
      
      Just annotate it with data_race() to silence this one.
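      The annotation described above can be illustrated with a minimal userspace sketch. The `data_race()` macro here is a hypothetical stand-in for the kernel's: without KCSAN instrumentation it compiles away to the plain expression, so the annotated access behaves identically. The `alloc_hint` structure and `update_hint()` helper are illustrative names, not the kernel's.

      ```c
      #include <assert.h>

      /* Stand-in for the kernel's data_race(): without KCSAN it is just
       * the expression itself; with KCSAN it suppresses the race report. */
      #define data_race(expr) (expr)

      struct alloc_hint {
          unsigned int hint;   /* racy per-cpu allocation hint; stale values are fine */
          unsigned int depth;  /* number of tags in the map */
      };

      /* Mirrors the shape of the sbitmap hint update: the write is
       * intentionally unsynchronized, because readers only treat the hint
       * as a starting guess for the next allocation scan. */
      static void update_hint(struct alloc_hint *h, unsigned int freed_tag)
      {
          data_race(h->hint = (freed_tag + 1) % h->depth);
      }

      int main(void)
      {
          struct alloc_hint h = { .hint = 0, .depth = 64 };
          update_hint(&h, 63);
          assert(h.hint == 0);   /* wraps around at depth */
          update_hint(&h, 10);
          assert(h.hint == 11);
          return 0;
      }
      ```

      The point of the annotation is documentation as much as silencing: it records that the racy access was audited and found benign, rather than merely overlooked.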
      
      Reported-by: syzbot+4f8bfd804b4a1f95b8f6@syzkaller.appspotmail.com
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 19 Oct 2021, 1 commit
  8. 18 Oct 2021, 1 commit
      sbitmap: add __sbitmap_queue_get_batch() · 9672b0d4
      Authored by Jens Axboe
      The block layer tag allocation batching still calls into sbitmap to get
      each tag, but we can improve on that. Add __sbitmap_queue_get_batch(),
      which returns a mask of tags all at once, along with an offset for
      those tags.
      
      An example return would be 0xff, where bits 0..7 are set, with
      tag_offset == 128. The valid tags in this case would be 128..135.
      
      A batch is specific to an individual sbitmap_map, hence it cannot be
      larger than that. The requested number of tags is automatically reduced
      to the max that can be satisfied with a single map.
      
      On failure, 0 is returned. The caller should fall back to single tag
      allocation at that point.
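      Decoding a returned batch follows directly from the example above: bit i set in the mask means tag (tag_offset + i) was allocated. A minimal sketch, assuming an illustrative `decode_batch()` helper (not the kernel API), which walks the set bits of the mask:

      ```c
      #include <assert.h>

      /* Expand a (mask, tag_offset) batch into individual tag numbers.
       * Bit i of mask set means tag (tag_offset + i) was allocated. */
      static unsigned int decode_batch(unsigned long mask, unsigned int tag_offset,
                                       unsigned int *tags, unsigned int max_tags)
      {
          unsigned int n = 0;

          while (mask && n < max_tags) {
              unsigned int bit = (unsigned int)__builtin_ctzl(mask); /* lowest set bit */

              tags[n++] = tag_offset + bit;
              mask &= mask - 1;   /* clear the lowest set bit */
          }
          return n;
      }

      int main(void)
      {
          unsigned int tags[8];
          /* The commit's example: mask 0xff with tag_offset == 128 */
          unsigned int n = decode_batch(0xff, 128, tags, 8);

          assert(n == 8);
          assert(tags[0] == 128 && tags[7] == 135);
          return 0;
      }
      ```

      Handing back a whole word's worth of tags in one call amortizes the atomic operations that a per-tag allocation loop would otherwise repeat.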
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 09 Jul 2021, 1 commit
  10. 05 Mar 2021, 5 commits
  11. 08 Dec 2020, 4 commits
  12. 02 Jul 2020, 1 commit
  13. 21 Dec 2019, 1 commit
      sbitmap: only queue kyber's wait callback if not already active · df034c93
      Authored by David Jeffery
      Under heavy loads where the kyber I/O scheduler hits the token limits for
      its scheduling domains, kyber can become stuck.  When active requests
      complete, kyber may not be woken up, leaving the I/O requests queued in
      kyber stalled.
      
      This stuck state is due to a race condition with kyber and the sbitmap
      functions it uses to run a callback when enough requests have completed.
      The running of a sbt_wait callback can race with the attempt to insert the
      sbt_wait.  Since sbitmap_del_wait_queue removes the sbt_wait from the list
      first then sets the sbq field to NULL, kyber can see the item as not on a
      list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
      results in the sbt_wait being inserted onto the wait list but ws_active
      doesn't get incremented.  So the sbitmap queue does not know there is a
      waiter on a wait list.
      
      Since sbitmap doesn't think there is a waiter, kyber may never be
      informed that there are domain tokens available and the I/O never advances.
      With the sbt_wait on a wait list, kyber believes it has an active waiter
      so cannot insert a new waiter when reaching the domain's full state.
      
      This race can be fixed by only adding the sbt_wait to the queue if the
      sbq field is NULL.  If sbq is not NULL, there is already an action active
      which will trigger the re-running of kyber.  Let it run and add the
      sbt_wait to the wait list if still needing to wait.
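      The fix described above can be modeled with a small userspace sketch. The structures and the `try_add_waiter()` helper are illustrative stand-ins, not the kernel's API: the waiter is enqueued only when its back-pointer is NULL, so a still-active waiter (whose wake callback is in flight) is never double-added and `ws_active` always matches the wait list.

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct sbq {
          int ws_active;          /* count of waiters the queue knows about */
      };

      struct sbq_wait {
          struct sbq *sbq;        /* non-NULL while the waiter is active */
      };

      /* Add the waiter only if it is not already active; returns 1 if added.
       * A non-NULL sbq means a wake callback is still in flight and will
       * re-run the scheduler, so we must not enqueue a second time. */
      static int try_add_waiter(struct sbq *q, struct sbq_wait *w)
      {
          if (w->sbq != NULL)
              return 0;           /* already active: let the pending wake run */
          w->sbq = q;
          q->ws_active++;         /* waiter count now matches the wait list */
          return 1;
      }

      int main(void)
      {
          struct sbq q = { .ws_active = 0 };
          struct sbq_wait w = { .sbq = NULL };

          assert(try_add_waiter(&q, &w) == 1 && q.ws_active == 1);
          /* a second attempt while still active must not double-add */
          assert(try_add_waiter(&q, &w) == 0 && q.ws_active == 1);
          return 0;
      }
      ```

      The invariant this preserves is the one the race broke: an sbt_wait is on the wait list if and only if ws_active was incremented for it.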
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Reported-by: John Pittman <jpittman@redhat.com>
      Tested-by: John Pittman <jpittman@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 14 Nov 2019, 1 commit
  15. 02 Jul 2019, 1 commit
  16. 05 Jun 2019, 1 commit
  17. 24 May 2019, 1 commit
  18. 26 Mar 2019, 1 commit
      sbitmap: order READ/WRITE freed instance and setting clear bit · e6d1fa58
      Authored by Ming Lei
      Inside sbitmap_queue_clear(), once the clear bit is set, it becomes
      visible to the allocation path immediately. Meanwhile, READs/WRITEs on
      the old associated instance (such as a request in the blk-mq case) may
      be reordered against the setting of the clear bit, so a race with
      re-allocation may be triggered.
      
      Add one memory barrier to order the READs/WRITEs of the freed associated
      instance against setting the clear bit, avoiding the race with
      re-allocation.
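      The required ordering can be sketched in userspace with C11 atomics: all plain stores to the freed object must be visible before the "cleared" bit is published, so a re-allocator that observes the bit also observes the completed writes. Here `memory_order_release` on the bit-set plays the role the kernel barrier plays in the patch; the structure and helper names are illustrative, not the kernel's.

      ```c
      #include <assert.h>
      #include <stdatomic.h>

      struct freed_obj {
          int payload;            /* stands in for the request being freed */
      };

      /* Finish all writes to the instance, then publish its "cleared" bit
       * with release ordering so the writes cannot be reordered after it. */
      static void free_obj(struct freed_obj *obj, atomic_ulong *cleared, int bit)
      {
          obj->payload = 0;                          /* last write to the instance */
          atomic_fetch_or_explicit(cleared, 1UL << bit,
                                   memory_order_release);  /* publish after writes */
      }

      int main(void)
      {
          atomic_ulong cleared = 0;
          struct freed_obj o = { .payload = 42 };

          free_obj(&o, &cleared, 3);
          assert((atomic_load(&cleared) >> 3) & 1);
          assert(o.payload == 0);
          return 0;
      }
      ```

      Without the release ordering, a CPU that re-allocates the bit could still see the instance's stale contents, which is exactly the oops mode described below.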
      
      The following kernel oops triggered by block/006 on aarch64 may be fixed:
      
      [  142.330954] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000330
      [  142.338794] Mem abort info:
      [  142.341554]   ESR = 0x96000005
      [  142.344632]   Exception class = DABT (current EL), IL = 32 bits
      [  142.350500]   SET = 0, FnV = 0
      [  142.353544]   EA = 0, S1PTW = 0
      [  142.356678] Data abort info:
      [  142.359528]   ISV = 0, ISS = 0x00000005
      [  142.363343]   CM = 0, WnR = 0
      [  142.366305] user pgtable: 64k pages, 48-bit VAs, pgdp = 000000002a3c51c0
      [  142.372983] [0000000000000330] pgd=0000000000000000, pud=0000000000000000
      [  142.379777] Internal error: Oops: 96000005 [#1] SMP
      [  142.384613] Modules linked in: null_blk ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp vfat fat rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core sbsa_gwdt crct10dif_ce ghash_ce ipmi_ssif sha2_ce ipmi_devintf sha256_arm64 sg sha1_ce ipmi_msghandler ip_tables xfs libcrc32c mlx5_core sdhci_acpi mlxfw ahci_platform at803x sdhci libahci_platform qcom_emac mmc_core hdma hdma_mgmt i2c_dev [last unloaded: null_blk]
      [  142.429753] CPU: 7 PID: 1983 Comm: fio Not tainted 5.0.0.cki #2
      [  142.449458] pstate: 00400005 (nzcv daif +PAN -UAO)
      [  142.454239] pc : __blk_mq_free_request+0x4c/0xa8
      [  142.458830] lr : blk_mq_free_request+0xec/0x118
      [  142.463344] sp : ffff00003360f6a0
      [  142.466646] x29: ffff00003360f6a0 x28: ffff000010e70000
      [  142.471941] x27: ffff801729a50048 x26: 0000000000010000
      [  142.477232] x25: ffff00003360f954 x24: ffff7bdfff021440
      [  142.482529] x23: 0000000000000000 x22: 00000000ffffffff
      [  142.487830] x21: ffff801729810000 x20: 0000000000000000
      [  142.493123] x19: ffff801729a50000 x18: 0000000000000000
      [  142.498413] x17: 0000000000000000 x16: 0000000000000001
      [  142.503709] x15: 00000000000000ff x14: ffff7fe000000000
      [  142.509003] x13: ffff8017dcde09a0 x12: 0000000000000000
      [  142.514308] x11: 0000000000000001 x10: 0000000000000008
      [  142.519597] x9 : ffff8017dcde09a0 x8 : 0000000000002000
      [  142.524889] x7 : ffff8017dcde0a00 x6 : 000000015388f9be
      [  142.530187] x5 : 0000000000000001 x4 : 0000000000000000
      [  142.535478] x3 : 0000000000000000 x2 : 0000000000000000
      [  142.540777] x1 : 0000000000000001 x0 : ffff00001041b194
      [  142.546071] Process fio (pid: 1983, stack limit = 0x000000006460a0ea)
      [  142.552500] Call trace:
      [  142.554926]  __blk_mq_free_request+0x4c/0xa8
      [  142.559181]  blk_mq_free_request+0xec/0x118
      [  142.563352]  blk_mq_end_request+0xfc/0x120
      [  142.567444]  end_cmd+0x3c/0xa8 [null_blk]
      [  142.571434]  null_complete_rq+0x20/0x30 [null_blk]
      [  142.576194]  blk_mq_complete_request+0x108/0x148
      [  142.580797]  null_handle_cmd+0x1d4/0x718 [null_blk]
      [  142.585662]  null_queue_rq+0x60/0xa8 [null_blk]
      [  142.590171]  blk_mq_try_issue_directly+0x148/0x280
      [  142.594949]  blk_mq_try_issue_list_directly+0x9c/0x108
      [  142.600064]  blk_mq_sched_insert_requests+0xb0/0xd0
      [  142.604926]  blk_mq_flush_plug_list+0x16c/0x2a0
      [  142.609441]  blk_flush_plug_list+0xec/0x118
      [  142.613608]  blk_finish_plug+0x3c/0x4c
      [  142.617348]  blkdev_direct_IO+0x3b4/0x428
      [  142.621336]  generic_file_read_iter+0x84/0x180
      [  142.625761]  blkdev_read_iter+0x50/0x78
      [  142.629579]  aio_read.isra.6+0xf8/0x190
      [  142.633409]  __io_submit_one.isra.8+0x148/0x738
      [  142.637912]  io_submit_one.isra.9+0x88/0xb8
      [  142.642078]  __arm64_sys_io_submit+0xe0/0x238
      [  142.646428]  el0_svc_handler+0xa0/0x128
      [  142.650238]  el0_svc+0x8/0xc
      [  142.653104] Code: b9402a63 f9000a7f 3100047f 540000a0 (f9419a81)
      [  142.659202] ---[ end trace 467586bc175eb09d ]---
      
      Fixes: ea86ea2c ("sbitmap: ammortize cost of clearing bits")
      Reported-and-bisected-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
      Cc: Yi Zhang <yi.zhang@redhat.com>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  19. 15 Jan 2019, 2 commits
      sbitmap: Protect swap_lock from hardirq · fe76fc6a
      Authored by Ming Lei
      Because we may call blk_mq_get_driver_tag() directly from
      blk_mq_dispatch_rq_list() without holding any lock, a HARDIRQ may
      arrive and trigger the above DEADLOCK.
      
      Commit ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq") tried to
      fix this issue by using 'spin_lock_bh', which isn't enough because we
      complete requests directly from hardirq context in the multiqueue case.
      
      Cc: Clark Williams <williams@redhat.com>
      Fixes: ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq")
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      sbitmap: Protect swap_lock from softirqs · 37198768
      Authored by Steven Rostedt (VMware)
      The swap_lock used by sbitmap sits in a lock chain with locks taken from
      softirq context, but the swap_lock itself is not protected from being
      preempted by softirqs.
      
      A chain exists of:
      
       sbq->ws[i].wait -> dispatch_wait_lock -> swap_lock
      
      Where the sbq->ws[i].wait lock can be taken from softirq context, which
      means all locks below it in the chain must also be protected from
      softirqs.
      Reported-by: Clark Williams <williams@redhat.com>
      Fixes: 58ab5e32 ("sbitmap: silence bogus lockdep IRQ warning")
      Fixes: ea86ea2c ("sbitmap: amortize cost of clearing bits")
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 21 Dec 2018, 1 commit
  21. 12 Dec 2018, 1 commit
  22. 10 Dec 2018, 1 commit
      sbitmap: silence bogus lockdep IRQ warning · 58ab5e32
      Authored by Jens Axboe
      Ming reports that lockdep spews the following trace. What this
      essentially says is that the sbitmap swap_lock was used inconsistently
      in IRQ enabled and disabled context, and that is usually indicative of a
      bug that will cause a deadlock.
      
      For this case, it's a false positive. The swap_lock is used from process
      context only, when we swap the bits in the word and cleared mask. We
      also end up doing that when we are getting a driver tag, from the
      blk_mq_mark_tag_wait(), and from there we hold the waitqueue lock with
      IRQs disabled. However, this isn't from an actual IRQ, it's still
      process context.
      
      In lieu of a better way to fix this, simply always disable interrupts
      when grabbing the swap_lock if lockdep is enabled.
      
      [  100.967642] ================start test sanity/001================
      [  101.238280] null: module loaded
      [  106.093735]
      [  106.094012] =====================================================
      [  106.094854] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      [  106.095759] 4.20.0-rc3_5d2ee712_for-next+ #1 Not tainted
      [  106.096551] -----------------------------------------------------
      [  106.097386] fio/1043 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      [  106.098231] 000000004c43fa71
      (&(&sb->map[i].swap_lock)->rlock){+.+.}, at: sbitmap_get+0xd5/0x22c
      [  106.099431]
      [  106.099431] and this task is already holding:
      [  106.100229] 000000007eec8b2f
      (&(&hctx->dispatch_wait_lock)->rlock){....}, at:
      blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.101630] which would create a new lock dependency:
      [  106.102326]  (&(&hctx->dispatch_wait_lock)->rlock){....} ->
      (&(&sb->map[i].swap_lock)->rlock){+.+.}
      [  106.103553]
      [  106.103553] but this new dependency connects a SOFTIRQ-irq-safe lock:
      [  106.104580]  (&sbq->ws[i].wait){..-.}
      [  106.104582]
      [  106.104582] ... which became SOFTIRQ-irq-safe at:
      [  106.105751]   _raw_spin_lock_irqsave+0x4b/0x82
      [  106.106284]   __wake_up_common_lock+0x119/0x1b9
      [  106.106825]   sbitmap_queue_wake_up+0x33f/0x383
      [  106.107456]   sbitmap_queue_clear+0x4c/0x9a
      [  106.108046]   __blk_mq_free_request+0x188/0x1d3
      [  106.108581]   blk_mq_free_request+0x23b/0x26b
      [  106.109102]   scsi_end_request+0x345/0x5d7
      [  106.109587]   scsi_io_completion+0x4b5/0x8f0
      [  106.110099]   scsi_finish_command+0x412/0x456
      [  106.110615]   scsi_softirq_done+0x23f/0x29b
      [  106.111115]   blk_done_softirq+0x2a7/0x2e6
      [  106.111608]   __do_softirq+0x360/0x6ad
      [  106.112062]   run_ksoftirqd+0x2f/0x5b
      [  106.112499]   smpboot_thread_fn+0x3a5/0x3db
      [  106.113000]   kthread+0x1d4/0x1e4
      [  106.113457]   ret_from_fork+0x3a/0x50
      [  106.113969]
      [  106.113969] to a SOFTIRQ-irq-unsafe lock:
      [  106.114672]  (&(&sb->map[i].swap_lock)->rlock){+.+.}
      [  106.114674]
      [  106.114674] ... which became SOFTIRQ-irq-unsafe at:
      [  106.116000] ...
      [  106.116003]   _raw_spin_lock+0x33/0x64
      [  106.116676]   sbitmap_get+0xd5/0x22c
      [  106.117134]   __sbitmap_queue_get+0xe8/0x177
      [  106.117731]   __blk_mq_get_tag+0x1e6/0x22d
      [  106.118286]   blk_mq_get_tag+0x1db/0x6e4
      [  106.118756]   blk_mq_get_driver_tag+0x161/0x258
      [  106.119383]   blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.120043]   blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.120607]   blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.121234]   __blk_mq_run_hw_queue+0x137/0x17e
      [  106.121781]   __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.122366]   blk_mq_run_hw_queue+0x151/0x187
      [  106.122887]   blk_mq_sched_insert_requests+0x13f/0x175
      [  106.123492]   blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.124042]   blk_flush_plug_list+0x392/0x3d7
      [  106.124557]   blk_finish_plug+0x37/0x4f
      [  106.125019]   read_pages+0x3ef/0x430
      [  106.125446]   __do_page_cache_readahead+0x18e/0x2fc
      [  106.126027]   force_page_cache_readahead+0x121/0x133
      [  106.126621]   page_cache_sync_readahead+0x35f/0x3bb
      [  106.127229]   generic_file_buffered_read+0x410/0x1860
      [  106.127932]   __vfs_read+0x319/0x38f
      [  106.128415]   vfs_read+0xd2/0x19a
      [  106.128817]   ksys_read+0xb9/0x135
      [  106.129225]   do_syscall_64+0x140/0x385
      [  106.129684]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.130292]
      [  106.130292] other info that might help us debug this:
      [  106.130292]
      [  106.131226] Chain exists of:
      [  106.131226]   &sbq->ws[i].wait -->
      &(&hctx->dispatch_wait_lock)->rlock -->
      &(&sb->map[i].swap_lock)->rlock
      [  106.131226]
      [  106.132865]  Possible interrupt unsafe locking scenario:
      [  106.132865]
      [  106.133659]        CPU0                    CPU1
      [  106.134194]        ----                    ----
      [  106.134733]   lock(&(&sb->map[i].swap_lock)->rlock);
      [  106.135318]                                local_irq_disable();
      [  106.136014]                                lock(&sbq->ws[i].wait);
      [  106.136747]
      lock(&(&hctx->dispatch_wait_lock)->rlock);
      [  106.137742]   <Interrupt>
      [  106.138110]     lock(&sbq->ws[i].wait);
      [  106.138625]
      [  106.138625]  *** DEADLOCK ***
      [  106.138625]
      [  106.139430] 3 locks held by fio/1043:
      [  106.139947]  #0: 0000000076ff0fd9 (rcu_read_lock){....}, at:
      hctx_lock+0x29/0xe8
      [  106.140813]  #1: 000000002feb1016 (&sbq->ws[i].wait){..-.}, at:
      blk_mq_dispatch_rq_list+0x4ad/0xd7c
      [  106.141877]  #2: 000000007eec8b2f
      (&(&hctx->dispatch_wait_lock)->rlock){....}, at:
      blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.143267]
      [  106.143267] the dependencies between SOFTIRQ-irq-safe lock and the
      holding lock:
      [  106.144351]  -> (&sbq->ws[i].wait){..-.} ops: 82 {
      [  106.144926]     IN-SOFTIRQ-W at:
      [  106.145314]                       _raw_spin_lock_irqsave+0x4b/0x82
      [  106.146042]                       __wake_up_common_lock+0x119/0x1b9
      [  106.146785]                       sbitmap_queue_wake_up+0x33f/0x383
      [  106.147567]                       sbitmap_queue_clear+0x4c/0x9a
      [  106.148379]                       __blk_mq_free_request+0x188/0x1d3
      [  106.149148]                       blk_mq_free_request+0x23b/0x26b
      [  106.149864]                       scsi_end_request+0x345/0x5d7
      [  106.150546]                       scsi_io_completion+0x4b5/0x8f0
      [  106.151367]                       scsi_finish_command+0x412/0x456
      [  106.152157]                       scsi_softirq_done+0x23f/0x29b
      [  106.152855]                       blk_done_softirq+0x2a7/0x2e6
      [  106.153537]                       __do_softirq+0x360/0x6ad
      [  106.154280]                       run_ksoftirqd+0x2f/0x5b
      [  106.155020]                       smpboot_thread_fn+0x3a5/0x3db
      [  106.155828]                       kthread+0x1d4/0x1e4
      [  106.156526]                       ret_from_fork+0x3a/0x50
      [  106.157267]     INITIAL USE at:
      [  106.157713]                      _raw_spin_lock_irqsave+0x4b/0x82
      [  106.158542]                      prepare_to_wait_exclusive+0xa8/0x215
      [  106.159421]                      blk_mq_get_tag+0x34f/0x6e4
      [  106.160186]                      blk_mq_get_request+0x48e/0xaef
      [  106.160997]                      blk_mq_make_request+0x27e/0xbd2
      [  106.161828]                      generic_make_request+0x4d1/0x873
      [  106.162661]                      submit_bio+0x20c/0x253
      [  106.163379]                      mpage_bio_submit+0x44/0x4b
      [  106.164142]                      mpage_readpages+0x3c2/0x407
      [  106.164919]                      read_pages+0x13a/0x430
      [  106.165633]                      __do_page_cache_readahead+0x18e/0x2fc
      [  106.166530]                      force_page_cache_readahead+0x121/0x133
      [  106.167439]                      page_cache_sync_readahead+0x35f/0x3bb
      [  106.168337]                      generic_file_buffered_read+0x410/0x1860
      [  106.169255]                      __vfs_read+0x319/0x38f
      [  106.169977]                      vfs_read+0xd2/0x19a
      [  106.170662]                      ksys_read+0xb9/0x135
      [  106.171356]                      do_syscall_64+0x140/0x385
      [  106.172120]                      entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.173051]   }
      [  106.173308]   ... key      at: [<ffffffff85094600>] __key.26481+0x0/0x40
      [  106.174219]   ... acquired at:
      [  106.174646]    _raw_spin_lock+0x33/0x64
      [  106.175183]    blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.175843]    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.176518]    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.177262]    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.177900]    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.178591]    blk_mq_run_hw_queue+0x151/0x187
      [  106.179207]    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.179926]    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.180571]    blk_flush_plug_list+0x392/0x3d7
      [  106.181187]    blk_finish_plug+0x37/0x4f
      [  106.181737]    __se_sys_io_submit+0x171/0x304
      [  106.182346]    do_syscall_64+0x140/0x385
      [  106.182895]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.183607]
      [  106.183830] -> (&(&hctx->dispatch_wait_lock)->rlock){....} ops: 1 {
      [  106.184691]    INITIAL USE at:
      [  106.185119]                    _raw_spin_lock+0x33/0x64
      [  106.185838]                    blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.186697]                    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.187551]                    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.188481]                    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.189307]                    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.190189]                    blk_mq_run_hw_queue+0x151/0x187
      [  106.190989]                    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.191902]                    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.192739]                    blk_flush_plug_list+0x392/0x3d7
      [  106.193535]                    blk_finish_plug+0x37/0x4f
      [  106.194269]                    __se_sys_io_submit+0x171/0x304
      [  106.195059]                    do_syscall_64+0x140/0x385
      [  106.195794]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.196705]  }
      [  106.196950]  ... key      at: [<ffffffff84880620>] __key.51231+0x0/0x40
      [  106.197853]  ... acquired at:
      [  106.198270]    lock_acquire+0x280/0x2f3
      [  106.198806]    _raw_spin_lock+0x33/0x64
      [  106.199337]    sbitmap_get+0xd5/0x22c
      [  106.199850]    __sbitmap_queue_get+0xe8/0x177
      [  106.200450]    __blk_mq_get_tag+0x1e6/0x22d
      [  106.201035]    blk_mq_get_tag+0x1db/0x6e4
      [  106.201589]    blk_mq_get_driver_tag+0x161/0x258
      [  106.202237]    blk_mq_dispatch_rq_list+0x5b9/0xd7c
      [  106.202902]    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.203572]    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.204316]    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.204956]    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.205649]    blk_mq_run_hw_queue+0x151/0x187
      [  106.206269]    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.206997]    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.207644]    blk_flush_plug_list+0x392/0x3d7
      [  106.208264]    blk_finish_plug+0x37/0x4f
      [  106.208814]    __se_sys_io_submit+0x171/0x304
      [  106.209415]    do_syscall_64+0x140/0x385
      [  106.209965]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.210684]
      [  106.210904]
      [  106.210904] the dependencies between the lock to be acquired
      [  106.210905]  and SOFTIRQ-irq-unsafe lock:
      [  106.212541] -> (&(&sb->map[i].swap_lock)->rlock){+.+.} ops: 1969 {
      [  106.213393]    HARDIRQ-ON-W at:
      [  106.213840]                     _raw_spin_lock+0x33/0x64
      [  106.214570]                     sbitmap_get+0xd5/0x22c
      [  106.215282]                     __sbitmap_queue_get+0xe8/0x177
      [  106.216086]                     __blk_mq_get_tag+0x1e6/0x22d
      [  106.216876]                     blk_mq_get_tag+0x1db/0x6e4
      [  106.217627]                     blk_mq_get_driver_tag+0x161/0x258
      [  106.218465]                     blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.219326]                     blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.220198]                     blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.221138]                     __blk_mq_run_hw_queue+0x137/0x17e
      [  106.221975]                     __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.222874]                     blk_mq_run_hw_queue+0x151/0x187
      [  106.223686]                     blk_mq_sched_insert_requests+0x13f/0x175
      [  106.224597]                     blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.225444]                     blk_flush_plug_list+0x392/0x3d7
      [  106.226255]                     blk_finish_plug+0x37/0x4f
      [  106.227006]                     read_pages+0x3ef/0x430
      [  106.227717]                     __do_page_cache_readahead+0x18e/0x2fc
      [  106.228595]                     force_page_cache_readahead+0x121/0x133
      [  106.229491]                     page_cache_sync_readahead+0x35f/0x3bb
      [  106.230373]                     generic_file_buffered_read+0x410/0x1860
      [  106.231277]                     __vfs_read+0x319/0x38f
      [  106.231986]                     vfs_read+0xd2/0x19a
      [  106.232666]                     ksys_read+0xb9/0x135
      [  106.233350]                     do_syscall_64+0x140/0x385
      [  106.234097]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.235012]    SOFTIRQ-ON-W at:
      [  106.235460]                     _raw_spin_lock+0x33/0x64
      [  106.236195]                     sbitmap_get+0xd5/0x22c
      [  106.236913]                     __sbitmap_queue_get+0xe8/0x177
      [  106.237715]                     __blk_mq_get_tag+0x1e6/0x22d
      [  106.238488]                     blk_mq_get_tag+0x1db/0x6e4
      [  106.239244]                     blk_mq_get_driver_tag+0x161/0x258
      [  106.240079]                     blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.240937]                     blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.241806]                     blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.242751]                     __blk_mq_run_hw_queue+0x137/0x17e
      [  106.243579]                     __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.244469]                     blk_mq_run_hw_queue+0x151/0x187
      [  106.245277]                     blk_mq_sched_insert_requests+0x13f/0x175
      [  106.246191]                     blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.247044]                     blk_flush_plug_list+0x392/0x3d7
      [  106.247859]                     blk_finish_plug+0x37/0x4f
      [  106.248749]                     read_pages+0x3ef/0x430
      [  106.249463]                     __do_page_cache_readahead+0x18e/0x2fc
      [  106.250357]                     force_page_cache_readahead+0x121/0x133
      [  106.251263]                     page_cache_sync_readahead+0x35f/0x3bb
      [  106.252157]                     generic_file_buffered_read+0x410/0x1860
      [  106.253084]                     __vfs_read+0x319/0x38f
      [  106.253808]                     vfs_read+0xd2/0x19a
      [  106.254488]                     ksys_read+0xb9/0x135
      [  106.255186]                     do_syscall_64+0x140/0x385
      [  106.255943]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.256867]    INITIAL USE at:
      [  106.257300]                    _raw_spin_lock+0x33/0x64
      [  106.258033]                    sbitmap_get+0xd5/0x22c
      [  106.258747]                    __sbitmap_queue_get+0xe8/0x177
      [  106.259542]                    __blk_mq_get_tag+0x1e6/0x22d
      [  106.260320]                    blk_mq_get_tag+0x1db/0x6e4
      [  106.261072]                    blk_mq_get_driver_tag+0x161/0x258
      [  106.261902]                    blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.262762]                    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.263626]                    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.264571]                    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.265409]                    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.266302]                    blk_mq_run_hw_queue+0x151/0x187
      [  106.267111]                    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.268028]                    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.268878]                    blk_flush_plug_list+0x392/0x3d7
      [  106.269694]                    blk_finish_plug+0x37/0x4f
      [  106.270432]                    read_pages+0x3ef/0x430
      [  106.271139]                    __do_page_cache_readahead+0x18e/0x2fc
      [  106.272040]                    force_page_cache_readahead+0x121/0x133
      [  106.272932]                    page_cache_sync_readahead+0x35f/0x3bb
      [  106.273811]                    generic_file_buffered_read+0x410/0x1860
      [  106.274709]                    __vfs_read+0x319/0x38f
      [  106.275407]                    vfs_read+0xd2/0x19a
      [  106.276074]                    ksys_read+0xb9/0x135
      [  106.276764]                    do_syscall_64+0x140/0x385
      [  106.277500]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.278417]  }
      [  106.278676]  ... key      at: [<ffffffff85094640>] __key.26212+0x0/0x40
      [  106.279586]  ... acquired at:
      [  106.280026]    lock_acquire+0x280/0x2f3
      [  106.280559]    _raw_spin_lock+0x33/0x64
      [  106.281101]    sbitmap_get+0xd5/0x22c
      [  106.281610]    __sbitmap_queue_get+0xe8/0x177
      [  106.282221]    __blk_mq_get_tag+0x1e6/0x22d
      [  106.282809]    blk_mq_get_tag+0x1db/0x6e4
      [  106.283368]    blk_mq_get_driver_tag+0x161/0x258
      [  106.284018]    blk_mq_dispatch_rq_list+0x5b9/0xd7c
      [  106.284685]    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.285371]    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.286135]    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.286806]    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.287515]    blk_mq_run_hw_queue+0x151/0x187
      [  106.288149]    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.289041]    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.289912]    blk_flush_plug_list+0x392/0x3d7
      [  106.290590]    blk_finish_plug+0x37/0x4f
      [  106.291238]    __se_sys_io_submit+0x171/0x304
      [  106.291864]    do_syscall_64+0x140/0x385
      [  106.292534]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Reported-by: Ming Lei <ming.lei@redhat.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      58ab5e32
  23. 01 Dec 2018, 2 commits
    •
      sbitmap: optimize wakeup check · 5d2ee712
      Authored by Jens Axboe
      Even if we have no waiters on any of the sbitmap_queue wait states, we
      still have to loop over every entry to check. We do this for every IO,
      so the cost adds up.
      
      Shift a bit of the cost to the slow path, when we actually have waiters.
      Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maintain
      an internal count of how many are currently active. Then we can simply
      check this count in sbq_wake_ptr() and not have to loop if we don't
      have any sleepers.
      
      Convert the two users of sbitmap with waiting, blk-mq-tag and iSCSI.
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5d2ee712
    •
      sbitmap: ammortize cost of clearing bits · ea86ea2c
      Authored by Jens Axboe
      sbitmap maintains a set of words that we use to set and clear bits, with
      each bit representing a tag for blk-mq. Even though we spread the bits
      out and maintain a hint cache, one particular bit allocated will end up
      being cleared in the exact same spot.
      
      This introduces batched clearing of bits. Instead of clearing a given
      bit, the same bit is set in a cleared/free mask instead. If we fail
      allocating a bit from a given word, then we check the free mask, and
      batch move those cleared bits at that time. This trades 64 atomic bitops
      for 2 cmpxchg().
      
      In a threaded poll test case, half the overhead of getting and clearing
      tags is removed with this change. On another poll test case with a
      single thread, performance is unchanged.
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ea86ea2c
  24. 30 Nov 2018, 1 commit
  25. 13 Jun 2018, 1 commit
    •
      treewide: kzalloc_node() -> kcalloc_node() · 590b5b7d
      Authored by Kees Cook
      The kzalloc_node() function has a 2-factor argument form, kcalloc_node(). This
      patch replaces cases of:
      
              kzalloc_node(a * b, gfp, node)
      
      with:
              kcalloc_node(a, b, gfp, node)
      
      as well as handling cases of:
      
              kzalloc_node(a * b * c, gfp, node)
      
      with:
      
              kzalloc_node(array3_size(a, b, c), gfp, node)
      
      as it's slightly less ugly than:
      
              kcalloc_node(array_size(a, b), c, gfp, node)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc_node(4 * 1024, gfp, node)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc_node(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc_node(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc_node(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc_node(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc_node
      + kcalloc_node
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc_node(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc_node(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc_node(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc_node(C1 * C2 * C3, ...)
      |
        kzalloc_node(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc_node(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc_node(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc_node(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc_node(sizeof(THING) * C2, ...)
      |
        kzalloc_node(sizeof(TYPE) * C2, ...)
      |
        kzalloc_node(C1 * C2 * C3, ...)
      |
        kzalloc_node(C1 * C2, ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc_node
      + kcalloc_node
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      590b5b7d
  26. 25 May 2018, 1 commit
    •
      blk-mq: avoid starving tag allocation after allocating process migrates · e6fc4649
      Authored by Ming Lei
      When the allocating process is scheduled back and its mapped hw queue
      has changed, fake one extra wakeup on the previous queue to compensate
      for the missed wakeup, so other allocations on the previous queue
      won't be starved.
      
      This patch fixes a request allocation hang that is easy to trigger
      when nr_requests is very low.
      
      The race is as follows:
      
      1) 2 hw queues, nr_requests are 2, and wake_batch is one
      
      2) there are 3 waiters on hw queue 0
      
      3) two in-flight requests in hw queue 0 complete, and only two of the
         three waiters are woken up because of wake_batch; both of those
         waiters can then be scheduled to another CPU and switch to hw
         queue 1
      
      4) the 3rd waiter then waits forever, since no in-flight request
         remains in hw queue 0.
      
      5) this patch fixes it by faking a wakeup when a waiter is scheduled
         to another hw queue
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Modified commit message to make it clearer, and make it apply on
      top of the 4.18 branch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e6fc4649
  27. 15 May 2018, 1 commit
    •
      sbitmap: fix race in wait batch accounting · c854ab57
      Authored by Jens Axboe
      If we have multiple callers of sbq_wake_up(), we can end up in a
      situation where the wait_cnt will continually go more and more
      negative. Consider the case where our wake batch is 1, hence
      wait_cnt will start out as 1.
      
      wait_cnt == 1
      
      CPU0				CPU1
      atomic_dec_return(), cnt == 0
      				atomic_dec_return(), cnt == -1
      				cmpxchg(-1, 0) (succeeds)
      				[wait_cnt now 0]
      cmpxchg(0, 1) (fails)
      
      This leaves wait_cnt at 0, so we'll wake up immediately next time.
      Going through the same race again then leaves wait_cnt at -1.
      
      For the case where we have a larger wake batch, the only
      difference is that the starting point will be higher. We'll
      still end up with continually smaller batch wakeups, which
      defeats the purpose of the rolling wakeups.
      
      Always reset the wait_cnt to the batch value; then it doesn't
      matter who wins the race. But ensure that whoever does win the
      race is the one that increments the ws index and wakes up our
      batch count; the loser gets to call __sbq_wake_up() again to
      account its wakeups towards the next active wait state index.
      
      Fixes: 6c0ca7ae ("sbitmap: fix wakeup hang after sbq resize")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c854ab57
  28. 11 May 2018, 2 commits
  29. 01 Mar 2018, 1 commit
    •
      sbitmap: use test_and_set_bit_lock()/clear_bit_unlock() · 4ace53f1
      Authored by Omar Sandoval
      sbitmap_queue_get()/sbitmap_queue_clear() are used for
      allocating/freeing a resource, so they should provide acquire/release
      barrier semantics, respectively. sbitmap_get() currently contains a full
      barrier, which is unnecessary, so use test_and_set_bit_lock() instead of
      test_and_set_bit() (these are equivalent on x86_64). sbitmap_clear_bit()
      does not imply any barriers, which is incorrect, as accesses of the
      resource (e.g., request) could potentially get reordered to after the
      clear_bit(). Introduce sbitmap_clear_bit_unlock() and use it for
      sbitmap_queue_clear() (this only adds a compiler barrier on x86_64). The
      other existing user of sbitmap_clear_bit() (the blk-mq software queue
      pending map) is serialized through a spinlock and does not need this.
      Reported-by: Tejun Heo <tj@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4ace53f1