1. 25 Jun 2014, 2 commits
  2. 23 Jun 2014, 3 commits
    • Revert "block: add __init to elv_register" · e567bf71
      Authored by Jens Axboe
      This reverts commit b5097e95.
      
      The original commit is buggy: we do use the registration functions
      at runtime, for instance when loading IO schedulers through sysfs.
      Reported-by: Damien Wyart <damien.wyart@gmail.com>
      e567bf71
    • Revert "block: add __init to blkcg_policy_register" · d5bf0291
      Authored by Jens Axboe
      This reverts commit a2d445d4.
      
      The original commit is buggy: we do use the registration functions
      at runtime for modular builds.
      d5bf0291
    • blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt an atomic_t · a5049a8a
      Authored by Tejun Heo
      Hello,
      
      So, this patch should do.  Joe, Vivek, can one of you guys please
      verify that the oops goes away with this patch?
      
      Jens, the original thread can be read at
      
        http://thread.gmane.org/gmane.linux.kernel/1720729
      
      The fix converts blkg->refcnt from int to atomic_t.  It adds some
      overhead, but that should be minute compared to everything else
      going on and the cacheline bouncing involved, so I think it's highly
      unlikely to cause any noticeable difference.  Also, the refcnt in
      question should be converted to a percpu_ref for blk-mq anyway, so
      the atomic_t is likely to go away pretty soon.
      
      Thanks.
      
      ------- 8< -------
      __blkg_release_rcu() may be invoked after the associated request_queue
      is released, with an RCU grace period in between.  As such, the function
      and callbacks invoked from it must not dereference the associated
      request_queue.  This is clearly indicated in the comment above the
      function.
      
      Unfortunately, while trying to fix a different issue, 2a4fd070
      ("blkcg: move bulk of blkcg_gq release operations to the RCU
      callback") ignored this and added [un]locking of @blkg->q->queue_lock
      to __blkg_release_rcu().  This of course can cause oops as the
      request_queue may be long gone by the time this code gets executed.
      
        general protection fault: 0000 [#1] SMP
        CPU: 21 PID: 30 Comm: rcuos/21 Not tainted 3.15.0 #1
        Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.3:57 12/25/2013
        task: ffff880854021de0 ti: ffff88085403c000 task.ti: ffff88085403c000
        RIP: 0010:[<ffffffff8162e9e5>]  [<ffffffff8162e9e5>] _raw_spin_lock_irq+0x15/0x60
        RSP: 0018:ffff88085403fdf0  EFLAGS: 00010086
        RAX: 0000000000020000 RBX: 0000000000000010 RCX: 0000000000000000
        RDX: 000060ef80008248 RSI: 0000000000000286 RDI: 6b6b6b6b6b6b6b6b
        RBP: ffff88085403fdf0 R08: 0000000000000286 R09: 0000000000009f39
        R10: 0000000000020001 R11: 0000000000020001 R12: ffff88103c17a130
        R13: ffff88103c17a080 R14: 0000000000000000 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff88107fca0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000006e5ab8 CR3: 000000000193d000 CR4: 00000000000407e0
        Stack:
         ffff88085403fe18 ffffffff812cbfc2 ffff88103c17a130 0000000000000000
         ffff88103c17a130 ffff88085403fec0 ffffffff810d1d28 ffff880854021de0
         ffff880854021de0 ffff88107fcaec58 ffff88085403fe80 ffff88107fcaec30
        Call Trace:
         [<ffffffff812cbfc2>] __blkg_release_rcu+0x72/0x150
         [<ffffffff810d1d28>] rcu_nocb_kthread+0x1e8/0x300
         [<ffffffff81091d81>] kthread+0xe1/0x100
         [<ffffffff8163813c>] ret_from_fork+0x7c/0xb0
        Code: ff 47 04 48 8b 7d 08 be 00 02 00 00 e8 55 48 a4 ff 5d c3 0f 1f 00 66 66 66 66 90 55 48 89 e5
        +fa 66 66 90 66 66 90 b8 00 00 02 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f
        +b7
        RIP  [<ffffffff8162e9e5>] _raw_spin_lock_irq+0x15/0x60
         RSP <ffff88085403fdf0>
      
      The request_queue locking was added because blkcg_gq->refcnt is an int
      protected with the queue lock and __blkg_release_rcu() needs to put
      the parent.  Let's fix it by making blkcg_gq->refcnt an atomic_t and
      dropping queue locking in the function.
      
      Given the general heavy weight of the current request_queue and blkcg
      operations, this is unlikely to cause any noticeable overhead.
      Moreover, blkcg_gq->refcnt is likely to be converted to percpu_ref in
      the near future, so whatever (most likely negligible) overhead it may
      add is temporary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Link: http://lkml.kernel.org/g/alpine.DEB.2.02.1406081816540.17948@jlaw-desktop.mno.stratus.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      a5049a8a
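A user-space sketch of the conversion described above: a plain int refcount guarded by the queue lock is replaced by an atomic counter, so the release path no longer needs the (possibly already freed) queue lock. C11 atomics stand in for the kernel's atomic_t here, and all struct and function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for blkcg_gq with its refcnt as an atomic. */
struct blkg_sketch {
	atomic_int refcnt;    /* was "int refcnt" under queue_lock */
	bool released;
};

/* Take a reference; no external lock needed (like atomic_inc()). */
static void blkg_get_sketch(struct blkg_sketch *blkg)
{
	atomic_fetch_add(&blkg->refcnt, 1);
}

/* Drop a reference; the thread that drops the last one releases the
 * object, mirroring the atomic_dec_and_test() pattern.  Returns true
 * when the object was released. */
static bool blkg_put_sketch(struct blkg_sketch *blkg)
{
	if (atomic_fetch_sub(&blkg->refcnt, 1) == 1) {
		blkg->released = true;   /* would schedule the RCU free */
		return true;
	}
	return false;
}
```

Because the counter itself is the synchronization point, the release callback can put the parent without touching blkg->q->queue_lock, which is exactly what the oops above required.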
  3. 18 Jun 2014, 3 commits
    • blk-mq: bitmap tag: fix races in bt_get() function · 86fb5c56
      Authored by Alexander Gordeev
      This update fixes a few issues in the bt_get() function:
      
      - the list_empty(&wait.task_list) check is not protected;
      
      - the was_empty check is always true, so *every* thread entering
        the loop resets the bt_wait_state::wait_cnt counter, rather than
        every bt->wake_cnt'th thread;
      
      - the bt_wait_state::wait_cnt counter update is redundant, since
        it also gets reset in the bt_clear_tag() function.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      86fb5c56
    • blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt · 2971c35f
      Authored by Alexander Gordeev
      This piece of code in the bt_clear_tag() function is racy:
      
      	bs = bt_wake_ptr(bt);
      	if (bs && atomic_dec_and_test(&bs->wait_cnt)) {
      		atomic_set(&bs->wait_cnt, bt->wake_cnt);
      		wake_up(&bs->wait);
      	}
      
      Since nothing prevents bt_wake_ptr() from returning the very
      same 'bs' address on multiple CPUs, the following scenario is
      possible:
      
          CPU1                                CPU2
          ----                                ----
      
      0.  bs = bt_wake_ptr(bt);               bs = bt_wake_ptr(bt);
      1.  atomic_dec_and_test(&bs->wait_cnt)
      2.                                      atomic_dec_and_test(&bs->wait_cnt)
      3.  atomic_set(&bs->wait_cnt, bt->wake_cnt);
      
      If the decrement in [1] yields zero, then for some window the
      decrement in [2] drives the counter to a negative (underflowed)
      value, which is not expected.  The follow-up assignment in [3]
      overwrites the invalid value with the batch value (and likely
      keeps the issue from being severe), but it is still incorrect
      and should be fixed.
      
      Cc: Ming Lei <tom.leiming@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      2971c35f
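One common fix for this shape of race is to perform the reset with a compare-and-swap, so only a thread that actually observed the counter reach zero can restore the batch value, and a stale reset can never clobber a concurrent decrement. A user-space sketch of that idea with illustrative names (not necessarily the exact code the kernel patch used):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Decrement wait_cnt; if this thread saw it hit zero, reset it to the
 * batch value wake_cnt with a CAS and report that a wakeup is due.
 * A racing decrement between the dec and the CAS makes the CAS fail
 * harmlessly instead of overwriting a live count, closing the
 * CPU1/CPU2 interleaving shown above. */
static bool dec_and_maybe_wake(atomic_int *wait_cnt, int wake_cnt)
{
	int now = atomic_fetch_sub(wait_cnt, 1) - 1;  /* value after dec */
	if (now == 0) {
		int zero = 0;
		if (atomic_compare_exchange_strong(wait_cnt, &zero, wake_cnt))
			return true;   /* would wake_up(&bs->wait) here */
	}
	return false;
}
```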
    • blk-mq: bitmap tag: fix races on shared ::wake_index fields · 8537b120
      Authored by Alexander Gordeev
      Fix racy updates of shared blk_mq_bitmap_tags::wake_index
      and blk_mq_hw_ctx::wake_index fields.
      
      Cc: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      8537b120
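The usual way to make such a shared round-robin index race-tolerant is to advance it with a single compare-and-swap: if another CPU already moved it, the failed CAS is harmless because the index only needs to rotate approximately. A user-space sketch with illustrative names (the queue count is an assumption for the sketch):

```c
#include <assert.h>
#include <stdatomic.h>

#define WAIT_QUEUES 8   /* assumed number of wait queues for this sketch */

/* Advance a shared wake_index by one slot, wrapping around, with one
 * CAS so two CPUs racing here cannot corrupt the value: one wins,
 * the other's CAS fails and the index is still valid. */
static void wake_index_inc(atomic_int *index)
{
	int old = atomic_load(index);
	int next = (old + 1) % WAIT_QUEUES;
	atomic_compare_exchange_strong(index, &old, next);
}
```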
  4. 14 Jun 2014, 2 commits
  5. 12 Jun 2014, 2 commits
  6. 11 Jun 2014, 3 commits
  7. 10 Jun 2014, 1 commit
  8. 09 Jun 2014, 2 commits
  9. 07 Jun 2014, 3 commits
  10. 06 Jun 2014, 3 commits
    • blk-mq: bump max tag depth to 10K tags · a4391c64
      Authored by Jens Axboe
      For some scsi-mq cases, the tag map can be huge. So increase the
      max number of tags we support.
      
      Additionally, don't fail with EINVAL if a user requests too many
      tags. Warn that the tag depth has been adjusted down, and store
      the new value inside the tag_set passed in.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      a4391c64
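The "warn and adjust down instead of failing with EINVAL" behaviour can be sketched as a simple clamp; the limit and names below are assumptions for this sketch, not the kernel's symbols.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_MAX_TAG_DEPTH 10240U   /* assumed 10K-tag ceiling */

/* Clamp a requested tag depth: warn and return the adjusted value
 * (which the caller would store back into its tag_set) rather than
 * rejecting the request outright. */
static unsigned int clamp_tag_depth(unsigned int requested)
{
	if (requested > SKETCH_MAX_TAG_DEPTH) {
		fprintf(stderr, "tag depth %u too large, adjusted to %u\n",
			requested, SKETCH_MAX_TAG_DEPTH);
		return SKETCH_MAX_TAG_DEPTH;
	}
	return requested;
}
```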
    • block: add blk_rq_set_block_pc() · f27b087b
      Authored by Jens Axboe
      With the optimizations around not clearing the full request at alloc
      time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
      up to the user allocating the request.
      
      Add a blk_rq_set_block_pc() that sets the command type to
      REQ_TYPE_BLOCK_PC, and properly initializes the members associated
      with this type of request. Update callers to use this function instead
      of manipulating rq->cmd_type directly.
      
      Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
      attempt.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f27b087b
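The pattern the commit describes, one helper owning all type-specific initialization so callers never set cmd_type by hand, can be sketched as follows; the struct and its fields are illustrative stand-ins, not the real struct request.

```c
#include <assert.h>
#include <string.h>

enum req_type_sketch { REQ_TYPE_FS_SKETCH, REQ_TYPE_BLOCK_PC_SKETCH };

/* Minimal stand-in for the request fields a BLOCK_PC request cares
 * about once requests are no longer fully cleared at alloc time. */
struct request_sketch {
	enum req_type_sketch cmd_type;
	unsigned char cmd[16];
	unsigned int cmd_len;
	int extra_len;
};

/* Set the command type and properly initialize the members associated
 * with it, instead of each caller poking cmd_type directly. */
static void rq_set_block_pc_sketch(struct request_sketch *rq)
{
	rq->cmd_type = REQ_TYPE_BLOCK_PC_SKETCH;
	rq->cmd_len = 0;
	rq->extra_len = 0;
	memset(rq->cmd, 0, sizeof(rq->cmd));
}
```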
    • block: add notion of a chunk size for request merging · 762380ad
      Authored by Jens Axboe
      Some drivers have different limits on what size a request should
      optimally be, depending on the offset of the request, similar to
      dividing a device into chunks.  Add a setting that allows the
      driver to inform the block layer of such a chunk size.  The block
      layer will then prevent merging across the chunks.
      
      This is needed to optimally support NVMe with a non-zero stripe size.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      762380ad
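With a power-of-two chunk size, "don't merge across chunks" reduces to capping a request at the distance from its starting offset to the next chunk boundary. A small sketch of that computation (the function name is illustrative):

```c
#include <assert.h>

/* Largest request, in sectors, that starts at 'offset' without
 * crossing a chunk boundary; chunk_sectors must be a power of two
 * so the mask computes offset-within-chunk cheaply. */
static unsigned int sectors_to_chunk_end(unsigned long long offset,
					 unsigned int chunk_sectors)
{
	return chunk_sectors -
	       (unsigned int)(offset & (chunk_sectors - 1));
}
```

For example, a request starting 100 sectors into a 128-sector chunk may grow by at most 28 sectors before it would cross the boundary.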
  11. 05 Jun 2014, 2 commits
  12. 04 Jun 2014, 5 commits
  13. 31 May 2014, 4 commits
  14. 30 May 2014, 4 commits
  15. 29 May 2014, 1 commit
    • block: add queue flag for disabling SG merging · 05f1dd53
      Authored by Jens Axboe
      If devices are not SG starved, we waste a lot of time potentially
      collapsing SG segments. Enough that 1.5% of the CPU time goes
      to this, at only 400K IOPS. Add a queue flag, QUEUE_FLAG_NO_SG_MERGE,
      which just returns the number of vectors in a bio instead of looping
      over all segments and checking for collapsible ones.
      
      Add a BLK_MQ_F_SG_MERGE flag so that drivers can opt-in on the sg
      merging, if they so desire.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      05f1dd53
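The trade-off the flag controls can be sketched as follows: with merging enabled, every segment is inspected and physically contiguous neighbours are collapsed; with the no-merge flag set, the raw vector count is returned without the walk. Types and names are illustrative, not the kernel's bio machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct seg_sketch { unsigned long addr; unsigned int len; };

/* Count SG segments: the cheap path just reports the vector count
 * (what a no-SG-merge flag buys); the slow path collapses adjacent,
 * physically contiguous segments into one. */
static unsigned int count_segments(const struct seg_sketch *v, size_t n,
				   bool no_merge)
{
	if (no_merge || n == 0)
		return (unsigned int)n;

	unsigned int count = 1;
	for (size_t i = 1; i < n; i++)
		if (v[i].addr != v[i - 1].addr + v[i - 1].len)
			count++;
	return count;
}
```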