1. 16 4月, 2014 6 次提交
    • C
      blk-mq: initialize request on allocation · ed44832d
      Christoph Hellwig 提交于
      If we want to share tag and request allocation between queues we cannot
      initialize the request at init/free time, but need to initialize it
      at allocation time as it might get used for different queues over its
      lifetime.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ed44832d
    • C
      blk-mq: add ->init_request and ->exit_request methods · e9b267d9
      Christoph Hellwig 提交于
      The current blk_mq_init_commands/blk_mq_free_commands interface has a
      two problems:
      
       1) Because only the constructor is passed to blk_mq_init_commands there
          is no easy way to clean up when a comman initialization failed.  The
          current code simply leaks the allocations done in the constructor.
      
       2) There is no good place to call blk_mq_free_commands: before
          blk_cleanup_queue there is no guarantee that all outstanding
          commands have completed, so we can't free them yet.  After
          blk_cleanup_queue the queue has usually been freed.  This can be
          worked around by grabbing an unconditional reference before calling
          blk_cleanup_queue and dropping it after blk_mq_free_commands is
          done, although that's not exatly pretty and driver writers are
          guaranteed to get it wrong sooner or later.
      
      Both issues are easily fixed by making the request constructor and
      destructor normal blk_mq_ops methods.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e9b267d9
    • C
      blk-mq: make ->flush_rq fully transparent to drivers · 8727af4b
      Christoph Hellwig 提交于
      Drivers shouldn't have to care about the block layer setting aside a
      request to implement the flush state machine.  We already override the
      mq context and tag to make it more transparent, but so far haven't deal
      with the driver private data in the request.  Make sure to override this
      as well, and while we're at it add a proper helper sitting in blk-mq.c
      that implements the full impersonation.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8727af4b
    • C
      blk-mq: do not initialize req->special · 9d74e257
      Christoph Hellwig 提交于
      Drivers can reach their private data easily using the blk_mq_rq_to_pdu
      helper and don't need req->special.  By not initializing it code can
      be simplified nicely, and we also shave off a few more instructions from
      the I/O path.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9d74e257
    • C
      blk-mq: initialize resid_len · 742ee69b
      Christoph Hellwig 提交于
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      742ee69b
    • J
      block: remove struct request buffer member · b4f42e28
      Jens Axboe 提交于
      This was used in the olden days, back when onions were proper
      yellow. Basically it mapped to the current buffer to be
      transferred. With highmem being added more than a decade ago,
      most drivers map pages out of a bio, and rq->buffer isn't
      pointing at anything valid.
      
      Convert old style drivers to just use bio_data().
      
      For the discard payload use case, just reference the page
      in the bio.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b4f42e28
  2. 11 4月, 2014 1 次提交
  3. 10 4月, 2014 5 次提交
    • J
      block: fix regression with block enabled tagging · 360f92c2
      Jens Axboe 提交于
      Martin reported that his test system would not boot with
      current git, it oopsed with this:
      
      BUG: unable to handle kernel paging request at ffff88046c6c9e80
      IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
      PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060
      Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm
      rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon
      CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246
      Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
      Workqueue: events_unbound async_run_entry_fn
      task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000
      RIP: 0010:[<ffffffff812971e0>]  [<ffffffff812971e0>]
      blk_queue_start_tag+0x90/0x150
      RSP: 0018:ffff880273d03a58  EFLAGS: 00010092
      RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6
      RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d
      RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410
      R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0
      R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e
      FS:  0000000000000000(0000) GS:ffff880277b00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0
      Stack:
       ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000
       ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62
       ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8
      Call Trace:
       [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510
       [<ffffffff81293167>] __blk_run_queue+0x37/0x50
       [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130
       [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0
       [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0
       [<ffffffff813bba35>] scsi_execute+0xe5/0x180
       [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110
       [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod]
       [<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0
       [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod]
       [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod]
       [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140
       [<ffffffff8106a1c5>] process_one_work+0x175/0x410
       [<ffffffff8106b373>] worker_thread+0x123/0x400
       [<ffffffff8106b250>] ? manage_workers+0x160/0x160
       [<ffffffff8107104e>] kthread+0xce/0xf0
       [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0
       [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
      Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89
      df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89
      58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d
      RIP  [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
       RSP <ffff880273d03a58>
      CR2: ffff88046c6c9e80
      
      Martin bisected and found this to be the problem patch;
      
      	commit 6d113398
      	Author: Jan Kara <jack@suse.cz>
      	Date:   Mon Feb 24 16:39:54 2014 +0100
      
      	    block: Stop abusing rq->csd.list in blk-softirq
      
      and the problem was immediately apparent. The patch states that
      it is safe to reuse queuelist at completion time, since it is
      no longer used. However, that is not true if a device is using
      block enabled tagging. If that is the case, then the queuelist
      is reused to keep track of busy tags. If a device also ended
      up using softirq completions, we'd reuse ->queuelist for the
      IPI handling while block tagging was still using it. Boom.
      
      Fix this by adding a new ipi_list list head, and share the
      memory used with the request hash table. The hash table is
      never used after the request is moved to the dispatch list,
      which happens long before any potential completion of the
      request. Add a new request bit for this, so we don't have
      cases that check rq->hash while it could potentially have
      been reused for the IPI completion.
      Reported-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Tested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      360f92c2
    • J
      blk-mq: simplify blk_mq_hw_sysfs_cpus_show() · cb2da43e
      Jens Axboe 提交于
      Now that we have a cpu mask of CPUs that are mapped to
      a specific hardware queue, we can just iterate that to
      display the sysfs num-hw-queue/cpu_list file.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      cb2da43e
    • J
      blk-mq: ensure that hardware queues are always run on the mapped CPUs · e4043dcf
      Jens Axboe 提交于
      Instead of providing soft mappings with no guarantees on hardware
      queues always being run on the right CPU, switch to a hard mapping
      guarantee that ensure that we always run the hardware queue on
      (one of, if more) the mapped CPU.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e4043dcf
    • J
      block: add kblockd_schedule_delayed_work_on() · 8ab14595
      Jens Axboe 提交于
      Same function as kblockd_schedule_delayed_work(), but allow the
      caller to pass in a CPU that the work should be executed on. This
      just directly extends and maps into the workqueue API, and will
      be used to make the blk-mq mappings more strict.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8ab14595
    • J
      block: remove 'q' parameter from kblockd_schedule_*_work() · 59c3d45e
      Jens Axboe 提交于
      The queue parameter is never used, just get rid of it.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      59c3d45e
  4. 07 4月, 2014 1 次提交
    • J
      blk-mq: fix potential stall during CPU unplug with IO pending · bccb5f7c
      Jens Axboe 提交于
      When a CPU is unplugged, we move the blk_mq_ctx request entries
      to the current queue. The current code forgets to remap the
      blk_mq_hw_ctx before marking the software context pending,
      which breaks if old-cpu and new-cpu don't map to the same
      hardware queue.
      
      Additionally, if we mark entries as pending in the new
      hardware queue, then make sure we schedule it for running.
      Otherwise request could be sitting there until someone else
      queues IO for that hardware queue.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      bccb5f7c
  5. 02 4月, 2014 1 次提交
  6. 21 3月, 2014 7 次提交
    • S
      blk-mq: add REQ_SYNC early · 27fbf4e8
      Shaohua Li 提交于
      Add REQ_SYNC early, so rq_dispatched[] in blk_mq_rq_ctx_init
      is set correctly.
      
      Signed-off-by: Shaohua Li<shli@fusionio.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      27fbf4e8
    • M
      rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock · 55c816e3
      Mike Galbraith 提交于
      [  365.164040] BUG: sleeping function called from invalid context at kernel/rtmutex.c:674
      [  365.164041] in_atomic(): 1, irqs_disabled(): 1, pid: 26, name: migration/1
      [  365.164043] no locks held by migration/1/26.
      [  365.164044] irq event stamp: 6648
      [  365.164056] hardirqs last  enabled at (6647): [<ffffffff8153d377>] restore_args+0x0/0x30
      [  365.164062] hardirqs last disabled at (6648): [<ffffffff810ed98d>] multi_cpu_stop+0x9d/0x120
      [  365.164070] softirqs last  enabled at (0): [<ffffffff810543bc>] copy_process.part.28+0x6fc/0x1920
      [  365.164072] softirqs last disabled at (0): [<          (null)>]           (null)
      [  365.164076] CPU: 1 PID: 26 Comm: migration/1 Tainted: GF           N  3.12.12-rt19-0.gcb6c4a2-rt #3
      [  365.164078] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
      [  365.164091]  0000000000000001 ffff880a42ea7c30 ffffffff815367e6 ffffffff81a086c0
      [  365.164099]  ffff880a42ea7c40 ffffffff8108919c ffff880a42ea7c60 ffffffff8153c24f
      [  365.164107]  ffff880a42ea91f0 00000000ffffffe1 ffff880a42ea7c88 ffffffff81297ec0
      [  365.164108] Call Trace:
      [  365.164119]  [<ffffffff810060b1>] try_stack_unwind+0x191/0x1a0
      [  365.164127]  [<ffffffff81004872>] dump_trace+0x92/0x360
      [  365.164133]  [<ffffffff81006108>] show_trace_log_lvl+0x48/0x60
      [  365.164138]  [<ffffffff81004c18>] show_stack_log_lvl+0xd8/0x1d0
      [  365.164143]  [<ffffffff81006160>] show_stack+0x20/0x50
      [  365.164153]  [<ffffffff815367e6>] dump_stack+0x54/0x9a
      [  365.164163]  [<ffffffff8108919c>] __might_sleep+0xfc/0x140
      [  365.164173]  [<ffffffff8153c24f>] rt_spin_lock+0x1f/0x70
      [  365.164182]  [<ffffffff81297ec0>] blk_mq_main_cpu_notify+0x20/0x70
      [  365.164191]  [<ffffffff81540a1c>] notifier_call_chain+0x4c/0x70
      [  365.164201]  [<ffffffff81083499>] __raw_notifier_call_chain+0x9/0x10
      [  365.164207]  [<ffffffff810567be>] cpu_notify+0x1e/0x40
      [  365.164217]  [<ffffffff81525da2>] take_cpu_down+0x22/0x40
      [  365.164223]  [<ffffffff810ed9c6>] multi_cpu_stop+0xd6/0x120
      [  365.164229]  [<ffffffff810edd97>] cpu_stopper_thread+0xd7/0x1e0
      [  365.164235]  [<ffffffff810863a3>] smpboot_thread_fn+0x203/0x380
      [  365.164241]  [<ffffffff8107cbf8>] kthread+0xc8/0xd0
      [  365.164250]  [<ffffffff8154440c>] ret_from_fork+0x7c/0xb0
      [  365.164429] smpboot: CPU 1 is now offline
      Signed-off-by: NMike Galbraith <bitbucket@online.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      55c816e3
    • C
      blk-mq: support partial I/O completions · 7237c740
      Christoph Hellwig 提交于
      Add a new blk_mq_end_io_partial function to partially complete requests
      as needed by the SCSI layer.  We do this by reusing blk_update_request
      to advance the bio instead of having a simplified version of it in
      the blk-mq code.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7237c740
    • C
      blk-mq: merge blk_mq_insert_request and blk_mq_run_request · eeabc850
      Christoph Hellwig 提交于
      It's almost identical to blk_mq_insert_request, so fold the two into one
      slightly more generic function by making the flush special case a bit
      smarted.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      eeabc850
    • C
      blk-mq: remove blk_mq_alloc_rq · 081241e5
      Christoph Hellwig 提交于
      There's only one caller, which is a straight wrapper and fits the naming
      scheme of the related functions a lot better.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      081241e5
    • D
      block: free q->flush_rq in blk_init_allocated_queue error paths · 708f04d2
      Dave Jones 提交于
      Commit 7982e90c ("block: fix q->flush_rq NULL pointer crash on
      dm-mpath flush") moved an allocation to blk_init_allocated_queue(), but
      neglected to free that allocation on the error paths that follow.
      Signed-off-by: NDave Jones <davej@fedoraproject.org>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      708f04d2
    • J
      blk-mq: don't dump CPU -> hw queue map on driver load · 676141e4
      Jens Axboe 提交于
      Now that we are out of initial debug/bringup mode, remove
      the verbose dump of the mapping table.
      
      Provide the mapping table in sysfs, under the hardware queue
      directory, in the cpu_list file.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      676141e4
  7. 20 3月, 2014 1 次提交
    • J
      blk-mq: fix wrong usage of hctx->state vs hctx->flags · 5d12f905
      Jens Axboe 提交于
      BLK_MQ_F_* flags are for hctx->flags, and are non-atomic and
      set at registration time. BLK_MQ_S_* flags are dynamic and
      atomic, and are accessed through hctx->state.
      
      Some of the BLK_MQ_S_STOPPED uses were wrong. Additionally,
      the header file should not use a bit shift for the _S_ flags,
      as they are done through the set/test_bit functions.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      5d12f905
  8. 19 3月, 2014 1 次提交
    • T
      cgroup: drop const from @buffer of cftype->write_string() · 4d3bb511
      Tejun Heo 提交于
      cftype->write_string() just passes on the writeable buffer from kernfs
      and there's no reason to add const restriction on the buffer.  The
      only thing const achieves is unnecessarily complicating parsing of the
      buffer.  Drop const from @buffer.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>                                           
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      4d3bb511
  9. 15 3月, 2014 2 次提交
  10. 13 3月, 2014 1 次提交
    • J
      block: remove old blk_iopoll_enabled variable · 89f8b33c
      Jens Axboe 提交于
      This was a debugging measure to toggle enabled/disabled
      when testing. But for real production setups, it's not
      safe to toggle this setting without either reloading
      drivers of quiescing IO first. Neither of which the toggle
      enforces.
      
      Additionally, it makes drivers deal with the conditional
      state.
      
      Remove it completely. It's up to the driver whether iopoll
      is enabled or not.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      89f8b33c
  11. 09 3月, 2014 2 次提交
    • M
      block: change flush sequence list addition back to front add · 10beafc1
      Mike Snitzer 提交于
      Commit 18741986 inadvertently changed the rq flush insertion
      from a head to a tail insertion. Fix that back up.
      Signed-off-by: NMike Snitzer <msnitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      10beafc1
    • M
      block: fix q->flush_rq NULL pointer crash on dm-mpath flush · 7982e90c
      Mike Snitzer 提交于
      Commit 18741986 ("blk-mq: rework flush sequencing logic") switched
      ->flush_rq from being an embedded member of the request_queue structure
      to being dynamically allocated in blk_init_queue_node().
      
      Request-based DM multipath doesn't use blk_init_queue_node(), instead it
      uses blk_alloc_queue_node() + blk_init_allocated_queue().  Because
      commit 18741986 placed the dynamic allocation of ->flush_rq in
      blk_init_queue_node() any flush issued to a dm-mpath device would crash
      with a NULL pointer, e.g.:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0
      PGD bb3c7067 PUD bb01d067 PMD 0
      Oops: 0002 [#1] SMP
      ...
      CPU: 5 PID: 5028 Comm: dt Tainted: G        W  O 3.14.0-rc3.snitm+ #10
      ...
      task: ffff88032fb270e0 ti: ffff880079564000 task.ti: ffff880079564000
      RIP: 0010:[<ffffffff8125037e>]  [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0
      RSP: 0018:ffff880079565c98  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030
      RDX: ffff880260c74048 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff880079565ca8 R08: ffff880260aa1e98 R09: 0000000000000001
      R10: ffff88032fa78500 R11: 0000000000000246 R12: 0000000000000000
      R13: ffff880260aa1de8 R14: 0000000000000650 R15: 0000000000000000
      FS:  00007f8d36a2a700(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000079b36000 CR4: 00000000000007e0
      Stack:
       0000000000000000 ffff880260c74048 ffff880079565cd8 ffffffff81257a47
       ffff880260aa1de8 ffff880260c74048 0000000000000001 0000000000000000
       ffff880079565d08 ffffffff81257c2d 0000000000000000 ffff880260aa1de8
      Call Trace:
       [<ffffffff81257a47>] blk_flush_complete_seq+0x2d7/0x2e0
       [<ffffffff81257c2d>] blk_insert_flush+0x1dd/0x210
       [<ffffffff8124ec59>] __elv_add_request+0x1f9/0x320
       [<ffffffff81250681>] ? blk_account_io_start+0x111/0x190
       [<ffffffff81253a4b>] blk_queue_bio+0x25b/0x330
       [<ffffffffa0020bf5>] dm_request+0x35/0x40 [dm_mod]
       [<ffffffff812530c0>] generic_make_request+0xc0/0x100
       [<ffffffff81253173>] submit_bio+0x73/0x140
       [<ffffffff811becdd>] submit_bio_wait+0x5d/0x80
       [<ffffffff81257528>] blkdev_issue_flush+0x78/0xa0
       [<ffffffff811c1f6f>] blkdev_fsync+0x3f/0x60
       [<ffffffff811b7fde>] vfs_fsync_range+0x1e/0x20
       [<ffffffff811b7ffc>] vfs_fsync+0x1c/0x20
       [<ffffffff811b81f1>] do_fsync+0x41/0x80
       [<ffffffff8118874e>] ? SyS_lseek+0x7e/0x80
       [<ffffffff811b8260>] SyS_fsync+0x10/0x20
       [<ffffffff8154c2d2>] system_call_fastpath+0x16/0x1b
      
      Fix this by moving the ->flush_rq allocation from blk_init_queue_node()
      to blk_init_allocated_queue().  blk_init_queue_node() also calls
      blk_init_allocated_queue() so this change is functionality equivalent
      for all blk_init_queue_node() callers.
      Reported-by: NHannes Reinecke <hare@suse.de>
      Reported-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7982e90c
  12. 07 3月, 2014 1 次提交
  13. 06 3月, 2014 1 次提交
    • R
      blktrace: fix accounting of partially completed requests · af5040da
      Roman Pen 提交于
      trace_block_rq_complete does not take into account that request can
      be partially completed, so we can get the following incorrect output
      of blkparser:
      
        C   R 232 + 240 [0]
        C   R 240 + 232 [0]
        C   R 248 + 224 [0]
        C   R 256 + 216 [0]
      
      but should be:
      
        C   R 232 + 8 [0]
        C   R 240 + 8 [0]
        C   R 248 + 8 [0]
        C   R 256 + 8 [0]
      
      Also, the whole output summary statistics of completed requests and
      final throughput will be incorrect.
      
      This patch takes into account real completion size of the request and
      fixes wrong completion accounting.
      Signed-off-by: NRoman Pen <r.peniaev@gmail.com>
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: linux-kernel@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <axboe@fb.com>
      af5040da
  14. 04 3月, 2014 1 次提交
    • M
      rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock · 2a26ebef
      Mike Galbraith 提交于
      [  365.164040] BUG: sleeping function called from invalid context at kernel/rtmutex.c:674
      [  365.164041] in_atomic(): 1, irqs_disabled(): 1, pid: 26, name: migration/1
      [  365.164043] no locks held by migration/1/26.
      [  365.164044] irq event stamp: 6648
      [  365.164056] hardirqs last  enabled at (6647): [<ffffffff8153d377>] restore_args+0x0/0x30
      [  365.164062] hardirqs last disabled at (6648): [<ffffffff810ed98d>] multi_cpu_stop+0x9d/0x120
      [  365.164070] softirqs last  enabled at (0): [<ffffffff810543bc>] copy_process.part.28+0x6fc/0x1920
      [  365.164072] softirqs last disabled at (0): [<          (null)>]           (null)
      [  365.164076] CPU: 1 PID: 26 Comm: migration/1 Tainted: GF           N  3.12.12-rt19-0.gcb6c4a2-rt #3
      [  365.164078] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
      [  365.164091]  0000000000000001 ffff880a42ea7c30 ffffffff815367e6 ffffffff81a086c0
      [  365.164099]  ffff880a42ea7c40 ffffffff8108919c ffff880a42ea7c60 ffffffff8153c24f
      [  365.164107]  ffff880a42ea91f0 00000000ffffffe1 ffff880a42ea7c88 ffffffff81297ec0
      [  365.164108] Call Trace:
      [  365.164119]  [<ffffffff810060b1>] try_stack_unwind+0x191/0x1a0
      [  365.164127]  [<ffffffff81004872>] dump_trace+0x92/0x360
      [  365.164133]  [<ffffffff81006108>] show_trace_log_lvl+0x48/0x60
      [  365.164138]  [<ffffffff81004c18>] show_stack_log_lvl+0xd8/0x1d0
      [  365.164143]  [<ffffffff81006160>] show_stack+0x20/0x50
      [  365.164153]  [<ffffffff815367e6>] dump_stack+0x54/0x9a
      [  365.164163]  [<ffffffff8108919c>] __might_sleep+0xfc/0x140
      [  365.164173]  [<ffffffff8153c24f>] rt_spin_lock+0x1f/0x70
      [  365.164182]  [<ffffffff81297ec0>] blk_mq_main_cpu_notify+0x20/0x70
      [  365.164191]  [<ffffffff81540a1c>] notifier_call_chain+0x4c/0x70
      [  365.164201]  [<ffffffff81083499>] __raw_notifier_call_chain+0x9/0x10
      [  365.164207]  [<ffffffff810567be>] cpu_notify+0x1e/0x40
      [  365.164217]  [<ffffffff81525da2>] take_cpu_down+0x22/0x40
      [  365.164223]  [<ffffffff810ed9c6>] multi_cpu_stop+0xd6/0x120
      [  365.164229]  [<ffffffff810edd97>] cpu_stopper_thread+0xd7/0x1e0
      [  365.164235]  [<ffffffff810863a3>] smpboot_thread_fn+0x203/0x380
      [  365.164241]  [<ffffffff8107cbf8>] kthread+0xc8/0xd0
      [  365.164250]  [<ffffffff8154440c>] ret_from_fork+0x7c/0xb0
      [  365.164429] smpboot: CPU 1 is now offline
      Signed-off-by: NMike Galbraith <bitbucket@online.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2a26ebef
  15. 25 2月, 2014 4 次提交
    • F
      smp: Rename __smp_call_function_single() to smp_call_function_single_async() · c46fff2a
      Frederic Weisbecker 提交于
      The name __smp_call_function_single() doesn't tell much about the
      properties of this function, especially when compared to
      smp_call_function_single().
      
      The comments above the implementation are also misleading. The main
      point of this function is actually not to be able to embed the csd
      in an object. This is actually a requirement that result from the
      purpose of this function which is to raise an IPI asynchronously.
      
      As such it can be called with interrupts disabled. And this feature
      comes at the cost of the caller who then needs to serialize the
      IPIs on this csd.
      
      Lets rename the function and enhance the comments so that they reflect
      these properties.
      Suggested-by: NChristoph Hellwig <hch@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c46fff2a
    • F
      smp: Remove wait argument from __smp_call_function_single() · fce8ad15
      Frederic Weisbecker 提交于
      The main point of calling __smp_call_function_single() is to send
      an IPI in a pure asynchronous way. By embedding a csd in an object,
      a caller can send the IPI without waiting for a previous one to complete
      as is required by smp_call_function_single() for example. As such,
      sending this kind of IPI can be safe even when irqs are disabled.
      
      This flexibility comes at the expense of the caller who then needs to
      synchronize the csd lifecycle by himself and make sure that IPIs on a
      single csd are serialized.
      
      This is how __smp_call_function_single() works when wait = 0 and this
      usecase is relevant.
      
      Now there don't seem to be any usecase with wait = 1 that can't be
      covered by smp_call_function_single() instead, which is safer. Lets look
      at the two possible scenario:
      
      1) The user calls __smp_call_function_single(wait = 1) on a csd embedded
         in an object. It looks like a nice and convenient pattern at the first
         sight because we can then retrieve the object from the IPI handler easily.
      
         But actually it is a waste of memory space in the object since the csd
         can be allocated from the stack by smp_call_function_single(wait = 1)
         and the object can be passed an the IPI argument.
      
         Besides that, embedding the csd in an object is more error prone
         because the caller must take care of the serialization of the IPIs
         for this csd.
      
      2) The user calls __smp_call_function_single(wait = 1) on a csd that
         is allocated on the stack. It's ok but smp_call_function_single()
         can do it as well and it already takes care of the allocation on the
         stack. Again it's more simple and less error prone.
      
      Therefore, using the underscore prepend API version with wait = 1
      is a bad pattern and a sign that the caller can do safer and more
      simple.
      
      There was a single user of that which has just been converted.
      So lets remove this option to discourage further users.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      fce8ad15
    • J
      block: Stop abusing rq->csd.list in blk-softirq · 6d113398
      Jan Kara 提交于
      Abusing rq->csd.list for a list of requests to complete is rather ugly.
      We use rq->queuelist instead which is much cleaner. It is safe because
      queuelist is used by the block layer only for requests waiting to be
      submitted to a device. Thus it is unused when irq reports the request IO
      is finished.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      6d113398
    • J
      block: Stop abusing csd.list for fifo_time · 8b4922d3
      Jan Kara 提交于
      Block layer currently abuses rq->csd.list.next for storing fifo_time.
      That is a terrible hack and completely unnecessary as well. Union
      achieves the same space saving in a cleaner way.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8b4922d3
  16. 22 2月, 2014 3 次提交
  17. 19 2月, 2014 2 次提交