1. 02 Sep, 2020 5 commits
  2. 29 Jun, 2020 1 commit
  3. 18 Mar, 2020 1 commit
    • alinux: blk: add iohang check function · 80d6ee24
      Xiaoguang Wang committed
      Background:
        We do not have a dependable block layer interface to determine whether
      a block device has I/O requests that have not completed for a long
      time. Currently we have the 'in_flight' interface, which counts the
      number of I/O requests that have been issued to the device driver but
      have not yet completed. It does not include I/O requests that are in the
      queue but not yet issued to the device driver, so it will not count
      I/O requests that are stuck in the block layer.
        Also, if I/O requests are steadily issued to the device driver,
      'in_flight' may always be non-zero, yet it cannot tell you whether
      any single I/O request has been outstanding for too long.
      
      Solution:
        To find I/O requests that have not completed for too long, add three
      new interfaces:
        /sys/block/vdb/queue/hang_threshold
      If an I/O request's running time exceeds this value, count the request
      as hung.
      
        /sys/block/vdb/hang
      Shows the hang counters for read and write I/O requests.
      
        /sys/kernel/debug/block/vdb/rq_hang
      Shows detailed information for every hung I/O request, for example:
        ffff97db96301200 {.op=WRITE, .cmd_flags=SYNC, .rq_flags=STARTED|
      ELVPRIV|IO_STAT|STATS, .state=in_flight, .tag=30, .internal_tag=169,
      .start_time_ns=140634088407, .io_start_time_ns=140634102958,
      .current_time=146497371953, .bio = ffff97db91e8e000,
      .bio_pages = { ffffd096a0602540 }, .bio = ffff97db91e8ec00,
      .bio_pages = { ffffd096a070eec0 }, .bio = ffff97db91e8f600,
      .bio_pages = { ffffd096a0424cc0 }, .bio = ffff97db91e8f300,
      .bio_pages = { ffffd096a0600a80 }}
      
      With the above info, we can easily see the request's latency distribution;
      see the next patch for how bio_pages is used.
      
      Note: /sys/kernel/debug/block/vdb/rq_hang only exists for blk-mq device
      drivers and requires CONFIG_BLK_DEBUG_FS to be enabled.
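
The timestamps in an rq_hang entry can be split into time spent queued and time spent in the driver. The following is a minimal userspace sketch, not the patch's code: `struct rq_times` and the helper names are hypothetical, and it assumes the threshold is expressed in nanoseconds and that running time is measured from `start_time_ns`, neither of which the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three timestamps printed in an rq_hang entry. */
struct rq_times {
	uint64_t start_time_ns;    /* .start_time_ns */
	uint64_t io_start_time_ns; /* .io_start_time_ns */
	uint64_t current_time_ns;  /* .current_time */
};

/* Time between request start and issue to the driver. */
static uint64_t queued_ns(const struct rq_times *t)
{
	assert(t->io_start_time_ns >= t->start_time_ns);
	return t->io_start_time_ns - t->start_time_ns;
}

/* Time since the request was issued to the driver. */
static uint64_t in_driver_ns(const struct rq_times *t)
{
	assert(t->current_time_ns >= t->io_start_time_ns);
	return t->current_time_ns - t->io_start_time_ns;
}

/* Assumption: a request is hung once its total running time exceeds
 * the threshold (both in nanoseconds in this sketch). */
static bool is_hung(const struct rq_times *t, uint64_t threshold_ns)
{
	return queued_ns(t) + in_driver_ns(t) > threshold_ns;
}
```

Applied to the sample entry above, the request spent 14551 ns queued and about 5.86 s in the driver, so nearly all of its latency is on the driver side.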
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  4. 17 Jan, 2020 1 commit
    • alinux: hotfix: Add Cloud Kernel hotfix enhancement · f94e5b1a
      Xunlei Pang committed
      We reserve some fields beforehand in core structures that are prone to
      change, so that we are not hurt when extra fields have to be added for
      a hotfix. This increases the hotfix success rate, and we can even
      hot-add features with this enhancement.
      
      After reserving, cache behavior is normally unaffected because the
      reserved fields (usually at the tail) are not accessed at all.
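
The idea of reserving fields can be illustrated with a small sketch. The structure and field names below (`widget_v1`, `widget_v2`, `reserved1`, `hang_count`) are hypothetical and are not the names used by the patch; the point is only that reusing a reserved slot leaves the size and every existing offset unchanged, so code built against the old layout keeps working.

```c
#include <assert.h>
#include <stddef.h>

/* Original layout: two unused slots reserved at the tail. */
struct widget_v1 {
	int id;
	long flags;
	unsigned long reserved1;
	unsigned long reserved2;
};

/* Hotfixed layout: reserved1 is given a meaning, nothing moves. */
struct widget_v2 {
	int id;
	long flags;
	union {
		unsigned long reserved1;
		unsigned long hang_count; /* hypothetical new field */
	};
	unsigned long reserved2;
};

/* Compile-time check that the hotfix did not change the layout. */
static_assert(sizeof(struct widget_v1) == sizeof(struct widget_v2),
	      "hotfix must not change the structure size");
static_assert(offsetof(struct widget_v1, reserved2) ==
	      offsetof(struct widget_v2, reserved2),
	      "hotfix must not move existing fields");
```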
      
      The following structures are currently involved:
          MM:
          struct zone
          struct pglist_data
          struct mm_struct
          struct vm_area_struct
          struct mem_cgroup
          struct writeback_control
      
          Block:
          struct gendisk
          struct backing_dev_info
          struct bio
          struct queue_limits
          struct request_queue
          struct blkcg
          struct blkcg_policy
          struct blk_mq_hw_ctx
          struct blk_mq_tag_set
          struct blk_mq_queue_data
          struct blk_mq_ops
          struct elevator_mq_ops
          struct inode
          struct dentry
          struct address_space
          struct block_device
          struct hd_struct
          struct bio_set
      
          Network:
          struct sk_buff
          struct sock
          struct net_device_ops
          struct xt_target
          struct dst_entry
          struct dst_ops
          struct fib_rule
      
          Scheduler:
          struct task_struct
          struct cfs_rq
          struct rq
          struct sched_statistics
          struct sched_entity
          struct signal_struct
          struct task_group
          struct cpuacct
      
          cgroup:
          struct cgroup_root
          struct cgroup_subsys_state
          struct cgroup_subsys
          struct css_set
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
      [ caspar: use SPDX-License-Identifier ]
      Signed-off-by: Caspar Zhang <caspar@linux.alibaba.com>
  5. 05 Oct, 2019 1 commit
    • blk-mq: add callback of .cleanup_rq · 4ec3ca27
      Ming Lei committed
      [ Upstream commit 226b4fc75c78f9c497c5182d939101b260cfb9f3 ]
      
      SCSI maintains its own driver private data hooked off of each SCSI
      request, and the private data won't be freed after scsi_queue_rq()
      returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver
      (e.g. dm-rq) may need to retry these SCSI requests, before SCSI has
      fully dispatched them, due to a lower level SCSI driver's resource
      limitation identified in scsi_queue_rq(). Currently SCSI's per-request
      private data is leaked when the upper layer driver (dm-rq) frees and
      then retries these requests in response to BLK_STS_RESOURCE or
      BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().
      
      This use case is so specialized that it doesn't warrant training an
      existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to
      account for freeing its driver private data -- doing so would add an
      extra branch for handling a special case that all other consumers of
      SCSI (and blk-mq) won't ever need to worry about.
      
      So the most pragmatic way forward is to delegate freeing SCSI driver
      private data to the upper layer driver (dm-rq).  Do so by adding
      new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method
      from dm-rq.  A following commit will implement the .cleanup_rq() hook
      in scsi_mq_ops.
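
The delegation described here is an optional-callback pattern. The following is a simplified userspace sketch with mock types, not the kernel's definitions: `struct mq_ops`, `mq_cleanup_rq()` and `scsi_like_cleanup()` are hypothetical stand-ins for `blk_mq_ops`, `blk_mq_cleanup_rq()` and the SCSI hook.

```c
#include <assert.h>
#include <stddef.h>

struct request;

/* Mock operations table: drivers may leave cleanup_rq unset. */
struct mq_ops {
	void (*cleanup_rq)(struct request *rq);
};

struct request {
	const struct mq_ops *ops;
	void *driver_priv;  /* per-request driver private data */
	int cleanup_calls;  /* bookkeeping for this sketch only */
};

/* Called by the upper layer before it frees and retries a request.
 * Drivers without private data pay only for the NULL check. */
static void mq_cleanup_rq(struct request *rq)
{
	assert(rq != NULL && rq->ops != NULL);
	if (rq->ops->cleanup_rq)
		rq->ops->cleanup_rq(rq);
}

/* Stand-in for a driver hook that releases its private data. */
static void scsi_like_cleanup(struct request *rq)
{
	rq->driver_priv = NULL;
	rq->cleanup_calls++;
}
```

Keeping the hook optional means the common free path is untouched; only the upper layer that retries requests calls the wrapper.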
      
      Cc: Ewan D. Milne <emilne@redhat.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: <stable@vger.kernel.org>
      Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  6. 25 Jul, 2018 1 commit
  7. 09 Jul, 2018 2 commits
  8. 14 Jun, 2018 1 commit
  9. 31 May, 2018 1 commit
  10. 25 Apr, 2018 1 commit
  11. 10 Apr, 2018 1 commit
  12. 10 Jan, 2018 2 commits
    • blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu · 05707b64
      Tejun Heo committed
      The RCU protection has been expanded to cover both queueing and
      completion paths making ->queue_rq_srcu a misnomer.  Rename it to
      ->srcu as suggested by Bart.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: replace timeout synchronization with a RCU and generation based scheme · 1d9bd516
      Tejun Heo committed
      Currently, blk-mq timeout path synchronizes against the usual
      issue/completion path using a complex scheme involving atomic
      bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
      rules.  Unfortunately, it contains quite a few holes.
      
      There's a complex dance around REQ_ATOM_STARTED and
      REQ_ATOM_COMPLETE between issue/completion and timeout paths; however,
      they don't have a synchronization point across request recycle
      instances and it isn't clear what the barriers add.
      blk_mq_check_expired() can easily read STARTED from the N-2'th
      iteration, the deadline from the N-1'th, and run
      blk_mark_rq_complete() against the Nth instance.
      
      In fact, it's pretty easy to make blk_mq_check_expired() terminate a
      later instance of a request.  If we induce 5 sec delay before
      time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
      2s, and issue back-to-back large IOs, blk-mq starts timing out
      requests spuriously pretty quickly.  Nothing actually timed out.  It
      just made the call on a recycle instance of a request and then
      terminated a later instance long after the original instance finished.
      The scenario isn't theoretical either.
      
      This patch replaces the broken synchronization mechanism with an RCU
      and generation number based one.
      
      1. Each request has a u64 generation + state value, which can be
         updated only by the request owner.  Whenever a request becomes
         in-flight, the generation number gets bumped up too.  This provides
         the basis for the timeout path to distinguish different recycle
         instances of the request.
      
         Also, marking a request in-flight and setting its deadline are
         protected with a seqcount so that the timeout path can fetch both
         values coherently.
      
      2. The timeout path fetches the generation, state and deadline.  If
         the verdict is timeout, it records the generation into a dedicated
         request abortion field and does RCU wait.
      
      3. The completion path is also protected by RCU (from the previous
         patch) and checks whether the current generation number and state
         match the abortion field.  If so, it skips completion.
      
      4. The timeout path, after RCU wait, scans requests again and
         terminates the ones whose generation and state still match the ones
         requested for abortion.
      
         By now, the timeout path knows that either the generation number
         and state changed if it lost the race or the completion will yield
         to it and can safely timeout the request.
      
      While it's more lines of code, it's conceptually simpler, doesn't
      depend on direct use of subtle memory ordering or coherence, and
      hopefully doesn't terminate the wrong instance.
      
      While this change makes REQ_ATOM_COMPLETE synchronization unnecessary
      between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't
      removed yet as it's still used in other places.  Future patches will
      move all state tracking to the new mechanism and remove all bitops in
      the hot paths.
      
      Note that this patch adds a comment explaining a race condition in
      BLK_EH_RESET_TIMER path.  The race has always been there and this
      patch doesn't change it.  It's just documenting the existing race.
      
      v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao.
          - s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter.
          - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter.
      
      v3: - Fixed possible extended seqcount / u64_stats_sync read looping
            spotted by Peter.
          - MQ_RQ_IDLE was incorrectly being set in complete_request instead
            of free_request.  Fixed.
      
      v4: - Rebased on top of hctx_lock() refactoring patch.
          - Added comment explaining the use of hctx_lock() in completion path.
      
      v5: - Added comments requested by Bart.
          - Note the addition of BLK_EH_RESET_TIMER race condition in the
            commit message.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 11 Nov, 2017 4 commits
  14. 05 Nov, 2017 1 commit
    • blk-mq: don't handle failure in .get_budget · 88022d72
      Ming Lei committed
      It is enough to just check if we can get the budget via .get_budget().
      And we don't need to deal with device state change in .get_budget().
      
      For SCSI, one issue to be fixed is that we have to call
      scsi_mq_uninit_cmd() to free allocated resources if SCSI device fails
      to handle the request. And it isn't enough to simply call
      blk_mq_end_request() to do that if this request is marked as
      RQF_DONTPREP.
      
      Fixes: 0df21c86(scsi: implement .get_budget and .put_budget for blk-mq)
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
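
The two comment types mentioned above are the block comment used in headers and the line comment used in .c files. The following sketch is not the script described in this message; `is_header()` and `spdx_line()` are hypothetical helpers that only show how the tag format depends on the file type.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Treat any path ending in ".h" as a header. */
static int is_header(const char *path)
{
	size_t n = strlen(path);

	return n >= 2 && strcmp(path + n - 2, ".h") == 0;
}

/* Write the SPDX tag line for a file into out, using a block comment
 * for headers and a line comment for other source files. */
static int spdx_line(const char *path, const char *id, char *out, size_t len)
{
	assert(path != NULL && id != NULL && out != NULL);
	if (is_header(path))
		return snprintf(out, len, "/* SPDX-License-Identifier: %s */", id);
	return snprintf(out, len, "// SPDX-License-Identifier: %s", id);
}
```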
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  16. 01 Nov, 2017 2 commits
    • blk-mq-sched: improve dispatching from sw queue · b347689f
      Ming Lei committed
      SCSI devices use a host-wide tagset, and the shared driver tag space is
      often quite big. However, there is also a queue depth for each LUN
      (.cmd_per_lun), which is often small; for example, on both lpfc and
      qla2xxx, .cmd_per_lun is just 3.
      
      So lots of requests may stay in the sw queue, and we always flush all
      requests belonging to the same hw queue and dispatch them all to the driver.
      Unfortunately it is easy to cause queue busy because of the small
      .cmd_per_lun.  Once these requests are flushed out, they have to stay in
      hctx->dispatch, and no bio merge can happen on these requests, and
      sequential IO performance is harmed.
      
      This patch introduces blk_mq_dequeue_from_ctx for dequeuing a request
      from a sw queue, so that we can dispatch them in scheduler's way. We can
      then avoid dequeueing too many requests from sw queue, since we don't
      flush ->dispatch completely.
      
      This patch improves dispatching from sw queue by using the .get_budget
      and .put_budget callbacks.
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: introduce .get_budget and .put_budget in blk_mq_ops · de148297
      Ming Lei committed
      For SCSI devices, there is often a per-request-queue depth, which needs
      to be respected before queuing one request.
      
      Currently blk-mq always dequeues the request first, then calls
      .queue_rq() to dispatch the request to lld. One obvious issue with this
      approach is that I/O merging may not be successful, because when the
      per-request-queue depth can't be respected, .queue_rq() has to return
      BLK_STS_RESOURCE, and then this request has to stay in hctx->dispatch
      list. This means it never gets a chance to be merged with other IO.
      
      This patch introduces the .get_budget and .put_budget callbacks in blk_mq_ops,
      so that we can try to get reserved budget first before dequeuing a request.
      If the budget for queueing I/O can't be satisfied, we don't need to
      dequeue request at all. Hence the request can be left in the IO
      scheduler queue, for more merging opportunities.
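
The budget-before-dequeue ordering can be sketched with a toy queue. This is a userspace illustration under simplifying assumptions, not the blk-mq code: `struct toy_queue` and the function names are hypothetical, and the budget is modelled as a plain counter bounded by the queue depth.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_queue {
	int depth;   /* per-request-queue depth */
	int in_use;  /* budget currently held */
	int pending; /* requests still in the scheduler queue */
};

/* Reserve one unit of budget; fails when the depth is exhausted. */
static bool get_budget(struct toy_queue *q)
{
	if (q->in_use >= q->depth)
		return false;
	q->in_use++;
	return true;
}

/* Return one unit of budget. */
static void put_budget(struct toy_queue *q)
{
	assert(q->in_use > 0);
	q->in_use--;
}

/* Dispatch loop: get budget first, dequeue only when it is available.
 * Returns the number of requests dispatched. */
static int dispatch_pending(struct toy_queue *q)
{
	int dispatched = 0;

	for (;;) {
		if (!get_budget(q))
			break;         /* leave the rest queued, still mergeable */
		if (q->pending == 0) {
			put_budget(q); /* nothing to dequeue: hand the budget back */
			break;
		}
		q->pending--;
		dispatched++;
	}
	return dispatched;
}
```

With a depth of 3 and 5 pending requests, the first pass dispatches 3 and leaves 2 in the scheduler queue, where they can still be merged with later I/O.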
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  17. 19 Oct, 2017 2 commits
  18. 18 Aug, 2017 1 commit
  19. 22 Jun, 2017 1 commit
  20. 21 Jun, 2017 3 commits
  21. 20 Jun, 2017 1 commit
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar committed
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  22. 19 Jun, 2017 5 commits
  23. 09 Jun, 2017 1 commit