1. 14 2月, 2018 1 次提交
  2. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  3. 08 2月, 2018 1 次提交
    • P
      block, bfq: add requeue-request hook · a7877390
      Paolo Valente 提交于
      Commit 'a6a252e6 ("blk-mq-sched: decide how to handle flush rq via
      RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
      be re-inserted into the active I/O scheduler for that device. As a
      consequence, I/O schedulers may get the same request inserted again,
      even several times, without a finish_request invoked on that request
      before each re-insertion.
      
      This fact is the cause of the failure reported in [1]. For an I/O
      scheduler, every re-insertion of the same re-prepared request is
      equivalent to the insertion of a new request. For schedulers like
      mq-deadline or kyber, this fact causes no harm. In contrast, it
      confuses a stateful scheduler like BFQ, which keeps state for an I/O
      request, until the finish_request hook is invoked on the request. In
      particular, BFQ may get stuck, waiting forever for the number of
      request dispatches, of the same request, to be balanced by an equal
      number of request completions (while there will be one completion for
      that request). In this state, BFQ may refuse to serve I/O requests
      from other bfq_queues. The hang reported in [1] then follows.
      
      However, the above re-prepared requests undergo a requeue, thus the
      requeue_request hook of the active elevator is invoked for these
      requests, if set. This commit then addresses the above issue by
      properly implementing the hook requeue_request in BFQ.
      
      [1] https://marc.info/?l=linux-block&m=151211117608676Reported-by: NIvan Kozik <ivan@ludios.org>
      Reported-by: NAlban Browaeys <alban.browaeys@gmail.com>
      Tested-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NSerena Ziviani <ziviani.serena@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a7877390
  4. 07 2月, 2018 2 次提交
    • H
      block: Add should_fail_bio() for bpf error injection · 30abb3a6
      Howard McLauchlan 提交于
      The classic error injection mechanism, should_fail_request() does not
      support use cases where more information is required (from the entire
      struct bio, for example).
      
      To that end, this patch introduces should_fail_bio(), which calls
      should_fail_request() under the hood but provides a convenient
      place for kprobes to hook into if they require the entire struct bio.
      This patch also replaces some existing calls to should_fail_request()
      with should_fail_bio() with no degradation in performance.
      Signed-off-by: NHoward McLauchlan <hmclauchlan@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      30abb3a6
    • J
      blk-wbt: account flush requests correctly · 5235553d
      Jens Axboe 提交于
      Mikulas reported a workload that saw bad performance, and figured
      out what it was due to various other types of requests being
      accounted as reads. Flush requests, for instance. Due to the
      high latency of those, we heavily throttle the writes to keep
      the latencies in balance. But they really should be accounted
      as writes.
      
      Fix this by checking the exact type of the request. If it's a
      read, account as a read, if it's a write or a flush, account
      as a write. Any other request we disregard. Previously everything
      would have been mistakenly accounted as reads.
      Reported-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5235553d
  5. 02 2月, 2018 2 次提交
    • K
      bea99a50
    • J
      blk-mq: fix discard merge with scheduler attached · 445251d0
      Jens Axboe 提交于
      I ran into an issue on my laptop that triggered a bug on the
      discard path:
      
      WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
       Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
       CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G     U           4.15.0+ #176
       Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
       RIP: 0010:nvme_setup_cmd+0x3d3/0x430
       RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
       RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
       RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
       RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
       R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
       R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
       FS:  0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        nvme_queue_rq+0x40/0xa00
        ? __sbitmap_queue_get+0x24/0x90
        ? blk_mq_get_tag+0xa3/0x250
        ? wait_woken+0x80/0x80
        ? blk_mq_get_driver_tag+0x97/0xf0
        blk_mq_dispatch_rq_list+0x7b/0x4a0
        ? deadline_remove_request+0x49/0xb0
        blk_mq_do_dispatch_sched+0x4f/0xc0
        blk_mq_sched_dispatch_requests+0x106/0x170
        __blk_mq_run_hw_queue+0x53/0xa0
        __blk_mq_delay_run_hw_queue+0x83/0xa0
        blk_mq_run_hw_queue+0x6c/0xd0
        blk_mq_sched_insert_request+0x96/0x140
        __blk_mq_try_issue_directly+0x3d/0x190
        blk_mq_try_issue_directly+0x30/0x70
        blk_mq_make_request+0x1a4/0x6a0
        generic_make_request+0xfd/0x2f0
        ? submit_bio+0x5c/0x110
        submit_bio+0x5c/0x110
        ? __blkdev_issue_discard+0x152/0x200
        submit_bio_wait+0x43/0x60
        ext4_process_freed_data+0x1cd/0x440
        ? account_page_dirtied+0xe2/0x1a0
        ext4_journal_commit_callback+0x4a/0xc0
        jbd2_journal_commit_transaction+0x17e2/0x19e0
        ? kjournald2+0xb0/0x250
        kjournald2+0xb0/0x250
        ? wait_woken+0x80/0x80
        ? commit_timeout+0x10/0x10
        kthread+0x111/0x130
        ? kthread_create_worker_on_cpu+0x50/0x50
        ? do_group_exit+0x3a/0xa0
        ret_from_fork+0x1f/0x30
       Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff
       ---[ end trace 50d361cc444506c8 ]---
       print_req_error: I/O error, dev nvme0n1, sector 847167488
      
      Decoding the assembly, the request claims to have 0xffff segments,
      while nvme counts two. This turns out to be because we don't check
      for a data carrying request on the mq scheduler path, and since
      blk_phys_contig_segment() returns true for a non-data request,
      we decrement the initial segment count of 0 and end up with
      0xffff in the unsigned short.
      
      There are a few issues here:
      
      1) We should initialize the segment count for a discard to 1.
      2) The discard merging is currently using the data limits for
         segments and sectors.
      
      Fix this up by having attempt_merge() correctly identify the
      request, and by initializing the segment count correctly
      for discards.
      
      This can only be triggered with mq-deadline on discard capable
      devices right now, which isn't a common configuration.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      445251d0
  6. 31 1月, 2018 1 次提交
    • M
      blk-mq: introduce BLK_STS_DEV_RESOURCE · 86ff7c2a
      Ming Lei 提交于
      This status is returned from driver to block layer if device related
      resource is unavailable, but driver can guarantee that IO dispatch
      will be triggered in future when the resource is available.
      
      Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
      returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
      a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
      3 ms because both scsi-mq and nvmefc are using that magic value.
      
      If a driver can make sure there is in-flight IO, it is safe to return
      BLK_STS_DEV_RESOURCE because:
      
      1) If all in-flight IOs complete before examining SCHED_RESTART in
      blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
      is run immediately in this case by blk_mq_dispatch_rq_list();
      
      2) if there is any in-flight IO after/when examining SCHED_RESTART
      in blk_mq_dispatch_rq_list():
      - if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
      - otherwise, this request will be dispatched after any in-flight IO is
        completed via blk_mq_sched_restart()
      
      3) if SCHED_RESTART is set concurently in context because of
      BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
      cases and make sure IO hang can be avoided.
      
      One invariant is that queue will be rerun if SCHED_RESTART is set.
      Suggested-by: NJens Axboe <axboe@kernel.dk>
      Tested-by: NLaurence Oberman <loberman@redhat.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      86ff7c2a
  7. 25 1月, 2018 2 次提交
  8. 24 1月, 2018 1 次提交
  9. 20 1月, 2018 4 次提交
  10. 19 1月, 2018 7 次提交
  11. 18 1月, 2018 9 次提交
    • P
      block, bfq: limit sectors served with interactive weight raising · 8a8747dc
      Paolo Valente 提交于
      To maximise responsiveness, BFQ raises the weight, and performs device
      idling, for bfq_queues associated with processes deemed as
      interactive. In particular, weight raising has a maximum duration,
      equal to the time needed to start a large application. If a
      weight-raised process goes on doing I/O beyond this maximum duration,
      it loses weight-raising.
      
      This mechanism is evidently vulnerable to the following false
      positives: I/O-bound applications that will go on doing I/O for much
      longer than the duration of weight-raising. These applications have
      basically no benefit from being weight-raised at the beginning of
      their I/O. On the opposite end, while being weight-raised, these
      applications
      a) unjustly steal throughput to applications that may truly need
      low latency;
      b) make BFQ uselessly perform device idling; device idling results
      in loss of device throughput with most flash-based storage, and may
      increase latencies when used purposelessly.
      
      This commit adds a countermeasure to reduce both the above
      problems. To introduce this countermeasure, we provide the following
      extra piece of information (full details in the comments added by this
      commit). During the start-up of the large application used as a
      reference to set the duration of weight-raising, involved processes
      transfer at most ~110K sectors each. Accordingly, a process initially
      deemed as interactive has no right to be weight-raised any longer,
      once transferred 110K sectors or more.
      
      Basing on this consideration, this commit early-ends weight-raising
      for a bfq_queue if the latter happens to have received an amount of
      service at least equal to 110K sectors (actually, a little bit more,
      to keep a safety margin). I/O-bound applications that reach a high
      throughput, such as file copy, get to this threshold much before the
      allowed weight-raising period finishes. Thus this early ending of
      weight-raising reduces the amount of time during which these
      applications cause the problems described above.
      Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8a8747dc
    • P
      block, bfq: limit tags for writes and async I/O · a52a69ea
      Paolo Valente 提交于
      Asynchronous I/O can easily starve synchronous I/O (both sync reads
      and sync writes), by consuming all request tags. Similarly, storms of
      synchronous writes, such as those that sync(2) may trigger, can starve
      synchronous reads. In their turn, these two problems may also cause
      BFQ to loose control on latency for interactive and soft real-time
      applications. For example, on a PLEXTOR PX-256M5S SSD, LibreOffice
      Writer takes 0.6 seconds to start if the device is idle, but it takes
      more than 45 seconds (!) if there are sequential writes in the
      background.
      
      This commit addresses this issue by limiting the maximum percentage of
      tags that asynchronous I/O requests and synchronous write requests can
      consume. In particular, this commit grants a higher threshold to
      synchronous writes, to prevent the latter from being starved by
      asynchronous I/O.
      
      According to the above test, LibreOffice Writer now starts in about
      1.2 seconds on average, regardless of the background workload, and
      apart from some rare outlier. To check this improvement, run, e.g.,
      sudo ./comm_startup_lat.sh bfq 5 5 seq 10 "lowriter --terminate_after_init"
      for the comm_startup_lat benchmark in the S suite [1].
      
      [1] https://github.com/Algodev-github/STested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a52a69ea
    • M
      blk-mq: don't dispatch request in blk_mq_request_direct_issue if queue is busy · 23d4ee19
      Ming Lei 提交于
      If we run into blk_mq_request_direct_issue(), when queue is busy, we
      don't want to dispatch this request into hctx->dispatch_list, and
      what we need to do is to return the queue busy info to caller, so
      that caller can deal with it well.
      
      Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
      Reported-by: NLaurence Oberman <loberman@redhat.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      23d4ee19
    • B
      block: Fix __bio_integrity_endio() documentation · de99a346
      Bart Van Assche 提交于
      Fixes: 4246a0b6 ("block: add a bi_error field to struct bio")
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      de99a346
    • M
      blk-mq-sched: remove unused 'can_block' arg from blk_mq_sched_insert_request · 9e97d295
      Mike Snitzer 提交于
      After commit:
      
      923218f6 ("blk-mq: don't allocate driver tag upfront for flush rq")
      
      we no longer use the 'can_block' argument in
      blk_mq_sched_insert_request(). Kill it.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      
      Added actual commit message as to why it's being removed.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9e97d295
    • M
      blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback · 396eaf21
      Ming Lei 提交于
      blk_insert_cloned_request() is called in the fast path of a dm-rq driver
      (e.g. blk-mq request-based DM mpath).  blk_insert_cloned_request() uses
      blk_mq_request_bypass_insert() to directly append the request to the
      blk-mq hctx->dispatch_list of the underlying queue.
      
      1) This way isn't efficient enough because the hctx spinlock is always
      used.
      
      2) With blk_insert_cloned_request(), we completely bypass underlying
      queue's elevator and depend on the upper-level dm-rq driver's elevator
      to schedule IO.  But dm-rq currently can't get the underlying queue's
      dispatch feedback at all.  Without knowing whether a request was issued
      or not (e.g. due to underlying queue being busy) the dm-rq elevator will
      not be able to provide effective IO merging (as a side-effect of dm-rq
      currently blindly destaging a request from its elevator only to requeue
      it after a delay, which kills any opportunity for merging).  This
      obviously causes very bad sequential IO performance.
      
      Fix this by updating blk_insert_cloned_request() to use
      blk_mq_request_direct_issue().  blk_mq_request_direct_issue() allows a
      request to be issued directly to the underlying queue and returns the
      dispatch feedback (blk_status_t).  If blk_mq_request_direct_issue()
      returns BLK_SYS_RESOURCE the dm-rq driver will now use DM_MAPIO_REQUEUE
      to _not_ destage the request.  Whereby preserving the opportunity to
      merge IO.
      
      With this, request-based DM's blk-mq sequential IO performance is vastly
      improved (as much as 3X in mpath/virtio-scsi testing).
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      [blk-mq.c changes heavily influenced by Ming Lei's initial solution, but
      they were refactored to make them less fragile and easier to read/review]
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      396eaf21
    • M
      blk-mq: factor out a few helpers from __blk_mq_try_issue_directly · 0f95549c
      Mike Snitzer 提交于
      No functional change.  Just makes code flow more logically.
      
      In following commit, __blk_mq_try_issue_directly() will be used to
      return the dispatch result (blk_status_t) to DM.  DM needs this
      information to improve IO merging.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0f95549c
    • M
      blk-mq: turn WARN_ON in __blk_mq_run_hw_queue into printk · 7df938fb
      Ming Lei 提交于
      We know this WARN_ON is harmless and in reality it may be trigged,
      so convert it to printk() and dump_stack() to avoid to confusing
      people.
      
      Also add comment about two releated races here.
      
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7df938fb
    • M
      blk-mq: make sure hctx->next_cpu is set correctly · 7bed4595
      Ming Lei 提交于
      When hctx->next_cpu is set from possible online CPUs, there is one
      race in which hctx->next_cpu may be set as >= nr_cpu_ids, and finally
      break workqueue.
      
      The race can be triggered in the following two sitations:
      
      1) when one CPU is becoming DEAD, blk_mq_hctx_notify_dead() is called
      to dispatch requests from the DEAD cpu context, but at that
      time, this DEAD CPU has been cleared from 'cpu_online_mask', so all
      CPUs in hctx->cpumask may become offline, and cause hctx->next_cpu set
      a bad value.
      
      2) blk_mq_delay_run_hw_queue() is called from CPU B, and found the queue
      should be run on the other CPU A, then CPU A may become offline at the
      same time and all CPUs in hctx->cpumask become offline.
      
      This patch deals with this issue by re-selecting next CPU, and making
      sure it is set correctly.
      
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: N"jianchao.wang" <jianchao.w.wang@oracle.com>
      Tested-by: N"jianchao.wang" <jianchao.w.wang@oracle.com>
      Fixes: 20e4d813 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7bed4595
  12. 15 1月, 2018 5 次提交
    • D
      blk_rq_map_user_iov: fix error override · 69e0927b
      Douglas Gilbert 提交于
      During stress tests by syzkaller on the sg driver the block layer
      infrequently returns EINVAL. Closer inspection shows the block
      layer was trying to return ENOMEM (which is much more
      understandable) but for some reason overroad that useful error.
      
      Patch below does not show this (unchanged) line:
         ret =__blk_rq_map_user_iov(rq, map_data, &i, gfp_mask, copy);
      That 'ret' was being overridden when that function failed.
      Signed-off-by: NDouglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      69e0927b
    • M
      block: allow gendisk's request_queue registration to be deferred · fa70d2e2
      Mike Snitzer 提交于
      Since I can remember DM has forced the block layer to allow the
      allocation and initialization of the request_queue to be distinct
      operations.  Reason for this is block/genhd.c:add_disk() has requires
      that the request_queue (and associated bdi) be tied to the gendisk
      before add_disk() is called -- because add_disk() also deals with
      exposing the request_queue via blk_register_queue().
      
      DM's dynamic creation of arbitrary device types (and associated
      request_queue types) requires the DM device's gendisk be available so
      that DM table loads can establish a master/slave relationship with
      subordinate devices that are referenced by loaded DM tables -- using
      bd_link_disk_holder().  But until these DM tables, and their associated
      subordinate devices, are known DM cannot know what type of request_queue
      it needs -- nor what its queue_limits should be.
      
      This chicken and egg scenario has created all manner of problems for DM
      and, at times, the block layer.
      
      Summary of changes:
      
      - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant
        that drivers may use to add a disk without also calling
        blk_register_queue().  Driver must call blk_register_queue() once its
        request_queue is fully initialized.
      
      - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
        is not set.  It won't be set if driver used add_disk_no_queue_reg()
        but driver encounters an error and must del_gendisk() before calling
        blk_register_queue().
      
      - Export blk_register_queue().
      
      These changes allow DM to use add_disk_no_queue_reg() to anchor its
      gendisk as the "master" for master/slave relationships DM must establish
      with subordinate devices referenced in DM tables that get loaded.  Once
      all "slave" devices for a DM device are known its request_queue can be
      properly initialized and then advertised via sysfs -- important
      improvement being that no request_queue resource initialization
      performed by blk_register_queue() is missed for DM devices anymore.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fa70d2e2
    • M
      block: properly protect the 'queue' kobj in blk_unregister_queue · 667257e8
      Mike Snitzer 提交于
      The original commit e9a823fb (block: fix warning when I/O elevator
      is changed as request_queue is being removed) is pretty conflated.
      "conflated" because the resource being protected by q->sysfs_lock isn't
      the queue_flags (it is the 'queue' kobj).
      
      q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
      from racing with blk_unregister_queue():
      1) By holding q->sysfs_lock first, __elevator_change() can complete
      before a racing blk_unregister_queue().
      2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
      in case elv_iosched_store() loses the race with blk_unregister_queue(),
      it needs a way to know the 'queue' kobj isn't there.
      
      Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
      held until after the 'queue' kobj is removed.
      
      To do so blk_mq_unregister_dev() must not also take q->sysfs_lock.  So
      rename __blk_mq_unregister_dev() to blk_mq_unregister_dev().
      
      Also, blk_unregister_queue() should use q->queue_lock to protect against
      any concurrent writes to q->queue_flags -- even though chances are the
      queue is being cleaned up so no concurrent writes are likely.
      
      Fixes: e9a823fb ("block: fix warning when I/O elevator is changed as request_queue is being removed")
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      667257e8
    • M
      block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN · bc8d062c
      Mike Snitzer 提交于
      device_add_disk() will only call bdi_register_owner() if
      !GENHD_FL_HIDDEN, so it follows that del_gendisk() should only call
      bdi_unregister() if !GENHD_FL_HIDDEN.
      
      Found with code inspection.  bdi_unregister() won't do any harm if
      bdi_register_owner() wasn't used but best to avoid the unnecessary
      call to bdi_unregister().
      
      Fixes: 8ddcd653 ("block: introduce GENHD_FL_HIDDEN")
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bc8d062c
    • J
      blk-mq: fix bad clear of RQF_MQ_INFLIGHT in blk_mq_ct_ctx_init() · bf9ae8c5
      Jens Axboe 提交于
      A previous commit moved the clearing of rq->rq_flags later,
      but we may have already set RQF_MQ_INFLIGHT when that happens.
      Ensure that we correctly initialize rq->rq_flags to the
      right value.
      
      This is based on an original fix by Ming, just rewritten to not
      require a conditional.
      
      Fixes: 7c3fb70f ("block: rearrange a few request fields for better cache layout")
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bf9ae8c5
  13. 13 1月, 2018 2 次提交
  14. 12 1月, 2018 1 次提交
  15. 11 1月, 2018 1 次提交