1. 28 3月, 2017 18 次提交
    • S
      blk-throttle: add a mechanism to estimate IO latency · b9147dd1
      Shaohua Li 提交于
      User configures latency target, but the latency threshold for each
      request size isn't fixed. For a SSD, the IO latency highly depends on
      request size. To calculate latency threshold, we sample some data, eg,
      average latency for request size 4k, 8k, 16k, 32k .. 1M. The latency
      threshold of each request size will be the sample latency (I'll call it
      base latency) plus latency target. For example, the base latency for
      request size 4k is 80us and user configures latency target 60us. The 4k
      latency threshold will be 80 + 60 = 140us.
      
      To sample data, we calculate the order base 2 of rounded up IO sectors.
      If the IO size is bigger than 1M, it will be accounted as 1M. Since the
      calculation does round up, the base latency will be slightly smaller
      than actual value. Also if there isn't any IO dispatched for a specific
      IO size, we will use the base latency of smaller IO size for this IO
      size.
      
      But we shouldn't sample data at any time. The base latency is supposed
      to be latency where disk isn't congested, because we use latency
      threshold to schedule IOs between cgroups. If disk is congested, the
      latency is higher, using it for scheduling is meaningless. Hence we only
      do the sampling when block throttling is in the LOW limit, with
      assumption disk isn't congested in such state. If the assumption isn't
      true, eg, low limit is too high, calculated latency threshold will be
      higher.
      
      Hard disk is completely different. Latency depends on spindle seek
      instead of request size. Currently this feature is SSD only, we probably
      can use a fixed threshold like 4ms for hard disk though.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b9147dd1
    • S
      block: track request size in blk_issue_stat · 88eeca49
      Shaohua Li 提交于
      Currently there is no way to know the request size when the request is
      finished. Next patch will need this info. We could add extra field to
      record the size, but blk_issue_stat has enough space to record it, so
      this patch just overloads blk_issue_stat. With this, we will have 49bits
      to track time, which still is very long time.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      88eeca49
    • S
      blk-throttle: add interface for per-cgroup target latency · ec80991d
      Shaohua Li 提交于
      Here we introduce per-cgroup latency target. The target determines how a
      cgroup can afford latency increasement. We will use the target latency
      to calculate a threshold and use it to schedule IO for cgroups. If a
      cgroup's bandwidth is below its low limit but its average latency is
      below the threshold, other cgroups can safely dispatch more IO even
      their bandwidth is higher than their low limits. On the other hand, if
      the first cgroup's latency is higher than the threshold, other cgroups
      are throttled to their low limits. So the target latency determines how
      we efficiently utilize free disk resource without sacifice of worload's
      IO latency.
      
      For example, assume 4k IO average latency is 50us when disk isn't
      congested. A cgroup sets the target latency to 30us. Then the cgroup can
      accept 50+30=80us IO latency. If the cgroupt's average IO latency is
      90us and its bandwidth is below low limit, other cgroups are throttled
      to their low limit. If the cgroup's average IO latency is 60us, other
      cgroups are allowed to dispatch more IO. When other cgroups dispatch
      more IO, the first cgroup's IO latency will increase. If it increases to
      81us, we then throttle other cgroups.
      
      User will configure the interface in this way:
      echo "8:16 rbps=2097152 wbps=max latency=100 idle=200" > io.low
      
      latency is in microsecond unit
      
      By default, latency target is 0, which means to guarantee IO latency.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ec80991d
    • S
      blk-throttle: ignore idle cgroup limit · fa6fb5aa
      Shaohua Li 提交于
      Last patch introduces a way to detect idle cgroup. We use it to make
      upgrade/downgrade decision. And the new algorithm can detect completely
      idle cgroup too, so we can delete the corresponding code.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      fa6fb5aa
    • S
      blk-throttle: add interface to configure idle time threshold · ada75b6e
      Shaohua Li 提交于
      Add interface to configure the threshold. The io.low interface will
      like:
      echo "8:16 rbps=2097152 wbps=max idle=2000" > io.low
      
      idle is in microsecond unit.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ada75b6e
    • S
      blk-throttle: add a simple idle detection · 9e234eea
      Shaohua Li 提交于
      A cgroup gets assigned a low limit, but the cgroup could never dispatch
      enough IO to cross the low limit. In such case, the queue state machine
      will remain in LIMIT_LOW state and all other cgroups will be throttled
      according to low limit. This is unfair for other cgroups. We should
      treat the cgroup idle and upgrade the state machine to lower state.
      
      We also have a downgrade logic. If the state machine upgrades because of
      cgroup idle (real idle), the state machine will downgrade soon as the
      cgroup is below its low limit. This isn't what we want. A more
      complicated case is cgroup isn't idle when queue is in LIMIT_LOW. But
      when queue gets upgraded to lower state, other cgroups could dispatch
      more IO and this cgroup can't dispatch enough IO, so the cgroup is below
      its low limit and looks like idle (fake idle). In this case, the queue
      should downgrade soon. The key to determine if we should do downgrade is
      to detect if cgroup is truely idle.
      
      Unfortunately it's very hard to determine if a cgroup is real idle. This
      patch uses the 'think time check' idea from CFQ for the purpose. Please
      note, the idea doesn't work for all workloads. For example, a workload
      with io depth 8 has disk utilization 100%, hence think time is 0, eg,
      not idle. But the workload can run higher bandwidth with io depth 16.
      Compared to io depth 16, the io depth 8 workload is idle. We use the
      idea to roughly determine if a cgroup is idle.
      
      We treat a cgroup idle if its think time is above a threshold (by
      default 1ms for SSD and 100ms for HD). The idea is think time above the
      threshold will start to harm performance. HD is much slower so a longer
      think time is ok.
      
      The patch (and the latter patches) uses 'unsigned long' to track time.
      We convert 'ns' to 'us' with 'ns >> 10'. This is fast but loses
      precision, should not a big deal.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9e234eea
    • S
      blk-throttle: make bandwidth change smooth · 7394e31f
      Shaohua Li 提交于
      When cgroups all reach low limit, cgroups can dispatch more IO. This
      could make some cgroups dispatch more IO but others not, and even some
      cgroups could dispatch less IO than their low limit. For example, cg1
      low limit 10MB/s, cg2 limit 80MB/s, assume disk maximum bandwidth is
      120M/s for the workload. Their bps could something like this:
      
      cg1/cg2 bps: T1: 10/80 -> T2: 60/60 -> T3: 10/80
      
      At T1, all cgroups reach low limit, so they can dispatch more IO later.
      Then cg1 dispatch more IO and cg2 has no room to dispatch enough IO. At
      T2, cg2 only dispatches 60M/s. Since We detect cg2 dispatches less IO
      than its low limit 80M/s, we downgrade the queue from LIMIT_MAX to
      LIMIT_LOW, then all cgroups are throttled to their low limit (T3). cg2
      will have bandwidth below its low limit at most time.
      
      The big problem here is we don't know the maximum bandwidth of the
      workload, so we can't make smart decision to avoid the situation. This
      patch makes cgroup bandwidth change smooth. After disk upgrades from
      LIMIT_LOW to LIMIT_MAX, we don't allow cgroups use all bandwidth upto
      their max limit immediately. Their bandwidth limit will be increased
      gradually to avoid above situation. So above example will became
      something like:
      
      cg1/cg2 bps: 10/80 -> 15/105 -> 20/100 -> 25/95 -> 30/90 -> 35/85 -> 40/80
      -> 45/75 -> 22/98
      
      In this way cgroups bandwidth will be above their limit in majority
      time, this still doesn't fully utilize disk bandwidth, but that's
      something we pay for sharing.
      
      Scale up is linear. The limit scales up 1/2 .low limit every
      throtl_slice after upgrade. The scale up will stop if the adjusted limit
      hits .max limit. Scale down is exponential. We cut the scale value half
      if a cgroup doesn't hit its .low limit. If the scale becomes 0, we then
      fully downgrade the queue to LIMIT_LOW state.
      
      Note this doesn't completely avoid cgroup running under its low limit.
      The best way to guarantee cgroup doesn't run under its limit is to set
      max limit. For example, if we set cg1 max limit to 40, cg2 will never
      run under its low limit.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7394e31f
    • S
      blk-throttle: detect completed idle cgroup · aec24246
      Shaohua Li 提交于
      cgroup could be assigned a limit, but doesn't dispatch enough IO, eg the
      cgroup is idle. When this happens, the cgroup doesn't hit its limit, so
      we can't move the state machine to higher level and all cgroups will be
      throttled to their lower limit, so we waste bandwidth. Detecting idle
      cgroup is hard. This patch handles a simple case, a cgroup doesn't
      dispatch any IO. We ignore such cgroup's limit, so other cgroups can use
      the bandwidth.
      
      Please note this will be replaced with a more sophisticated algorithm
      later, but this demonstrates the idea how we handle idle cgroups, so I
      leave it here.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      aec24246
    • S
      blk-throttle: choose a small throtl_slice for SSD · d61fcfa4
      Shaohua Li 提交于
      The throtl_slice is 100ms by default. This is a long time for SSD, a lot
      of IO can run. To make cgroups have smoother throughput, we choose a
      small value (20ms) for SSD.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      d61fcfa4
    • S
      blk-throttle: make throtl_slice tunable · 297e3d85
      Shaohua Li 提交于
      throtl_slice is important for blk-throttling. It's called slice
      internally but it really is a time window blk-throttling samples data.
      blk-throttling will make decision based on the samplings. An example is
      bandwidth measurement. A cgroup's bandwidth is measured in the time
      interval of throtl_slice.
      
      A small throtl_slice meanse cgroups have smoother throughput but burn
      more CPUs. It has 100ms default value, which is not appropriate for all
      disks. A fast SSD can dispatch a lot of IOs in 100ms. This patch makes
      it tunable.
      
      Since throtl_slice isn't a time slice, the sysfs name
      'throttle_sample_time' reflects its character better.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      297e3d85
    • S
      blk-throttle: make sure expire time isn't too big · 06cceedc
      Shaohua Li 提交于
      cgroup could be throttled to a limit but when all cgroups cross high
      limit, queue enters a higher state and so the group should be throttled
      to a higher limit. It's possible the cgroup is sleeping because of
      throttle and other cgroups don't dispatch IO any more. In this case,
      nobody can trigger current downgrade/upgrade logic. To fix this issue,
      we could either set up a timer to wakeup the cgroup if other cgroups are
      idle or make sure this cgroup doesn't sleep too long. Setting up a timer
      means we must change the timer very frequently. This patch chooses the
      latter. Making cgroup sleep time not too big wouldn't change cgroup
      bps/iops, but could make it wakeup more frequently, which isn't a big
      issue because throtl_slice * 8 is already quite big.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      06cceedc
    • S
      blk-throttle: add downgrade logic · 3f0abd80
      Shaohua Li 提交于
      When queue state machine is in LIMIT_MAX state, but a cgroup is below
      its low limit for some time, the queue should be downgraded to lower
      state as one cgroup's low limit isn't met.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      3f0abd80
    • S
      blk-throttle: add upgrade logic for LIMIT_LOW state · c79892c5
      Shaohua Li 提交于
      When queue is in LIMIT_LOW state and all cgroups with low limit cross
      the bps/iops limitation, we will upgrade queue's state to
      LIMIT_MAX. To determine if a cgroup exceeds its limitation, we check if
      the cgroup has pending request. Since cgroup is throttled according to
      the limit, pending request means the cgroup reaches the limit.
      
      If a cgroup has limit set for both read and write, we consider the
      combination of them for upgrade. The reason is read IO and write IO can
      interfere with each other. If we do the upgrade based in one direction
      IO, the other direction IO could be severly harmed.
      
      For a cgroup hierarchy, there are two cases. Children has lower low
      limit than parent. Parent's low limit is meaningless. If children's
      bps/iops cross low limit, we can upgrade queue state. The other case is
      children has higher low limit than parent. Children's low limit is
      meaningless. As long as parent's bps/iops (which is a sum of childrens
      bps/iops) cross low limit, we can upgrade queue state.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c79892c5
    • S
      blk-throttle: configure bps/iops limit for cgroup in low limit · b22c417c
      Shaohua Li 提交于
      each queue will have a state machine. Initially queue is in LIMIT_LOW
      state, which means all cgroups will be throttled according to their low
      limit. After all cgroups with low limit cross the limit, the queue state
      gets upgraded to LIMIT_MAX state.
      For max limit, cgroup will use the limit configured by user.
      For low limit, cgroup will use the minimal value between low limit and
      max limit configured by user. If the minimal value is 0, which means the
      cgroup doesn't configure low limit, we will use max limit to throttle
      the cgroup and the cgroup is ready to upgrade to LIMIT_MAX
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b22c417c
    • S
      blk-throttle: add .low interface · cd5ab1b0
      Shaohua Li 提交于
      Add low limit for cgroup and corresponding cgroup interface. To be
      consistent with memcg, we allow users configure .low limit higher than
      .max limit. But the internal logic always assumes .low limit is lower
      than .max limit. So we add extra bps/iops_conf fields in throtl_grp for
      userspace configuration. Old bps/iops fields in throtl_grp will be the
      actual limit we use for throttling.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      cd5ab1b0
    • S
      blk-throttle: add configure option for new .low interface · 327ffb9b
      Shaohua Li 提交于
      As discussed in LSF, add configure option for the interface and mark it
      as experimental, so people can try/test.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      327ffb9b
    • S
      blk-throttle: prepare support multiple limits · 9f626e37
      Shaohua Li 提交于
      We are going to support low/max limit, each cgroup will have 2 limits
      after that. This patch prepares for the multiple limits change.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9f626e37
    • S
      blk-throttle: use U64_MAX/UINT_MAX to replace -1 · 2ab5492d
      Shaohua Li 提交于
      clean up the code to avoid using -1
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2ab5492d
  2. 25 3月, 2017 3 次提交
  3. 23 3月, 2017 7 次提交
  4. 22 3月, 2017 6 次提交
    • J
      block: fix stacked driver stats init and free · a83b576c
      Jens Axboe 提交于
      If a driver allocates a queue for stacked usage, then it does
      not currently get stats allocated. This causes the later init
      of, eg, writeback throttling to blow up. Move the init to the
      queue allocation instead.
      
      Additionally, allow a NULL callback unregistration. This avoids
      having the caller check for that, fixing another oops on
      removal of a block device that doesn't have poll stats allocated.
      
      Fixes: 34dbad5d ("blk-stat: convert to callback-based statistics reporting")
      Signed-off-by: NJens Axboe <axboe@fb.com>
      a83b576c
    • O
      blk-stat: convert to callback-based statistics reporting · 34dbad5d
      Omar Sandoval 提交于
      Currently, statistics are gathered in ~0.13s windows, and users grab the
      statistics whenever they need them. This is not ideal for both in-tree
      users:
      
      1. Writeback throttling wants its own dynamically sized window of
         statistics. Since the blk-stats statistics are reset after every
         window and the wbt windows don't line up with the blk-stats windows,
         wbt doesn't see every I/O.
      2. Polling currently grabs the statistics on every I/O. Again, depending
         on how the window lines up, we may miss some I/Os. It's also
         unnecessary overhead to get the statistics on every I/O; the hybrid
         polling heuristic would be just as happy with the statistics from the
         previous full window.
      
      This reworks the blk-stats infrastructure to be callback-based: users
      register a callback that they want called at a given time with all of
      the statistics from the window during which the callback was active.
      Users can dynamically bucketize the statistics. wbt and polling both
      currently use read vs. write, but polling can be extended to further
      subdivide based on request size.
      
      The callbacks are kept on an RCU list, and each callback has percpu
      stats buffers. There will only be a few users, so the overhead on the
      I/O completion side is low. The stats flushing is also simplified
      considerably: since the timer function is responsible for clearing the
      statistics, we don't have to worry about stale statistics.
      
      wbt is a trivial conversion. After the conversion, the windowing problem
      mentioned above is fixed.
      
      For polling, we register an extra callback that caches the previous
      window's statistics in the struct request_queue for the hybrid polling
      heuristic to use.
      
      Since we no longer have a single stats buffer for the request queue,
      this also removes the sysfs and debugfs stats entries. To replace those,
      we add a debugfs entry for the poll statistics.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      34dbad5d
    • O
      blk-stat: move BLK_RQ_STAT_BATCH definition to blk-stat.c · 4875253f
      Omar Sandoval 提交于
      This is an implementation detail that no-one outside of blk-stat.c uses.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4875253f
    • O
      blk-stat: use READ and WRITE instead of BLK_STAT_{READ,WRITE} · fa2e39cb
      Omar Sandoval 提交于
      The stats buckets will become generic soon, so make the existing users
      use the common READ and WRITE definitions instead of one internal to
      blk-stat.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      fa2e39cb
    • O
      block: remove extra calls to wbt_exit() · 0315b159
      Omar Sandoval 提交于
      We always call wbt_exit() from blk_release_queue(), so these are
      unnecessary.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      0315b159
    • O
      blk-stat: fix blk_stat_sum() if all samples are batched · 7d8d0014
      Omar Sandoval 提交于
      We need to flush the batch _before_ we check the number of samples,
      otherwise we'll miss all of the batched samples.
      
      Fixes: cf43e6be ("block: add scalable completion tracking of requests")
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7d8d0014
  5. 15 3月, 2017 1 次提交
  6. 13 3月, 2017 1 次提交
  7. 12 3月, 2017 1 次提交
    • N
      blk: Ensure users for current->bio_list can see the full list. · f5fe1b51
      NeilBrown 提交于
      Commit 79bd9959 ("blk: improve order of bio handling in generic_make_request()")
      changed current->bio_list so that it did not contain *all* of the
      queued bios, but only those submitted by the currently running
      make_request_fn.
      
      There are two places which walk the list and requeue selected bios,
      and others that check if the list is empty.  These are no longer
      correct.
      
      So redefine current->bio_list to point to an array of two lists, which
      contain all queued bios, and adjust various code to test or walk both
      lists.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Fixes: 79bd9959 ("blk: improve order of bio handling in generic_make_request()")
      Signed-off-by: NJens Axboe <axboe@fb.com>
      f5fe1b51
  8. 09 3月, 2017 3 次提交
    • N
      blk: improve order of bio handling in generic_make_request() · 79bd9959
      NeilBrown 提交于
      To avoid recursion on the kernel stack when stacked block devices
      are in use, generic_make_request() will, when called recursively,
      queue new requests for later handling.  They will be handled when the
      make_request_fn for the current bio completes.
      
      If any bios are submitted by a make_request_fn, these will ultimately
      be handled seqeuntially.  If the handling of one of those generates
      further requests, they will be added to the end of the queue.
      
      This strict first-in-first-out behaviour can lead to deadlocks in
      various ways, normally because a request might need to wait for a
      previous request to the same device to complete.  This can happen when
      they share a mempool, and can happen due to interdependencies
      particular to the device.  Both md and dm have examples where this happens.
      
      These deadlocks can be erradicated by more selective ordering of bios.
      Specifically by handling them in depth-first order.  That is: when the
      handling of one bio generates one or more further bios, they are
      handled immediately after the parent, before any siblings of the
      parent.  That way, when generic_make_request() calls make_request_fn
      for some particular device, we can be certain that all previously
      submited requests for that device have been completely handled and are
      not waiting for anything in the queue of requests maintained in
      generic_make_request().
      
      An easy way to achieve this would be to use a last-in-first-out stack
      instead of a queue.  However this will change the order of consecutive
      bios submitted by a make_request_fn, which could have unexpected consequences.
      Instead we take a slightly more complex approach.
      A fresh queue is created for each call to a make_request_fn.  After it completes,
      any bios for a different device are placed on the front of the main queue, followed
      by any bios for the same device, followed by all bios that were already on
      the queue before the make_request_fn was called.
      This provides the depth-first approach without reordering bios on the same level.
      
      This, by itself, it not enough to remove all deadlocks.  It just makes
      it possible for drivers to take the extra step required themselves.
      
      To avoid deadlocks, drivers must never risk waiting for a request
      after submitting one to generic_make_request.  This includes never
      allocing from a mempool twice in the one call to a make_request_fn.
      
      A common pattern in drivers is to call bio_split() in a loop, handling
      the first part and then looping around to possibly split the next part.
      Instead, a driver that finds it needs to split a bio should queue
      (with generic_make_request) the second part, handle the first part,
      and then return.  The new code in generic_make_request will ensure the
      requests to underlying bios are processed first, then the second bio
      that was split off.  If it splits again, the same process happens.  In
      each case one bio will be completely handled before the next one is attempted.
      
      With this is place, it should be possible to disable the
      punt_bios_to_recover() recovery thread for many block devices, and
      eventually it may be possible to remove it completely.
      
      Ref: http://www.spinics.net/lists/raid/msg54680.htmlTested-by: NJinpu Wang <jinpu.wang@profitbricks.com>
      Inspired-by: NLars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      79bd9959
    • J
      Revert "scsi, block: fix duplicate bdi name registration crashes" · c01228db
      Jan Kara 提交于
      This reverts commit 0dba1314. It causes
      leaking of device numbers for SCSI when SCSI registers multiple gendisks
      for one request_queue in succession. It can be easily reproduced using
      Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
      Furthermore the protection provided by this commit is not needed anymore
      as the problem it was fixing got also fixed by commit 165a5e22
      "block: Move bdi_unregister() to del_gendisk()".
      
      [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2Signed-off-by: NJan Kara <jack@suse.cz>
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c01228db
    • J
      block: Make del_gendisk() safer for disks without queues · 90f16fdd
      Jan Kara 提交于
      Commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()"
      added disk->queue dereference to del_gendisk(). Although del_gendisk()
      is not supposed to be called without disk->queue valid and
      blk_unregister_queue() warns in that case, this change will make it oops
      instead. Return to the old more robust behavior of just warning when
      del_gendisk() gets called for gendisk with disk->queue being NULL.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Tested-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      90f16fdd