1. 21 12月, 2017 1 次提交
    • S
      block-throttle: avoid double charge · 111be883
      Shaohua Li 提交于
      If a bio is throttled and split after throttling, the bio could be
      resubmited and enters the throttling again. This will cause part of the
      bio to be charged multiple times. If the cgroup has an IO limit, the
      double charge will significantly harm the performance. The bio split
      becomes quite common after arbitrary bio size change.
      
      To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
      If the bio is cloned/split, we copy the flag to new bio too to avoid a
      double charge. However, cloned bio could be directed to a new disk,
      keeping the flag be a problem. The observation is we always set new disk
      for the bio in this case, so we can clear the flag in bio_set_dev().
      
      This issue exists for a long time, arbitrary bio size change just makes
      it worse, so this should go into stable at least since v4.2.
      
      V1-> V2: Not add extra field in bio based on discussion with Tejun
      
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      111be883
  2. 19 12月, 2017 2 次提交
    • J
      block: fix blk_rq_append_bio · 0abc2a10
      Jens Axboe 提交于
      Commit caa4b024(blk-map: call blk_queue_bounce from blk_rq_append_bio)
      moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider
      the fact that the bounced bio becomes invisible to caller since the
      parameter type is 'struct bio *'. Make it a pointer to a pointer to
      a bio, so the caller sees the right bio also after a bounce.
      
      Fixes: caa4b024 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
      Cc: Christoph Hellwig <hch@lst.de>
      Reported-by: NMichele Ballabio <barra_cuda@katamail.com>
      (handling failure of blk_rq_append_bio(), only call bio_get() after
      blk_rq_append_bio() returns OK)
      Tested-by: NMichele Ballabio <barra_cuda@katamail.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0abc2a10
    • M
      block: don't let passthrough IO go into .make_request_fn() · 14cb0dc6
      Ming Lei 提交于
      Commit a8821f3f("block: Improvements to bounce-buffer handling") tries
      to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES,
      but ignores that passthrough I/O can use blk_queue_bounce() too.
      Especially, passthrough IO may not be sector-aligned, and the check
      of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may
      become true even though the max bvec number doesn't exceed BIO_MAX_PAGES,
      then cause the bio splitted, and the original passthrough bio is submited
      to generic_make_request().
      
      This patch fixes this issue by checking if the bio is passthrough IO,
      and use bio_kmalloc() to allocate the cloned passthrough bio.
      
      Cc: NeilBrown <neilb@suse.com>
      Fixes: a8821f3f("block: Improvements to bounce-buffer handling")
      Tested-by: NMichele Ballabio <barra_cuda@katamail.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      14cb0dc6
  3. 07 12月, 2017 1 次提交
    • O
      kyber: fix another domain token wait queue hang · fcf38cdf
      Omar Sandoval 提交于
      Commit 8cf46660 ("kyber: fix hang on domain token wait queue") fixed
      a hang caused by leaving wait entries on the domain token wait queue
      after the __sbitmap_queue_get() retry succeeded, making that wait entry
      a "dud" which won't in turn wake more entries up. However, we can also
      get a dud entry if kyber_get_domain_token() fails once but is then
      called again and succeeds. This can happen if the hardware queue is
      rerun for some other reason, or, more likely, kyber_dispatch_request()
      tries the same domain twice.
      
      The fix is to remove our entry from the wait queue whenever we
      successfully get a token. The only complication is that we might be on
      one of many wait queues in the struct sbitmap_queue, but that's easily
      fixed by remembering which wait queue we were put on.
      
      While we're here, only initialize the wait queue entry once instead of
      on every wait, and use spin_lock_irq() instead of spin_lock_irqsave(),
      since this is always called from process context with irqs enabled.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fcf38cdf
  4. 24 11月, 2017 4 次提交
  5. 23 11月, 2017 1 次提交
  6. 22 11月, 2017 2 次提交
    • K
      treewide: setup_timer() -> timer_setup() · e99e88a9
      Kees Cook 提交于
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      e99e88a9
    • K
      block/laptop_mode: Convert timers to use timer_setup() · bca237a5
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      bca237a5
  7. 20 11月, 2017 2 次提交
  8. 17 11月, 2017 1 次提交
  9. 16 11月, 2017 2 次提交
  10. 15 11月, 2017 3 次提交
    • L
      block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP · a33801e8
      Luca Miccio 提交于
      BFQ currently creates, and updates, its own instance of the whole
      set of blkio statistics that cfq creates. Yet, from the comments
      of Tejun Heo in [1], it turned out that most of these statistics
      are meant/useful only for debugging. This commit makes BFQ create
      the latter, debugging statistics only if the option
      CONFIG_DEBUG_BLK_CGROUP is set.
      
      By doing so, this commit also enables BFQ to enjoy a high perfomance
      boost. The reason is that, if CONFIG_DEBUG_BLK_CGROUP is not set, then
      BFQ has to update far fewer statistics, and, in particular, not the
      heaviest to update.  To give an idea of the benefits, if
      CONFIG_DEBUG_BLK_CGROUP is not set, then, on an Intel i7-4850HQ, and
      with 8 threads doing random I/O in parallel on null_blk (configured
      with 0 latency), the throughput of BFQ grows from 310 to 400 KIOPS
      (+30%). We have measured similar or even much higher boosts with other
      CPUs: e.g., +45% with an ARM CortexTM-A53 Octa-core. Our results have
      been obtained and can be reproduced very easily with the script in [1].
      
      [1] https://www.spinics.net/lists/linux-block/msg18943.htmlSuggested-by: NTejun Heo <tj@kernel.org>
      Suggested-by: NUlf Hansson <ulf.hansson@linaro.org>
      Tested-by: NLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: NLuca Miccio <lucmiccio@gmail.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a33801e8
    • P
      block, bfq: update blkio stats outside the scheduler lock · 24bfd19b
      Paolo Valente 提交于
      bfq invokes various blkg_*stats_* functions to update the statistics
      contained in the special files blkio.bfq.* in the blkio controller
      groups, i.e., the I/O accounting related to the proportional-share
      policy provided by bfq. The execution of these functions takes a
      considerable percentage, about 40%, of the total per-request execution
      time of bfq (i.e., of the sum of the execution time of all the bfq
      functions that have to be executed to process an I/O request from its
      creation to its destruction).  This reduces the request-processing
      rate sustainable by bfq noticeably, even on a multicore CPU. In fact,
      the bfq functions that invoke blkg_*stats_* functions cannot be
      executed in parallel with the rest of the code of bfq, because both
      are executed under the same same per-device scheduler lock.
      
      To reduce this slowdown, this commit moves, wherever possible, the
      invocation of these functions (more precisely, of the bfq functions
      that invoke blkg_*stats_* functions) outside the critical sections
      protected by the scheduler lock.
      
      With this change, and with all blkio.bfq.* statistics enabled, the
      throughput grows, e.g., from 250 to 310 KIOPS (+25%) on an Intel
      i7-4850HQ, in case of 8 threads doing random I/O in parallel on
      null_blk, with the latter configured with 0 latency. We obtained the
      same or higher throughput boosts, up to +30%, with other processors
      (some figures are reported in the documentation). For our tests, we
      used the script [1], with which our results can be easily reproduced.
      
      NOTE. This commit still protects the invocation of blkg_*stats_*
      functions with the request_queue lock, because the group these
      functions are invoked on may otherwise disappear before or while these
      functions are executed.  Fortunately, tests without even this lock
      show, by difference, that the serialization caused by this lock has a
      little impact (at most ~5% of throughput reduction).
      
      [1] https://github.com/Algodev-github/IOSpeedTested-by: NLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NLuca Miccio <lucmiccio@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      24bfd19b
    • L
      block, bfq: add missing invocations of bfqg_stats_update_io_add/remove · 614822f8
      Luca Miccio 提交于
      bfqg_stats_update_io_add and bfqg_stats_update_io_remove are to be
      invoked, respectively, when an I/O request enters and when an I/O
      request exits the scheduler. Unfortunately, bfq does not fully comply
      with this scheme, because it does not invoke these functions for
      requests that are inserted into or extracted from its priority
      dispatch list. This commit fixes this mistake.
      Tested-by: NLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NLuca Miccio <lucmiccio@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      614822f8
  11. 11 11月, 2017 17 次提交
  12. 05 11月, 2017 4 次提交
    • M
      blk-mq: don't allocate driver tag upfront for flush rq · 923218f6
      Ming Lei 提交于
      The idea behind it is simple:
      
      1) for none scheduler, driver tag has to be borrowed for flush rq,
         otherwise we may run out of tag, and that causes an IO hang. And
         get/put driver tag is actually noop for none, so reordering tags
         isn't necessary at all.
      
      2) for a real I/O scheduler, we need not allocate a driver tag upfront
         for flush rq. It works just fine to follow the same approach as
         normal requests: allocate driver tag for each rq just before calling
         ->queue_rq().
      
      One driver visible change is that the driver tag isn't shared in the
      flush request sequence. That won't be a problem, since we always do that
      in legacy path.
      
      Then flush rq need not be treated specially wrt. get/put driver tag.
      This cleans up the code - for instance, reorder_tags_to_front() can be
      removed, and we needn't worry about request ordering in dispatch list
      for avoiding I/O deadlock.
      
      Also we have to put the driver tag before requeueing.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      923218f6
    • M
      blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h · 244c65a3
      Ming Lei 提交于
      We need this helper to put the driver tag for flush rq, since we will
      not share tag in the flush request sequence in the following patch
      in case that I/O scheduler is applied.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      244c65a3
    • M
      blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ · a6a252e6
      Ming Lei 提交于
      In case of IO scheduler we always pre-allocate one driver tag before
      calling blk_insert_flush(), and flush request will be marked as
      RQF_FLUSH_SEQ once it is in flush machinery.
      
      So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle
      the request, otherwise the flush request is dispatched to ->dispatch
      list directly.
      
      This is a preparation patch for not preallocating a driver tag for flush
      requests, and for not treating flush requests as a special case. This is
      similar to what the legacy path does.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a6a252e6
    • M
      blk-flush: use blk_mq_request_bypass_insert() · 598906f8
      Ming Lei 提交于
      In the following patch, we will use RQF_FLUSH_SEQ to decide:
      
      1) if the flag isn't set, the flush rq need to be inserted via
      blk_insert_flush()
      
      2) otherwise, the flush rq need to be dispatched directly since
      it is in flush machinery now.
      
      So we use blk_mq_request_bypass_insert() for requests of bypassing
      flush machinery, just like the legacy path did.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      598906f8