- 21 Dec 2017, 1 commit
-
-
Authored by Shaohua Li
If a bio is throttled and split after throttling, the bio could be resubmitted and enter throttling again. This will cause part of the bio to be charged multiple times. If the cgroup has an IO limit, the double charge will significantly harm performance. The bio split has become quite common after the arbitrary bio size change.

To fix this, we always set the BIO_THROTTLED flag if a bio is throttled. If the bio is cloned/split, we copy the flag to the new bio too to avoid a double charge. However, a cloned bio could be directed to a new disk, which makes keeping the flag a problem. The observation is that we always set a new disk for the bio in this case, so we can clear the flag in bio_set_dev().

This issue has existed for a long time; the arbitrary bio size change just makes it worse, so this should go into stable at least since v4.2.

V1 -> V2: do not add an extra field to the bio, based on discussion with Tejun

Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
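A minimal C sketch of the scheme described above (not the verbatim upstream diff; the bio flag helpers and the bi_disk/bi_partno fields are assumed from the 4.15-era block layer):

/* Carry the throttling state over on clone/split so the fragment is not
 * charged against blk-throttle a second time when it is resubmitted. */
static inline void sketch_clone_throttle_flag(struct bio *dst, struct bio *src)
{
	if (bio_flagged(src, BIO_THROTTLED))
		bio_set_flag(dst, BIO_THROTTLED);
}

/* bio_set_dev() is where a bio gets pointed at a (possibly different)
 * disk, so clearing the flag here lets the new queue throttle it again. */
#define sketch_bio_set_dev(bio, bdev)			\
do {							\
	bio_clear_flag((bio), BIO_THROTTLED);		\
	(bio)->bi_disk = (bdev)->bd_disk;		\
	(bio)->bi_partno = (bdev)->bd_partno;		\
} while (0)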
-
- 19 Dec 2017, 2 commits
-
-
Authored by Jens Axboe
Commit caa4b024 ("blk-map: call blk_queue_bounce from blk_rq_append_bio") moves blk_queue_bounce() into blk_rq_append_bio(), but doesn't consider the fact that the bounced bio becomes invisible to the caller since the parameter type is 'struct bio *'. Make it a pointer to a pointer to a bio, so the caller also sees the right bio after a bounce.

Fixes: caa4b024 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Michele Ballabio <barra_cuda@katamail.com>
(handling failure of blk_rq_append_bio(), only call bio_get() after blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
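A hedged sketch of the interface change: with a struct bio ** parameter, blk_rq_append_bio() can hand any bounce clone back to the caller, so later bio_get()/bio_put() calls operate on the bio that is actually attached to the request. Error handling is reduced to the essentials and helper names follow the 4.15-era block layer:

int sketch_blk_rq_append_bio(struct request *rq, struct bio **bio)
{
	/* may replace *bio with a bounce clone; the caller must use the
	 * updated pointer from here on */
	blk_queue_bounce(rq->q, bio);

	if (!rq->bio) {
		blk_rq_bio_prep(rq->q, rq, *bio);
	} else {
		if (!ll_back_merge_fn(rq->q, rq, *bio))
			return -EINVAL;
		rq->biotail->bi_next = *bio;
		rq->biotail = *bio;
		rq->__data_len += (*bio)->bi_iter.bi_size;
	}
	return 0;
}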
-
Authored by Ming Lei
Commit a8821f3f ("block: Improvements to bounce-buffer handling") tries to make sure that the bio passed to .make_request_fn won't exceed BIO_MAX_PAGES, but it ignores the fact that passthrough I/O can use blk_queue_bounce() too. In particular, passthrough I/O may not be sector-aligned, and the check of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may become true even though the max bvec number doesn't exceed BIO_MAX_PAGES; the bio then gets split and the original passthrough bio is submitted to generic_make_request().

This patch fixes the issue by checking whether the bio is passthrough I/O and, if so, using bio_kmalloc() to allocate the cloned passthrough bio.

Cc: NeilBrown <neilb@suse.com>
Fixes: a8821f3f ("block: Improvements to bounce-buffer handling")
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 07 Dec 2017, 1 commit
-
-
Authored by Omar Sandoval
Commit 8cf46660 ("kyber: fix hang on domain token wait queue") fixed a hang caused by leaving wait entries on the domain token wait queue after the __sbitmap_queue_get() retry succeeded, making that wait entry a "dud" which won't in turn wake more entries up. However, we can also get a dud entry if kyber_get_domain_token() fails once but is then called again and succeeds. This can happen if the hardware queue is rerun for some other reason, or, more likely, kyber_dispatch_request() tries the same domain twice.

The fix is to remove our entry from the wait queue whenever we successfully get a token. The only complication is that we might be on one of many wait queues in the struct sbitmap_queue, but that's easily fixed by remembering which wait queue we were put on.

While we're here, only initialize the wait queue entry once instead of on every wait, and use spin_lock_irq() instead of spin_lock_irqsave(), since this is always called from process context with irqs enabled.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
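A rough sketch of the shape of the fix, assuming illustrative field names (domain_wait, wait_index, domain_ws) on the kyber per-hctx data; the point is remembering the sbitmap_queue wait queue and always dropping off it once a token is obtained:

static int sketch_get_domain_token(struct sbitmap_queue *tokens,
				   struct kyber_hctx_data *khd,
				   unsigned int sched_domain)
{
	wait_queue_entry_t *wait = &khd->domain_wait[sched_domain];
	struct sbq_wait_state *ws;
	int nr;

	nr = __sbitmap_queue_get(tokens);
	if (nr < 0) {
		/* park on one of the sbitmap_queue wait queues, remember
		 * exactly which one, then retry to close the race */
		ws = sbq_wait_ptr(tokens, &khd->wait_index[sched_domain]);
		khd->domain_ws[sched_domain] = ws;
		add_wait_queue(&ws->wait, wait);
		nr = __sbitmap_queue_get(tokens);
	}

	/* got a token: make sure we are no longer queued, otherwise the
	 * leftover entry would be a "dud" that never wakes anyone else */
	if (nr >= 0 && !list_empty_careful(&wait->entry)) {
		ws = khd->domain_ws[sched_domain];
		spin_lock_irq(&ws->wait.lock);
		list_del_init(&wait->entry);
		spin_unlock_irq(&ws->wait.lock);
	}
	return nr;
}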
-
- 24 Nov 2017, 4 commits
-
-
Authored by weiping zhang
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by weiping zhang
wbt_done() calls wbt_clear_stat() regardless of whether the current stat was tracked or not, so move the call to a common place.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by weiping zhang
wbt_init() does not leave q->rq_wb set to NULL when it returns 0, so checking the return value is enough; remove the NULL check.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by weiping zhang
rwb->wc and rwb->queue_depth are overwritten by wbt_set_write_cache() and wbt_set_queue_depth(), so remove the default settings.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 Nov 2017, 1 commit
-
-
Authored by Mikulas Patocka
Remove a useless assignment to the variable "split", because the variable is unconditionally assigned later.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Nov 2017, 2 commits
-
-
Authored by Kees Cook
This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... 
when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. 
@change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.

@match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); )

@change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg }

Signed-off-by: Kees Cook <keescook@chromium.org>
-
Authored by Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
-
- 20 Nov 2017, 2 commits
-
-
Authored by Randy Dunlap
Fix a typo in an error message.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by weiping zhang
device_add_disk() needs to do more thorough error handling; for now this patch just adds a WARN_ON.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>

Adapted for the current series by me.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Nov 2017, 1 commit
-
-
Authored by Michael Lyle
Commit 74d46992 introduced a new field, bi_partno, instead of using bdev->bd_contains and encoding the partition information in the bi_bdev field. __bio_clone_fast() was changed to copy the disk information, but not the partition information. At a minimum, this regressed bcache and caused data corruption.

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Fixes: 74d46992 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reported-by: Pavel Goran <via-bcache@pvgoran.name>
Reported-by: Campbell Steven <casteven@gmail.com>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: Jens Axboe <axboe@kernel.dk>
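A minimal sketch of the missing copy, using the 4.14-era struct bio fields (not the complete __bio_clone_fast() body):

void sketch_bio_clone_fast(struct bio *dst, struct bio *src)
{
	dst->bi_disk = src->bi_disk;		/* disk was already copied */
	dst->bi_partno = src->bi_partno;	/* the missing partition index */
	dst->bi_opf = src->bi_opf;
	dst->bi_iter = src->bi_iter;
	dst->bi_io_vec = src->bi_io_vec;
}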
-
- 16 Nov 2017, 2 commits
-
-
Authored by Ming Lei
Once blk_set_queue_dying() is done in blk_cleanup_queue(), we call blk_freeze_queue() and wait for q->q_usage_counter to become zero. But if there are tasks blocked in get_request(), q->q_usage_counter can never become zero, so we have to wake up all of these tasks in blk_set_queue_dying() first.

Fixes: 3ef28e83 ("block: generic request_queue reference counting")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
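A sketch of what the wake-up could look like for the legacy request lists (helper and field names follow the pre-multiqueue request_list code; treat this as illustrative rather than the exact diff):

static void sketch_wake_request_waiters(struct request_queue *q)
{
	struct request_list *rl;

	/* wake every task sleeping in get_request() so it can notice the
	 * dying queue and drop its q_usage_counter reference */
	blk_queue_for_each_rl(rl, q) {
		if (rl->rq_pool) {
			wake_up_all(&rl->wait[BLK_RW_SYNC]);
			wake_up_all(&rl->wait[BLK_RW_ASYNC]);
		}
	}
}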
-
Authored by Johannes Thumshirn
Now that we have a NUMA-aware version of kmalloc_array(), we can use it instead of a kmalloc_node() call whose size calculation lacks an overflow check.

Link: http://lkml.kernel.org/r/20170927082038.3782-3-jthumshirn@suse.de
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
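A generic sketch of the substitution (the actual call site in the block layer is not reproduced here):

/* before: manual multiplication, no overflow check
 *	return kmalloc_node(n * size, GFP_KERNEL, node);
 * after: kmalloc_array_node() performs the overflow check itself
 */
static void *sketch_alloc_array_on_node(size_t n, size_t size, int node)
{
	return kmalloc_array_node(n, size, GFP_KERNEL, node);
}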
-
- 15 Nov 2017, 3 commits
-
-
Authored by Luca Miccio
BFQ currently creates, and updates, its own instance of the whole set of blkio statistics that cfq creates. Yet, from the comments of Tejun Heo in [1], it turned out that most of these statistics are meant/useful only for debugging. This commit makes BFQ create the latter, debugging statistics only if the option CONFIG_DEBUG_BLK_CGROUP is set.

By doing so, this commit also enables BFQ to enjoy a considerable performance boost. The reason is that, if CONFIG_DEBUG_BLK_CGROUP is not set, then BFQ has to update far fewer statistics, and, in particular, not the heaviest ones to update. To give an idea of the benefits: if CONFIG_DEBUG_BLK_CGROUP is not set, then, on an Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on null_blk (configured with 0 latency), the throughput of BFQ grows from 310 to 400 KIOPS (+30%). We have measured similar or even much higher boosts with other CPUs: e.g., +45% with an ARM Cortex-A53 octa-core. Our results have been obtained, and can be reproduced very easily, with the script in [1].

[1] https://www.spinics.net/lists/linux-block/msg18943.html

Suggested-by: Tejun Heo <tj@kernel.org>
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
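A sketch of the compile-time gating pattern described here; the symbols inside the #ifdef are illustrative rather than the exact bfq ones:

static void sketch_bfqg_stats_update_io_add(struct bfq_group *bfqg,
					    unsigned int op)
{
#ifdef CONFIG_DEBUG_BLK_CGROUP
	/* heavyweight, debugging-only accounting (per-op queued/merged
	 * counters, wait/service times, ...) compiled in only on demand */
	blkg_rwstat_add(&bfqg->stats.queued, op, 1);
#endif
	/* the cheap statistics that are always useful stay unconditional */
}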
-
Authored by Paolo Valente
bfq invokes various blkg_*stats_* functions to update the statistics contained in the special files blkio.bfq.* in the blkio controller groups, i.e., the I/O accounting related to the proportional-share policy provided by bfq. The execution of these functions takes a considerable percentage, about 40%, of the total per-request execution time of bfq (i.e., of the sum of the execution time of all the bfq functions that have to be executed to process an I/O request from its creation to its destruction). This noticeably reduces the request-processing rate sustainable by bfq, even on a multicore CPU. In fact, the bfq functions that invoke blkg_*stats_* functions cannot be executed in parallel with the rest of the code of bfq, because both are executed under the same per-device scheduler lock.

To reduce this slowdown, this commit moves, wherever possible, the invocation of these functions (more precisely, of the bfq functions that invoke blkg_*stats_* functions) outside the critical sections protected by the scheduler lock.

With this change, and with all blkio.bfq.* statistics enabled, the throughput grows, e.g., from 250 to 310 KIOPS (+25%) on an Intel i7-4850HQ, in the case of 8 threads doing random I/O in parallel on null_blk, with the latter configured with 0 latency. We obtained the same or higher throughput boosts, up to +30%, with other processors (some figures are reported in the documentation). For our tests, we used the script [1], with which our results can be easily reproduced.

NOTE. This commit still protects the invocation of blkg_*stats_* functions with the request_queue lock, because the group these functions are invoked on may otherwise disappear before or while these functions are executed. Fortunately, tests without even this lock show, by difference, that the serialization caused by this lock has only a small impact (at most ~5% of throughput reduction).

[1] https://github.com/Algodev-github/IOSpeed

Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
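A hypothetical sketch of the restructuring: do the scheduling work under the per-device scheduler lock, then perform the statistics update after dropping it (the request_queue lock that keeps the group alive is not shown); sketch_do_insert() is a placeholder, not a real bfq function:

static void sketch_bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
{
	struct bfq_queue *bfqq;

	spin_lock_irq(&bfqd->lock);
	bfqq = sketch_do_insert(bfqd, rq);	/* scheduling work only */
	spin_unlock_irq(&bfqd->lock);

	/* stats update moved out of the scheduler-lock critical section */
	if (bfqq)
		bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, rq->cmd_flags);
}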
-
Authored by Luca Miccio
bfqg_stats_update_io_add() and bfqg_stats_update_io_remove() are to be invoked, respectively, when an I/O request enters and when an I/O request exits the scheduler. Unfortunately, bfq does not fully comply with this scheme, because it does not invoke these functions for requests that are inserted into or extracted from its priority dispatch list. This commit fixes that mistake.

Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 Nov 2017, 17 commits
-
-
Authored by Jens Axboe
Fix various typos and/or spelling errors in comments. Fixes a few >80-character lines as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Jens Axboe
If we run out of driver tags, we currently treat shared and non-shared tags the same - both cases hook into the tag waitqueue. This is a bit more costly than it needs to be for unshared tags, since we have to grab both the hctx lock and the waitqueue lock (and disable interrupts). For the non-shared case, we can simply mark the queue as needing a restart.

Split blk_mq_dispatch_wait_add() to account for both cases, and rename it to blk_mq_mark_tag_wait() to better reflect what it does now.

Without this patch, shared and non-shared performance is about the same with 4 fio threads hammering on a single null_blk device (~410K IOPS at 75% sys). With the patch, the shared case is the same, but the non-shared tags case runs at 431K IOPS at 71% sys.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
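A minimal sketch of the split, assuming the 4.15-era hctx flag and state bit names; sketch_add_dispatch_wait() stands in for the existing shared-tags waitqueue path and is hypothetical:

static bool sketch_mark_tag_wait(struct blk_mq_hw_ctx *hctx, struct request *rq)
{
	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED)) {
		/* cheap path: no hctx->lock or waitqueue lock needed, just
		 * mark the queue so a tag release restarts it */
		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
		return false;
	}

	/* shared tags: hook hctx->dispatch_wait into the tag waitqueue so
	 * a free on another queue wakes this hctx (original slow path) */
	return sketch_add_dispatch_wait(hctx, rq);
}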
-
Authored by Jens Axboe
Currently we are inconsistent about when we decide to run the queue. Using blk_mq_run_hw_queues() we check if the hctx has pending IO before running it, but we don't do that from the individual queue-run function, blk_mq_run_hw_queue(). This potentially results in a lot of extra and pointless queue runs on flush requests and (much worse) in tag starvation situations. This is observable just by looking at top output, with lots of kworkers active; for the !async runs, it just adds to the CPU overhead of blk-mq.

Move the has-pending check into the run function instead of having callers do it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
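A sketch of the idea (internal helper names approximate the blk-mq code of that era):

void sketch_blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
{
	/* the check every caller used to open-code now lives here, so an
	 * idle hctx is never kicked needlessly */
	if (!blk_mq_hctx_has_pending(hctx))
		return;

	__blk_mq_delay_run_hw_queue(hctx, async, 0);
}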
-
Authored by Colin Ian King
It is possible for the pointer disk to be NULL, in which case we get a null pointer dereference when accessing disk->flags. Add a null pointer check to avoid the dereference.

Detected by CoverityScan, CID#1461133 ("Explicit null dereferenced")

Fixes: 8ddcd653 ("block: introduce GENHD_FL_HIDDEN")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
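An illustrative sketch of the defensive check (the exact call site is not reproduced here):

static bool sketch_disk_is_usable(struct gendisk *disk)
{
	/* disk may legitimately be NULL, so test it before touching flags */
	return disk && !(disk->flags & GENHD_FL_HIDDEN);
}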
-
Authored by Hannes Reinecke
When creating nvme multipath devices we should populate the 'slaves' and 'holders' directories properly to aid userspace topology detection.

Signed-off-by: Hannes Reinecke <hare@suse.com>
[hch: split from a larger patch]
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Bart Van Assche
Several block layer and NVMe core functions accept a combination of BLK_MQ_REQ_* flags through the 'flags' argument, but there is no verification at compile time that the right type of block layer flags is passed. Make it possible for sparse to verify this. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Bart Van Assche
The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.

It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why SCSI devices are quiesced while the hibernation image is being written or restored. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state, execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests from being submitted, by deferring allocation of non-preempt requests for devices in the quiesced state.

This patch has been tested by running the following commands and by verifying that after each resume the fio job was still running:

for ((i=0; i<10; i++)); do
  (
    cd /sys/block/md0/md &&
    while true; do
      [ "$(<sync_action)" = "idle" ] && echo check > sync_action
      sleep 1
    done
  ) &
  pids=($!)
  for d in /sys/class/block/sd*[a-z]; do
    bdev=${d#/sys/class/block/}
    hcil=$(readlink "$d/device")
    hcil=${hcil#../../../}
    echo 4 > "$d/queue/nr_requests"
    echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
    fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
      --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \
      --iodepth_batch=1 --thread --loops=$((2**31)) &
    pids+=($!)
  done
  sleep 1
  echo "$(date) Hibernating ..." >>hibernate-test-log.txt
  systemctl hibernate
  sleep 10
  kill "${pids[@]}"
  echo idle > /sys/block/md0/md/sync_action
  wait
  echo "$(date) Done." >>hibernate-test-log.txt
done

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348)
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Bart Van Assche
This flag will be used in the next patch to let the block layer core know whether or not a SCSI request queue has been quiesced; a quiesced SCSI queue only processes RQF_PREEMPT requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Bart Van Assche
Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to blk_get_request_flags().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Bart Van Assche
A side effect of this patch is that the GFP mask passed to several allocation functions in the legacy block layer is changed from GFP_KERNEL into __GFP_DIRECT_RECLAIM.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Ming Lei
This patch makes it possible to pause request allocation for the legacy block layer by calling blk_mq_freeze_queue() and blk_mq_unfreeze_queue().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[ bvanassche: Combined two patches into one, edited a comment and made sure REQ_NOWAIT is handled properly in blk_old_get_request() ]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Jens Axboe
This patch attempts to make the case of hctx re-running on driver tag failure more robust. Without this patch, it's pretty easy to trigger a stall condition with shared tags. An example is using null_blk like this:

modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 hw_queue_depth=1

which sets up 4 devices, sharing the same tag set with a depth of 1. Running a fio job like this:

[global]
bs=4k
rw=randread
norandommap
direct=1
ioengine=libaio
iodepth=4

[nullb0]
filename=/dev/nullb0
[nullb1]
filename=/dev/nullb1
[nullb2]
filename=/dev/nullb2
[nullb3]
filename=/dev/nullb3

will inevitably end with one or more threads being stuck waiting for a scheduler tag. That IO is then stuck forever, until someone else triggers a run of the queue.

Ensure that we always re-run the hardware queue if the driver tag we were waiting for got freed before we added our leftover request entries back to the dispatch list.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Bart Van Assche
Avoid that removal of a request queue sporadically triggers the following warning:

list_del corruption. next->prev should be ffff8807d649b970, but was 6b6b6b6b6b6b6b6b
WARNING: CPU: 3 PID: 342 at lib/list_debug.c:56 __list_del_entry_valid+0x92/0xa0
Call Trace:
 process_one_work+0x11b/0x660
 worker_thread+0x3d/0x3b0
 kthread+0x129/0x140
 ret_from_fork+0x27/0x40

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Ming Lei
We have to put the driver tag if the dispatch budget can't be obtained, otherwise it might cause an IO deadlock, especially when the number of tags is very small.

Fixes: de148297 ("blk-mq: introduce .get_budget and .put_budget in blk_mq_ops")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Christoph Hellwig
Use the obvious calling convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Christoph Hellwig
This helper doesn't buy us much over calling kmap_atomic() directly. In fact, in the only caller it does a bit of useless work, as the caller already has the bvec at hand, and said caller would even be buggy for a multi-segment bio due to the use of this helper. So just remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Jens Axboe
This reverts commit 358a3a6b. We have cases that aren't covered 100% in the drivers, so for now we have to retain the shared-tag restart loops.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Nov 2017, 4 commits
-
-
Authored by Ming Lei
The idea behind this is simple:

1) For the none scheduler, a driver tag has to be borrowed for the flush rq, otherwise we may run out of tags and that causes an IO hang. And get/put of the driver tag is actually a no-op for none, so reordering tags isn't necessary at all.

2) For a real I/O scheduler, we need not allocate a driver tag up front for the flush rq. It works just fine to follow the same approach as normal requests: allocate the driver tag for each rq just before calling ->queue_rq().

One driver-visible change is that the driver tag isn't shared in the flush request sequence. That won't be a problem, since we always do that in the legacy path. Then the flush rq need not be treated specially with respect to getting/putting the driver tag. This cleans up the code - for instance, reorder_tags_to_front() can be removed, and we needn't worry about request ordering in the dispatch list for avoiding I/O deadlock. Also, we have to put the driver tag before requeueing.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Ming Lei
We need this helper to put the driver tag for the flush rq, since in the following patch we will no longer share the tag in the flush request sequence when an I/O scheduler is applied.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Authored by Ming Lei
When an I/O scheduler is used, we always pre-allocate one driver tag before calling blk_insert_flush(), and the flush request is marked RQF_FLUSH_SEQ once it is in the flush machinery. So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle the request; otherwise the flush request is dispatched to the ->dispatch list directly.

This is a preparation patch for not preallocating a driver tag for flush requests, and for not treating flush requests as a special case. This is similar to what the legacy path does.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
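A sketch of the insertion-time decision this and the following patch describe; flag and helper names follow the 4.15-era blk-mq code, and the run_queue argument is illustrative:

static void sketch_sched_insert(struct request *rq, bool run_queue)
{
	if (op_is_flush(rq->cmd_flags) && !(rq->rq_flags & RQF_FLUSH_SEQ)) {
		/* not yet sequenced: hand it to the flush state machine */
		blk_insert_flush(rq);
	} else {
		/* already part of a flush sequence (or a normal request):
		 * send it straight to the hctx dispatch list */
		blk_mq_request_bypass_insert(rq, run_queue);
	}
}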
-
Authored by Ming Lei
In the following patch, we will use RQF_FLUSH_SEQ to decide:

1) if the flag isn't set, the flush rq needs to be inserted via blk_insert_flush();

2) otherwise, the flush rq needs to be dispatched directly, since it is in the flush machinery now.

So we use blk_mq_request_bypass_insert() for requests that bypass the flush machinery, just like the legacy path did.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-