- 12 Apr 2011, 2 commits

Committed by Jens Axboe
It's done at the top to avoid doing it for every queue we unplug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
It was removed with the on-stack plugging; re-add it and track the depth of requests added when flushing the plug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

- 11 Apr 2011, 1 commit

Committed by NeilBrown
If the request_fn ends up blocking, we could be re-entering the plug flush. Since the list is protected by explicitly not allowing schedule events, this isn't a terribly good idea. Additionally, it can cause us to recurse. As request_fn called by __blk_run_queue is allowed to 'schedule()' (after dropping the queue lock of course), it is possible to get a recursive call:

schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
  -> __blk_run_queue -> request_fn -> schedule

We must make sure that the second schedule does not call into blk_flush_plug again. So instead of leaving the list of requests on blk_plug->list, move them to a separate list leaving blk_plug->list empty.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
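
To illustrate the pattern described above, here is a minimal, standalone C sketch (the struct and function names are invented, not the kernel's): the pending list is detached onto a local list before processing, so a re-entrant call sees an empty plug list and returns without recursing.

```c
#include <stdio.h>
#include <stddef.h>

struct request {
    struct request *next;
    int sector;
};

struct plug {
    struct request *list;   /* pending requests */
};

/* Illustrative stand-in for flushing the plug list. Detaching the list into
 * a local head first means that if dispatching ends up back here (recursion
 * via a schedule), the nested call finds plug->list empty and simply returns. */
static void flush_plug(struct plug *plug)
{
    struct request *local = plug->list;   /* take the whole list ...      */
    plug->list = NULL;                    /* ... leaving the plug empty   */

    while (local) {
        struct request *rq = local;
        local = local->next;
        printf("dispatching request at sector %d\n", rq->sector);
        /* dispatching rq could block and re-enter flush_plug(); that call
         * now sees an empty plug->list and is a no-op. */
    }
}

int main(void)
{
    struct request r2 = { NULL, 128 };
    struct request r1 = { &r2, 64 };
    struct plug plug = { &r1 };

    flush_plug(&plug);
    return 0;
}
```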

- 06 Apr 2011, 6 commits

Committed by Konstantin Khlebnikov
The comparison function for list_sort() must be anticommutative, otherwise it is not sorting in the ordinary sense. Fortunately, list_sort() only ever checks ((*cmp)(priv, a, b) <= 0) and does not distinguish negative from zero, so the comparison function only needs to implement less-or-equal instead of a full three-way comparison.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
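
A small standalone C illustration of the return-value contract described above (it does not call the kernel's list_sort(); it only demonstrates why a less-or-equal style comparator is enough when the caller only ever tests cmp(...) <= 0):

```c
#include <stdio.h>

struct req { unsigned long long sector; };

/* Full three-way comparison: negative, zero, or positive. */
static int cmp_three_way(const struct req *a, const struct req *b)
{
    if (a->sector < b->sector)
        return -1;
    if (a->sector > b->sector)
        return 1;
    return 0;
}

/* Not a full three-way comparison, but it satisfies the only property the
 * caller relies on: the result is <= 0 exactly when 'a' may be placed
 * before 'b'. */
static int cmp_le(const struct req *a, const struct req *b)
{
    return a->sector > b->sector;   /* 0 means "a may go before b" */
}

int main(void)
{
    struct req a = { 100 }, b = { 200 };

    printf("three-way: %d, less-or-equal: %d\n",
           cmp_three_way(&a, &b), cmp_le(&a, &b));
    printf("list_sort-style check (cmp <= 0): %d\n", cmp_le(&a, &b) <= 0);
    return 0;
}
```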

Committed by Mike Snitzer
The current block integrity (DIF/DIX) support in DM is verifying that all devices' integrity profiles match during DM device resume (which is past the point of no return). To some degree that is unavoidable (stacked DM devices force this late checking). But for most DM devices (which aren't stacking on other DM devices) the ideal time to verify all integrity profiles match is during table load.

Introduce the notion of an "initialized" integrity profile: a profile that was blk_integrity_register()'d with a non-NULL 'blk_integrity' template. Add blk_integrity_is_initialized() to allow checking if a profile was initialized.

Update DM integrity support to:
- check that all devices with _initialized_ integrity profiles match during table load; uninitialized profiles (e.g. for underlying DM device(s) of a stacked DM device) are ignored
- disallow a table load that would result in an integrity profile that conflicts with a DM device's existing (in-use) integrity profile
- avoid clearing an existing integrity profile
- validate that all integrity profiles match during resume; but if they don't, all we can do is report the mismatch (during resume we're past the point of no return)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Andreas Schwab
xchg() does not work portably with types smaller than 32 bits.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
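
A hedged illustration of the fix pattern, using C11 atomics in userspace rather than the kernel's xchg() (the struct and field names below are made up, not taken from the patch): the flag that gets exchanged is widened to a full word instead of a sub-32-bit type.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: the point is that a flag used as an exchange target
 * should be a full machine word, because xchg() on 8/16-bit operands is not
 * portable across all architectures (m68k being the reported case). */
struct limits {
    /* was conceptually: bool changed; -- problematic as an xchg() target */
    atomic_uint changed;   /* word-sized flag, safe to exchange */
};

static bool test_and_clear_changed(struct limits *l)
{
    /* Equivalent in spirit to: if (xchg(&l->changed, 0)) ... */
    return atomic_exchange(&l->changed, 0) != 0;
}

int main(void)
{
    struct limits l;

    atomic_init(&l.changed, 1);
    printf("first check:  %d\n", test_and_clear_changed(&l));  /* 1 */
    printf("second check: %d\n", test_and_clear_changed(&l));  /* 0 */
    return 0;
}
```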

Committed by Jens Axboe
It's not a preempt-type request; in fact we have to insert it behind requests that do specify INSERT_FRONT.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
Merge it with __elv_add_request(); it's pretty pointless to have a function with only two callers. The main interface is elv_add_request()/__elv_add_request().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
Currently we just dump a non-informative 'request botched' message. Let's actually try to print something sane to help debug issues around this.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

- 31 Mar 2011, 1 commit

Committed by Lucas De Marchi
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

- 26 Mar 2011, 2 commits

Committed by Jens Axboe
When the queue work handler was converted to delayed work, the stopping was inadvertently made synchronous as well. Change this back to an async stop, using __cancel_delayed_work() instead of cancel_delayed_work().

Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
With the introduction of the on-stack plugging, we would assume that any request being inserted was a normal file system request. As flush/fua requires a special insert mode, this caused problems. Fix this up by checking for this in flush_plug_list() and using the appropriate insert mechanism. Big thanks goes to Markus Tripplesdorf for tirelessly testing patches, and to Sergey Senozhatsky for helping find the real issue.

Reported-by: Markus Tripplesdorf <markus@trippelsdorf.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

- 23 Mar 2011, 5 commits

Committed by Li, Shaohua
Remove the think time checking. A queue with a high think time might mean that the queue dispatches several requests and then goes away; limiting such a queue seems meaningless. This also simplifies the code. This was suggested by Vivek.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Justin TerAvest
For v2, I added back lines to cfq_preempt_queue() that were removed during updates for accounting unaccounted_time. Thanks for pointing out that I'd missed these, Vivek. The previous commit "cfq-iosched: Don't set active queue in preempt" wrongly cleared stats for preempting queues, because when we choose a queue to preempt, it still isn't necessarily scheduled next. Thanks to Vivek Goyal for figuring this out and understanding how the preemption code works.

Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Vivek Goyal
Lina reported that if throttle limits are initially very high and then dropped, no new bio might be dispatched for a long time. The reason is that after dropping the limits we don't reset the existing slice: we do the rate calculation with the new low rate while still accounting the bios that were dispatched at the high rate. To fix it, reset the slice upon rate change.
https://lkml.org/lkml/2011/3/10/298

Another problem with a very high limit is that we never queued the bio on the throtl service tree. That means we kept on extending the group slice but never trimmed it. Fix that as well by regularly trimming the slice even if the bio is not being queued up.

Reported-by: Lina Lu <lulina_nuaa@foxmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
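
A toy calculation (not kernel code; all names and numbers are invented) showing why an un-reset slice starves dispatch after a limit drop: the bytes already dispatched under the old, high limit are charged against the new, low rate.

```c
#include <stdio.h>

/* Toy model of a throttling slice: 'dispatched' bytes were sent while the
 * limit was very high; 'rate' is the new, much lower bytes-per-second limit.
 * Without resetting the slice, the wait time for the next bio is computed
 * against everything dispatched so far. */
static double wait_secs(double dispatched, double elapsed, double rate,
                        double next_bio)
{
    double allowed = rate * elapsed;            /* budget for this slice */
    double over    = dispatched + next_bio - allowed;

    return over > 0 ? over / rate : 0.0;
}

int main(void)
{
    double dispatched = 4e9;   /* 4 GB sent while the limit was huge   */
    double elapsed    = 2.0;   /* seconds elapsed in the current slice */
    double new_rate   = 1e6;   /* new limit: 1 MB/s                    */
    double bio        = 4096;

    printf("keeping the old slice: wait %.0f seconds\n",
           wait_secs(dispatched, elapsed, new_rate, bio));
    printf("after resetting slice: wait %.4f seconds\n",
           wait_secs(0, 0, new_rate, bio));
    return 0;
}
```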

Committed by Justin TerAvest
This change moves unaccounted_time to only be reported when CONFIG_DEBUG_BLK_CGROUP is true.

Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Justin TerAvest
Commit "Add unaccounted time to timeslice_used" changed the behavior of cfq_preempt_queue to set cfqq active. Vivek pointed out that other preemption rules might get involved, so we shouldn't manually set which queue is active. This cleans up the code to just clear the queue stats at preemption time. Signed-off-by: NJustin TerAvest <teravest@google.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

- 21 Mar 2011, 1 commit

Committed by Jens Axboe
One of the disadvantages of on-stack plugging is that we potentially lose out on merging, since all pending IO isn't always visible to everybody. When we flush the on-stack plugs, right now we don't do any checks to see if potential merge candidates could be utilized. Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE. It works just like ELEVATOR_INSERT_SORT, but first checks whether we can merge with an existing request; the sorted insertion is only done if merging fails. This fixes a regression with multiple processes issuing IO that can be merged. Thanks to Shaohua Li <shaohua.li@intel.com> for testing and fixing an accounting bug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
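
A userspace sketch of the merge-before-insert idea (array-based, with invented names, unlike the real elevator code): a flushed request is first back-merged with a contiguous queued request, and only falls back to a plain sorted insert when merging fails.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_REQS 16

struct request {
    unsigned long sector;
    unsigned long nr_sectors;
};

struct queue {
    struct request reqs[MAX_REQS];
    int            count;
};

/* Back-merge a request that starts exactly where an existing one ends. */
static bool try_merge(struct queue *q, const struct request *rq)
{
    for (int i = 0; i < q->count; i++) {
        if (q->reqs[i].sector + q->reqs[i].nr_sectors == rq->sector) {
            q->reqs[i].nr_sectors += rq->nr_sectors;
            return true;
        }
    }
    return false;
}

/* "Sort-merge" insert: attempt the merge first, sorted insert otherwise. */
static void insert_sort_merge(struct queue *q, struct request rq)
{
    if (try_merge(q, &rq))
        return;

    int i = q->count++;
    while (i > 0 && q->reqs[i - 1].sector > rq.sector) {
        q->reqs[i] = q->reqs[i - 1];
        i--;
    }
    q->reqs[i] = rq;
}

int main(void)
{
    struct queue q = { .count = 0 };

    insert_sort_merge(&q, (struct request){ 100, 8 });
    insert_sort_merge(&q, (struct request){ 108, 8 });  /* merges into 100+16 */
    insert_sort_merge(&q, (struct request){ 50,  8 });  /* sorted insert      */

    for (int i = 0; i < q.count; i++)
        printf("req: sector %lu, %lu sectors\n",
               q.reqs[i].sector, q.reqs[i].nr_sectors);
    return 0;
}
```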

- 17 Mar 2011, 1 commit

Committed by Justin TerAvest
Version 3 is updated to apply to for-2.6.39/core. For version 2, I took Vivek's advice and made sure we update the group weight from cfq_group_service_tree_add().

If a weight was updated while a group is on the service tree, the calculation for the total weight of the service tree can be adjusted improperly, which either leads to bad service tree weights, or potentially crashes (if total_weight becomes 0). This patch defers updates to the weight until a group is off the service tree.

Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
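
A standalone sketch of the deferral idea with invented names: a weight change made while the group is on the service tree is only recorded, and is applied when the group is next added, so the tree's total weight is never adjusted behind its back.

```c
#include <stdbool.h>
#include <stdio.h>

struct group {
    unsigned int weight;        /* weight the service tree accounted for */
    unsigned int new_weight;    /* pending value from the user           */
    bool         needs_update;
    bool         on_tree;
};

struct service_tree {
    unsigned int total_weight;
};

/* User-visible weight update: only record it; it is applied later, in
 * tree_add(), once the group is off the tree. */
static void set_weight(struct group *g, unsigned int weight)
{
    g->new_weight = weight;
    g->needs_update = true;
}

static void tree_add(struct service_tree *st, struct group *g)
{
    if (g->needs_update) {          /* safe: not yet counted in total */
        g->weight = g->new_weight;
        g->needs_update = false;
    }
    st->total_weight += g->weight;
    g->on_tree = true;
}

static void tree_del(struct service_tree *st, struct group *g)
{
    st->total_weight -= g->weight;  /* subtracts the same value we added */
    g->on_tree = false;
}

int main(void)
{
    struct service_tree st = { 0 };
    struct group g = { .weight = 500 };

    tree_add(&st, &g);
    set_weight(&g, 100);            /* while on the tree: only recorded */
    tree_del(&st, &g);              /* total goes back to exactly 0     */
    tree_add(&st, &g);              /* new weight takes effect here     */

    printf("total weight now %u (group weight %u)\n",
           st.total_weight, g.weight);
    return 0;
}
```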

- 12 Mar 2011, 2 commits

Committed by Justin TerAvest
There are two kinds of time that tasks are not charged for: the first seek and the extra time slice used over the allocated timeslice. Both of these are exported as a new unaccounted_time stat. I think it would be good to have this reported in 'time' as well, but that is probably a separate discussion.

Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Tao Ma
The barrier support has already been removed, so remove the obsolete comments in blkdev_issue_zeroout.

Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

- 11 Mar 2011, 1 commit

Committed by Lukas Czerner
BZ29402
https://bugzilla.kernel.org/show_bug.cgi?id=29402

We can hit serious mis-synchronization in the bio completion path of blkdev_issue_zeroout(), leading to a panic.

The problem is that when we are going to wait_for_completion() in blkdev_issue_zeroout(), we check if bb.done equals issued (the number of submitted bios). If it does, we can skip the wait_for_completion() and just get out of the function, since there is nothing to wait for. However, there is an ordering problem, because bio_batch_end_io() calls atomic_inc(&bb->done) before complete(), hence it might seem to blkdev_issue_zeroout() that all bios have been completed and it exits. At this point, when bio_batch_end_io() is going to call complete(bb->wait), bb and wait no longer exist since they were allocated on the stack in blkdev_issue_zeroout() ==> panic!

(thread 1)                      (thread 2)
bio_batch_end_io()              blkdev_issue_zeroout()
 if (bb) {                      ...
  if (bb->end_io)               ...
   bb->end_io(bio, err);        ...
  atomic_inc(&bb->done);        ...
 ...                            while (issued != atomic_read(&bb.done))
 ...                            (let issued == bb.done)
 ...                            (do the rest of the function)
 ...                            return ret;
 complete(bb->wait);
 ^^^^^^^^ panic

We can fix this easily by simplifying bio_batch and the completion counting. Also remove bio_end_io_t *end_io since it is not used.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Eric Whitney <eric.whitney@hp.com>
Tested-by: Eric Whitney <eric.whitney@hp.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
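
A userspace analogue (pthreads and C11 atomics, invented names) of one way such completion counting can be simplified: the waiter holds its own reference on the counter, so the final wake-up is issued by whoever drops the last reference and cannot race with the waiter's early-exit check.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

struct batch {
    atomic_int done;
    sem_t      wait;
};

static void *worker(void *arg)
{
    struct batch *bb = arg;

    usleep(1000);                     /* pretend to do I/O */
    if (atomic_fetch_sub(&bb->done, 1) == 1)
        sem_post(&bb->wait);          /* we dropped the last reference */
    return NULL;
}

int main(void)
{
    struct batch bb;
    pthread_t threads[4];
    int issued = 4;

    atomic_init(&bb.done, 1);         /* the waiter's own reference */
    sem_init(&bb.wait, 0, 0);

    for (int i = 0; i < issued; i++) {
        atomic_fetch_add(&bb.done, 1);
        pthread_create(&threads[i], NULL, worker, &bb);
    }

    /* Drop our reference; wait only if some work is still outstanding. */
    if (atomic_fetch_sub(&bb.done, 1) != 1)
        sem_wait(&bb.wait);

    for (int i = 0; i < issued; i++)
        pthread_join(threads[i], NULL);

    printf("all batched work completed\n");
    return 0;
}
```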

- 10 Mar 2011, 6 commits

Committed by Vivek Goyal
Use a plug in throttle dispatch as well, since we are dispatching a bunch of bios in throttle context and some of them might merge.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So let's kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
This patch adds support for creating a queuing context outside of the queue itself. This enables us to batch up pieces of IO before grabbing the block device queue lock and submitting them to the IO scheduler.

The context is created on the stack of the process and assigned in the task structure, so that we can auto-unplug it if we hit a schedule event.

The current queue plugging happens implicitly if IO is submitted to an empty device, yet callers have to remember to unplug that IO when they are going to wait for it. This is an ugly API and has caused bugs in the past. Additionally, it requires hacks in the vm (->sync_page() callback) to handle that logic. By switching to an explicit plugging scheme we make the API a lot nicer and can get rid of the ->sync_page() hack in the vm.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
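
A minimal sketch of how a submitter uses the API introduced here, with the function names as they appear in mainline; the actual I/O submission in the middle is left as a comment because its exact form depends on the kernel version, and this fragment only compiles inside a kernel tree.

```c
#include <linux/blkdev.h>

/* Batch a burst of I/O submissions behind an on-stack plug. Requests are
 * held in the per-task plug list and handed to the IO scheduler when the
 * plug is finished (or automatically if the task schedules while plugged). */
static void submit_batch(void)
{
	struct blk_plug plug;

	blk_start_plug(&plug);

	/* ... build and submit a series of bios here; instead of hitting the
	 * queue lock one by one, they are queued up in the plug ... */

	blk_finish_plug(&plug);
}
```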

Committed by Jens Axboe
Currently we use plugging for that, but as plugging is going away, we need an alternative mechanism.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Tejun Heo
Currently, disk_unblock_events() implicitly kicks an event check if the block count reaches zero. This behavior is not described in the comment and hinders future changes. Make the unblocker explicitly check events by calling disk_check_events() as necessary. This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>

- 09 Mar 2011, 1 commit

Committed by Justin TerAvest
We've found that we still get good, useful isolation at weights this low. I'd like to adjust the minimum so that any other changes can take these values into account.

Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

- 08 Mar 2011, 2 commits

Committed by Vivek Goyal
When throttle group limits are updated through cgroups, a thread is woken up to process these updates. While reviewing that code, Oleg noted that a couple of race conditions existed in the code, and he also suggested that the code could be simplified. This patch fixes the races and simplifies the code based on Oleg's suggestions:
- Use xchg().
- Introduce a common function throtl_update_blkio_group_common() which is now shared by all iops/bps update functions.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

Fixed a merge issue: throtl_schedule_delayed_work() now takes throtl_data as the argument, not the queue.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Vivek Goyal
With the help of the cgroup interface one can go and update the bps/iops limits of an existing group. Once the limits are updated, a thread is woken up to see if some blocked group needs recalculation based on the new limits and needs to be requeued. There was also a piece of code where I was checking for a group limit update when a fresh bio comes in. This patch gets rid of that piece of code and keeps the processing of limit changes in one place, throtl_process_limit_change(). It just keeps the code simple and easy to understand.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

- 07 Mar 2011, 3 commits

Committed by Gui Jianfeng
The update_vdisktime logic has been broken since commit b54ce60e: st->min_vdisktime never makes progress. Fix it. Thanks to Vivek for pointing it out.

Signed-off-by: Gui Jianfeng <guijianfen@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Shaohua Li
If there are a sync and an async queue and the sync queue's think time is small, we can ignore the sync queue's dispatch quantum. Because the sync queue will always preempt the async queue, we don't need to care about the async queue's latency. This fixes a performance regression in the aiostress test introduced by commit f8ae6e3e. The issue would exist even without that commit, but the commit amplifies the impact. The initial post did the same optimization for the RT queue too, but since I have no real workload for it, Vivek suggested dropping it.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Jens Axboe
We need to hold the queue lock over the reference increment; it's not atomic anymore.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
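
A small standalone illustration of the rule enforced here (userspace types, invented names): when the reference count is a plain integer rather than an atomic, the increment and decrement must happen while holding the lock that protects it.

```c
#include <pthread.h>
#include <stdio.h>

struct queue {
    pthread_mutex_t lock;
    int             refcount;   /* plain int: only touched under 'lock' */
};

static void queue_get(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->refcount++;              /* not atomic, so the lock must be held */
    pthread_mutex_unlock(&q->lock);
}

static void queue_put(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    if (--q->refcount == 0)
        printf("last reference dropped\n");
    pthread_mutex_unlock(&q->lock);
}

int main(void)
{
    struct queue q = { PTHREAD_MUTEX_INITIALIZER, 1 };

    queue_get(&q);
    queue_put(&q);
    queue_put(&q);
    return 0;
}
```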

- 03 Mar 2011, 3 commits

Committed by Vivek Goyal
Move blk_throtl_exit() into blk_cleanup_queue(), as blk_throtl_exit() is written in such a way that it needs the queue lock. In blk_release_queue() there is no guarantee that ->queue_lock is still around.

Initially blk_throtl_exit() was in blk_cleanup_queue(), but Ingo reported one problem:
https://lkml.org/lkml/2010/10/23/86

and a quick fix moved blk_throtl_exit() to blk_release_queue().

    commit 7ad58c02
    Author: Jens Axboe <jaxboe@fusionio.com>
    Date:   Sat Oct 23 20:40:26 2010 +0200

        block: fix use-after-free bug in blk throttle code

This patch reverts that change and does not try to shut down the throtl work in blk_sync_queue(). By avoiding the call to throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid the problem reported by Ingo.

blk_sync_queue() seems to be used only by the md driver, and it seems to be using it to make sure q->unplug_fn is not called, as md registers its own unplug functions and is about to free up the data structures used by unplug_fn(). Block throttle does not call back into unplug_fn() or into md, so there is no need to cancel blk throttle work. In fact I think cancelling block throttle work is bad, because it might happen that some bios are throttled and scheduled to be dispatched later with the help of pending work, and if the work is cancelled, these bios might never be dispatched.

The block layer also uses blk_sync_queue() during blk_cleanup_queue() and blk_release_queue() time. That should be safe, as we are also calling blk_throtl_exit(), which should make sure all the throttling related data structures are cleaned up.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Vivek Goyal
There does not seem to be a clear convention on whether q->queue_lock is initialized or not when blk_cleanup_queue() is called. In the past it was not necessary, but now blk_throtl_exit() takes the queue lock by default and needs the queue lock to be available. In fact the elevator_exit() code has a similar requirement, just that it is less stringent in the sense that elevator_exit() is called only if the elevator is initialized.

Two problems have been noticed because of the ambiguity about the spin lock status:

- If a driver calls blk_alloc_queue() and then soon calls blk_cleanup_queue() almost immediately (because some other driver structure allocation failed or some other error happened), then blk_throtl_exit() will run into issues, as the queue lock is not initialized. The loop driver ran into this issue recently, and I noticed error paths in the md driver too. Similar error paths should exist in other drivers as well.

- If some driver provided an external spin lock and zapped the lock before blk_cleanup_queue(), then it can lead to issues.

So this patch initializes the default queue lock at queue allocation time. The block throttling code is one of the users of the queue lock and it is initialized at queue allocation time, so it makes sense to initialize ->queue_lock to the internal lock as well. A driver can override that lock later. This will take care of the issue: a driver does not have to worry about initializing the queue lock to the default before calling blk_cleanup_queue().

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
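
A userspace sketch of the allocation-time default described above (all names invented): the queue points at its own internal lock from the moment it is allocated, a driver may later repoint it at an external lock, and teardown can therefore always take the lock safely, even on early error paths.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct queue {
    pthread_mutex_t *lock;          /* lock actually used by queue code  */
    pthread_mutex_t  internal_lock; /* default, valid from allocation on */
};

static struct queue *queue_alloc(void)
{
    struct queue *q = calloc(1, sizeof(*q));

    if (!q)
        return NULL;
    pthread_mutex_init(&q->internal_lock, NULL);
    q->lock = &q->internal_lock;    /* never NULL, even before init      */
    return q;
}

/* Optional: a driver supplies its own lock later. */
static void queue_init(struct queue *q, pthread_mutex_t *driver_lock)
{
    if (driver_lock)
        q->lock = driver_lock;
}

static void queue_cleanup(struct queue *q)
{
    if (!q)
        return;
    pthread_mutex_lock(q->lock);    /* safe even right after queue_alloc() */
    /* ... tear down state that requires the queue lock ...                */
    pthread_mutex_unlock(q->lock);
    free(q);
}

int main(void)
{
    /* Error path: cleanup without ever installing a driver lock. */
    struct queue *q = queue_alloc();
    queue_cleanup(q);

    /* Normal path: the driver overrides the lock before using the queue. */
    static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER;
    q = queue_alloc();
    queue_init(q, &driver_lock);
    queue_cleanup(q);

    printf("both paths cleaned up safely\n");
    return 0;
}
```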

Committed by Liu Yuan
Replace the numerals in diskstats_show() with the corresponding macros.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

- 02 Mar 2011, 3 commits

Committed by Tejun Heo
blk-flush decomposes a flush into a sequence of multiple requests. On completion of a request, the next one is queued; however, the block layer must not implicitly call into q->request_fn() directly from the completion path. This makes the queue behave unexpectedly when seen from the drivers and violates the assumption that q->request_fn() is called with process context + queue_lock.

This patch makes the following two changes to blk-flush to make sure q->request_fn() is not called directly from the request completion path:

- blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always use kblockd instead of calling directly into q->request_fn().

- queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the request queue directly.

Reported by Jan in the following threads:
http://thread.gmane.org/gmane.linux.ide/48778
http://thread.gmane.org/gmane.linux.ide/48786

stable: applicable to v2.6.37.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Committed by Tejun Heo
__blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. The blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd.

All the current users are converted to specify %false for the parameter, and this patch doesn't introduce any behavior change.

stable: This is a prerequisite for fixing the ide oops caused by the new blk-flush implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
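
A userspace analogue of the new parameter (all names invented): the runner either invokes the handler inline or marks it for deferred execution, and callers that must not re-enter the handler from their current context pass force_async = true.

```c
#include <stdbool.h>
#include <stdio.h>

struct queue {
    void (*handler)(struct queue *q);
    bool  deferred;                 /* stands in for "queued to a worker" */
};

/* Analogue of a run-queue helper with a force flag: run the handler
 * directly unless the caller insists on deferring to worker context. */
static void run_queue(struct queue *q, bool force_async)
{
    if (force_async) {
        q->deferred = true;         /* a real implementation would queue work */
        return;
    }
    q->handler(q);
}

static void request_handler(struct queue *q)
{
    (void)q;
    printf("handler ran inline\n");
}

int main(void)
{
    struct queue q = { request_handler, false };

    run_queue(&q, false);           /* ordinary callers: run directly   */
    run_queue(&q, true);            /* completion path: defer instead   */
    if (q.deferred)
        printf("handler deferred to worker context\n");
    return 0;
}
```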

Committed by Justin TerAvest
Effectively, make group_isolation=1 the default and remove the tunable. The group_isolation=0 setting existed because, by default, we idle on the sync-noidle tree, and on fast devices this can be very harmful for throughput. However, this problem can also be addressed by tuning slice_idle and possibly group_idle on faster storage devices. This change simplifies the CFQ code by removing the feature entirely.

Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>