- 07 3月, 2012 11 次提交
-
-
由 Tejun Heo 提交于
Currently block core calls directly into blk-throttle for init, drain and exit. This patch adds blkcg_{init|drain|exit}_queue() which wraps the blk-throttle functions. This is to give more control and visiblity to blkcg core layer for proper layering. Further patches will add logic common to blkcg policies to the functions. While at it, collapse blk_throtl_release() into blk_throtl_exit(). There's no reason to keep them separate. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
Currently, blkg points to the associated blkcg via its css_id. This unnecessarily complicates dereferencing blkcg. Let blkg hold a reference to the associated blkcg and point directly to it and disable css_id on blkio_subsys. This change requires splitting blkiocg_destroy() into blkiocg_pre_destroy() and blkiocg_destroy() so that all blkg's can be destroyed and all the blkcg references held by them dropped during cgroup removal. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
blkg->dev is dev_t recording the device number of the block device for the associated request_queue. It is used to identify the associated block device when printing out configuration or stats. This is redundant to begin with. A blkg is an association between a cgroup and a request_queue and it of course is possible to reach request_queue from blkg and synchronization conventions are in place for safe q dereferencing, so this shouldn't be necessary from the beginning. Furthermore, it's initialized by sscanf()ing the device name of backing_dev_info. The mind boggles. Anyways, if blkg is visible under rcu lock, we *know* that the associated request_queue hasn't gone away yet and its bdi is registered and alive - blkg can't be created for request_queue which hasn't been fully initialized and it can't go away before blkg is removed. Let stat and conf read functions get device name from blkg->q->backing_dev_info.dev and pass it down to printing functions and remove blkg->dev. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
blkcg is very peculiar in that it allows setting and remembering configurations for non-existent devices by maintaining separate data structures for configuration. This behavior is completely out of the usual norms and outright confusing; furthermore, it uses dev_t number to match the configuration to devices, which is unpredictable to begin with and becomes completely unuseable if EXT_DEVT is fully used. It is wholely unnecessary - we already have fully functional userland mechanism to program devices being hotplugged which has full access to device identification, connection topology and filesystem information. Add a new struct blkio_group_conf which contains all blkcg configurations to blkio_group and let blkio_group, which can be created iff the associated device exists and is removed when the associated device goes away, carry all configurations. Note that, after this patch, all newly created blkg's will always have the default configuration (unlimited for throttling and blkcg's weight for propio). This patch makes blkio_policy_node meaningless but doesn't remove it. The next patch will. -v2: Updated to retry after short sleep if blkg lookup/creation failed due to the queue being temporarily bypassed as indicated by -EBUSY return. Pointed out by Vivek. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
Currently both blk-throttle and cfq-iosched implement their own blkio_group creation code in throtl_get_tg() and cfq_get_cfqg(). This patch factors out the common code into blkg_lookup_create(), which returns ERR_PTR value so that transitional failures due to queue bypass can be distinguished from other failures. * New plkio_policy_ops methods blkio_alloc_group_fn() and blkio_link_group_fn added. Both are transitional and will be removed once the blkg management code is fully moved into blk-cgroup.c. * blkio_alloc_group_fn() allocates policy-specific blkg which is usually a larger data structure with blkg as the first entry and intiailizes it. Note that initialization of blkg proper, including percpu stats, is responsibility of blk-cgroup proper. Note that default config (weight, bps...) initialization is done from this method; otherwise, we end up violating locking order between blkcg and q locks via blkcg_get_CONF() functions. * blkio_link_group_fn() is called under queue_lock and responsible for linking the blkg to the queue. blkcg side is handled by blk-cgroup proper. * The common blkg creation function is named blkg_lookup_create() and blkiocg_lookup_group() is renamed to blkg_lookup() for consistency. Also, throtl / cfq related functions are similarly [re]named for consistency. This simplifies blkcg policy implementations and enables further cleanup. -v2: Vivek noticed that blkg_lookup_create() incorrectly tested blk_queue_dead() instead of blk_queue_bypass() leading a user of the function ending up creating a new blkg on bypassing queue. This is a bug introduced while relocating bypass patches before this one. Fixed. -v3: ERR_PTR patch folded into this one. @for_root added to blkg_lookup_create() to allow creating root group on a bypassed queue during elevator switch. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
For root blkg, blk_throtl_init() was using throtl_alloc_tg() explicitly and cfq_init_queue() was manually initializing embedded cfqd->root_group, adding unnecessarily different code paths to blkg handling. Make both use the usual blkio_group get functions - throtl_get_tg() and cfq_get_cfqg() - for the root blkio_group too. Note that blk_throtl_init() callsite is pushed downwards in blk_alloc_queue_node() so that @q is sufficiently initialized for throtl_get_tg(). This simplifies root blkg handling noticeably for cfq and will allow further modularization of blkcg API. -v2: Vivek pointed out that using cfq_get_cfqg() won't work if CONFIG_CFQ_GROUP_IOSCHED is disabled. Fix it by factoring out initialization of base part of cfqg into cfq_init_cfqg_base() and alloc/init/free explicitly if !CONFIG_CFQ_GROUP_IOSCHED. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
blkgio_group is association between a block cgroup and a queue for a given policy. Using opaque void * for association makes things confusing and hinders factoring of common code. Use request_queue * and, if necessary, policy id instead. This will help block cgroup API cleanup. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
In both blkg get functions - throtl_get_tg() and cfq_get_cfqg(), instead of obtaining blkcg of %current explicitly, let the caller specify the blkcg to use as parameter and make both functions hold on to the blkcg. This is part of block cgroup interface cleanup and will help making blkcg API more modular. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
rcu_read_lock() in throtl_get_tb() and cfq_get_cfqg() holds onto @blkcg while looking up blkg. For API cleanup, the next patch will make the caller responsible for determining @blkcg to look blkg from and let them specify it as a parameter. Move rcu read locking out to the callers to prepare for the change. -v2: Originally this patch was described as a fix for RCU read locking bug around @blkg, which Vivek pointed out to be incorrect. It was from misunderstanding the role of rcu locking as protecting @blkg not @blkcg. Patch description updated. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
Elevator switch may involve changes to blkcg policies. Implement shoot down of blkio_groups. Combined with the previous bypass updates, the end goal is updating blkcg core such that it can ensure that blkcg's being affected become quiescent and don't have any per-blkg data hanging around before commencing any policy updates. Until queues are made aware of the policies that applies to them, as an interim step, all per-policy blkg data will be shot down. * blk-throtl doesn't need this change as it can't be disabled for a live queue; however, update it anyway as the scheduled blkg unification requires this behavior change. This means that blk-throtl configuration will be unnecessarily lost over elevator switch. This oddity will be removed after blkcg learns to associate individual policies with request_queues. * blk-throtl dosen't shoot down root_tg. This is to ease transition. Unified blkg will always have persistent root group and not shooting down root_tg for now eases transition to that point by avoiding having to update td->root_tg and is safe as blk-throtl can never be disabled -v2: Vivek pointed out that group list is not guaranteed to be empty on return from clear function if it raced cgroup removal and lost. Fix it by waiting a bit and retrying. This kludge will soon be removed once locking is updated such that blkg is never in limbo state between blkcg and request_queue locks. blk-throtl no longer shoots down root_tg to avoid breaking td->root_tg. Also, Nest queue_lock inside blkio_list_lock not the other way around to avoid introduce possible deadlock via blkcg lock. -v3: blkcg_clear_queue() repositioned and renamed to blkg_destroy_all() to increase consistency with later changes. cfq_clear_queue() updated to check q->elevator before dereferencing it to avoid NULL dereference on not fully initialized queues (used by later change). Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
Extend queue bypassing such that dying queue is always bypassing and blk-throttle is drained on bypass. With blkcg policies updated to test blk_queue_bypass() instead of blk_queue_dead(), this ensures that no bio or request is held by or going through blkcg policies on a bypassing queue. This will be used to implement blkg cleanup on elevator switches and policy changes. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 12月, 2011 1 次提交
-
-
由 Tejun Heo 提交于
There are a number of QUEUE_FLAG_DEAD tests. Add blk_queue_dead() macro and use it. This patch doesn't introduce any functional difference. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 10月, 2011 1 次提交
-
-
由 Jens Axboe 提交于
We can't use the latter if !CONFIG_LOCKDEP. Reported-by: NSedat Dilek <sedat.dilek@googlemail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 10月, 2011 4 次提交
-
-
由 Tejun Heo 提交于
request_queue is refcounted but actually depdends on lifetime management from the queue owner - on blk_cleanup_queue(), block layer expects that there's no request passing through request_queue and no new one will. This is fundamentally broken. The queue owner (e.g. SCSI layer) doesn't have a way to know whether there are other active users before calling blk_cleanup_queue() and other users (e.g. bsg) don't have any guarantee that the queue is and would stay valid while it's holding a reference. With delay added in blk_queue_bio() before queue_lock is grabbed, the following oops can be easily triggered when a device is removed with in-flight IOs. sd 0:0:1:0: [sdb] Stopping disk ata1.01: disabled general protection fault: 0000 [#1] PREEMPT SMP CPU 2 Modules linked in: Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs RIP: 0010:[<ffffffff8137d651>] [<ffffffff8137d651>] elv_rqhash_find+0x61/0x100 ... Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80) ... Call Trace: [<ffffffff8137d774>] elv_merge+0x84/0xe0 [<ffffffff81385b54>] blk_queue_bio+0xf4/0x400 [<ffffffff813838ea>] generic_make_request+0xca/0x100 [<ffffffff81383994>] submit_bio+0x74/0x100 [<ffffffff811c53ec>] dio_bio_submit+0xbc/0xc0 [<ffffffff811c610e>] __blockdev_direct_IO+0x92e/0xb40 [<ffffffff811c39f7>] blkdev_direct_IO+0x57/0x60 [<ffffffff8113b1c5>] generic_file_aio_read+0x6d5/0x760 [<ffffffff8118c1ca>] do_sync_read+0xda/0x120 [<ffffffff8118ce55>] vfs_read+0xc5/0x180 [<ffffffff8118cfaa>] sys_pread64+0x9a/0xb0 [<ffffffff81afaf6b>] system_call_fastpath+0x16/0x1b This happens because blk_queue_cleanup() destroys the queue and elevator whether IOs are in progress or not and DEAD tests are sprinkled in the request processing path without proper synchronization. Similar problem exists for blk-throtl. On queue cleanup, blk-throtl is shutdown whether it has requests in it or not. Depending on timing, it either oopses or throttled bios are lost putting tasks which are waiting for bio completion into eternal D state. The way it should work is having the usual clear distinction between shutdown and release. Shutdown drains all currently pending requests, marks the queue dead, and performs partial teardown of the now unnecessary part of the queue. Even after shutdown is complete, reference holders are still allowed to issue requests to the queue although they will be immmediately failed. The rest of teardown happens on release. This patch makes the following changes to make blk_queue_cleanup() behave as proper shutdown. * QUEUE_FLAG_DEAD is now set while holding both q->exit_mutex and queue_lock. * Unsynchronized DEAD check in generic_make_request_checks() removed. This couldn't make any meaningful difference as the queue could die after the check. * blk_drain_queue() updated such that it can drain all requests and is now called during cleanup. * blk_throtl updated such that it checks DEAD on grabbing queue_lock, drains all throttled bios during cleanup and free td when queue is released. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
blk_throtl_bio() and throtl_get_tg() have rather unusual interface. * throtl_get_tg() returns pointer to a valid tg or ERR_PTR(-ENODEV), and drops queue_lock in the latter case. Different locking context depending on return value is error-prone and DEAD state is scheduled to be protected by queue_lock anyway. Move DEAD check inside queue_lock and return valid tg or NULL. * blk_throtl_bio() indicates return status both with its return value and in/out param **@bio. The former is used to indicate whether queue is found to be dead during throtl processing. The latter whether the bio is throttled. There's no point in returning DEAD check result from blk_throtl_bio(). The queue can die after blk_throtl_bio() is finished but before make_request_fn() grabs queue lock. Make it take *@bio instead and return boolean result indicating whether the request is throttled or not. This patch doesn't cause any visible functional difference. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
blk_get/put_queue() in scsi_cmd_ioctl() and throtl_get_tg() are completely bogus. The caller must have a reference to the queue on entry and taking an extra reference doesn't change anything. For scsi_cmd_ioctl(), the only effect is that it ends up checking QUEUE_FLAG_DEAD on entry; however, this is bogus as queue can die right after blk_get_queue(). Dead queue should be and is handled in request issue path (it's somewhat broken now but that's a separate problem and doesn't affect this one much). throtl_get_tg() incorrectly assumes that q is rcu freed. Also, it doesn't check return value of blk_get_queue(). If the queue is already dead, it ends up doing an extra put. Drop them. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
blk_throtl interface is block internal and there's no reason to have them in linux/blkdev.h. Move them to block/blk.h. This patch doesn't introduce any functional change. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 8月, 2011 1 次提交
-
-
由 Shaohua Li 提交于
read request is always sync. Using rw_is_sync() to determine if a bio is sync. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 14 6月, 2011 2 次提交
-
-
由 Joe Perches 提交于
The total of two unsigned values should also be unsigned. Update throtl_log output to unsigned. Update total_nr_queued test to non-zero to be the same as the other total_nr_queued tests. Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Joe Perches 提交于
Use the compiler to verify format strings and arguments. Fix fallout. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 13 6月, 2011 1 次提交
-
-
由 Joe Perches 提交于
Use the compiler to verify format strings and arguments. Fix fallout. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 21 5月, 2011 8 次提交
-
-
由 Vivek Goyal 提交于
Currently we take a queue lock on each bio to check if there are any throttling rules associated with the group and also update the stats. Now access the group under rcu and update the stats without taking the queue lock. Queue lock is taken only if there are throttling rules associated with the group. So the common case of root group when there are no rules, save unnecessary pounding of request queue lock. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Currently we take blkg_stat lock for even updating the stats. So even if a group has no throttling rules (common case for root group), we end up taking blkg_lock, for updating the stats. Make dispatch stats per cpu so that these can be updated without taking blkg lock. If cpu goes offline, these stats simply disappear. No protection has been provided for that yet. Do we really need anything for that? Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Soon we will allow accessing a throtl_grp under rcu_read_lock(). Hence start freeing up throtl_grp after one rcu grace period. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Use same helper function for root group as we use with dynamically allocated groups to add it to various lists. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
A helper function for the code which is used at 2-3 places. Makes reading code little easier. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Currently, we allocate root throtl_grp statically. But as we will be introducing per cpu stat pointers and that will be allocated dynamically even for root group, we might as well make whole root throtl_grp allocation dynamic and treat it in same manner as other groups. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Currently, all the cfq_group or throtl_group allocations happen while we are holding ->queue_lock and sleeping is not allowed. Soon, we will move to per cpu stats and also need to allocate the per group stats. As one can not call alloc_percpu() from atomic context as it can sleep, we need to drop ->queue_lock, allocate the group, retake the lock and continue processing. In throttling code, I check the queue DEAD flag again to make sure that driver did not call blk_cleanup_queue() in the mean time. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Group initialization code seems to be at two places. root group initialization in blk_throtl_init() and dynamically allocated group in throtl_find_alloc_tg(). Create a common function and use at both the places. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 16 5月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
Currentlly we first map the task to cgroup and then cgroup to blkio_cgroup. There is a more direct way to get to blkio_cgroup from task using task_subsys_state(). Use that. The real reason for the fix is that it also avoids a race in generic cgroup code. During remount/umount rebind_subsystems() is called and it can do following with and rcu protection. cgrp->subsys[i] = NULL; That means if somebody got hold of cgroup under rcu and then it tried to do cgroup->subsys[] to get to blkio_cgroup, it would get NULL which is wrong. I was running into this race condition with ltp running on a upstream derived kernel and that lead to crash. So ideally we should also fix cgroup generic code to wait for rcu grace period before setting pointer to NULL. Li Zefan is not very keen on introducing synchronize_wait() as he thinks it will slow down moun/remount/umount operations. So for the time being atleast fix the kernel crash by taking a more direct route to blkio_cgroup. One tester had reported a crash while running LTP on a derived kernel and with this fix crash is no more seen while the test has been running for over 6 days. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 06 4月, 2011 1 次提交
-
-
由 Andreas Schwab 提交于
xchg does not work portably with smaller than 32bit types. Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 31 3月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 23 3月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
Lina reported that if throttle limits are initially very high and then dropped, then no new bio might be dispatched for a long time. And the reason being that after dropping the limits we don't reset the existing slice and do the rate calculation with new low rate and account the bios dispatched at high rate. To fix it, reset the slice upon rate change. https://lkml.org/lkml/2011/3/10/298 Another problem with very high limit is that we never queued the bio on throtl service tree. That means we kept on extending the group slice but never trimmed it. Fix that also by regulary trimming the slice even if bio is not being queued up. Reported-by: NLina Lu <lulina_nuaa@foxmail.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 10 3月, 2011 2 次提交
-
-
由 Vivek Goyal 提交于
Use plug in throttle dispatch also as we are dispatching a bunch of bios in throttle context and some of them might merge. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jens Axboe 提交于
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 08 3月, 2011 2 次提交
-
-
由 Vivek Goyal 提交于
When throttle group limits are updated through cgroups, a thread is woken up to process these updates. While reviewing that code, oleg noted couple of race conditions existed in the code and he also suggested that code can be simplified. This patch fixes the races simplifies the code based on Oleg's suggestions: - Use xchg(). - Introduced a common function throtl_update_blkio_group_common() which is shared now by all iops/bps update functions. Reviewed-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Fixed a merge issue, throtl_schedule_delayed_work() takes throtl_data as the argument now, not the queue. Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
With the help of cgroup interface one can go and upate the bps/iops limits of existing group. Once the limits are udpated, a thread is woken up to see if some blocked group needs recalculation based on new limits and needs to be requeued. There was also a piece of code where I was checking for group limit update when a fresh bio comes in. This patch gets rid of that piece of code and keeps processing the limit change at one place throtl_process_limit_change(). It just keeps the code simple and easy to understand. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 03 3月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
Move blk_throtl_exit() in blk_cleanup_queue() as blk_throtl_exit() is written in such a way that it needs queue lock. In blk_release_queue() there is no gurantee that ->queue_lock is still around. Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported one problem. https://lkml.org/lkml/2010/10/23/86 And a quick fix moved blk_throtl_exit() to blk_release_queue(). commit 7ad58c02 Author: Jens Axboe <jaxboe@fusionio.com> Date: Sat Oct 23 20:40:26 2010 +0200 block: fix use-after-free bug in blk throttle code This patch reverts above change and does not try to shutdown the throtl work in blk_sync_queue(). By avoiding call to throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid the problem reported by Ingo. blk_sync_queue() seems to be used only by md driver and it seems to be using it to make sure q->unplug_fn is not called as md registers its own unplug functions and it is about to free up the data structures used by unplug_fn(). Block throttle does not call back into unplug_fn() or into md. So there is no need to cancel blk throttle work. In fact I think cancelling block throttle work is bad because it might happen that some bios are throttled and scheduled to be dispatched later with the help of pending work and if work is cancelled, these bios might never be dispatched. Block layer also uses blk_sync_queue() during blk_cleanup_queue() and blk_release_queue() time. That should be safe as we are also calling blk_throtl_exit() which should make sure all the throttling related data structures are cleaned up. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 02 3月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
o Dominik Klein reported a system hang issue while doing some blkio throttling testing. https://lkml.org/lkml/2011/2/24/173 o Some tracing revealed that CFQ was not dispatching any more jobs as queue unplug was not happening. And queue unplug was not happening because unplug work was not being called as there was one throttling work on same cpu which as not finished yet. And throttling work had not finished as it was tyring to dispatch a bio to CFQ but all the request descriptors were consume to it was put to sleep. o So basically it is a cyclic dependecny between CFQ unplug work and throtl dispatch work. Tejun suggested that use separate workqueue for such cases. o This patch uses a separate workqueue for throttle related work and does not rely on kblockd workqueue anymore. Cc: stable@kernel.org Reported-by: NDominik Klein <dk@in-telegence.net> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 19 1月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
o Jeff Moyer was doing some testing on a RAM backed disk and blkiocg_lookup_group() showed up high overhead after memcpy(). Similarly somebody else reported that blkiocg_lookup_group() is eating 6% extra cpu. Though looking at the code I can't think why the overhead of this function is so high. One thing is that it is called with very high frequency (once for every IO). o For lot of folks blkio controller will be compiled in but they might not have actually created cgroups. Hence optimize the case of root cgroup where we can avoid calling blkiocg_lookup_group() if IO is happening in root group (common case). Reported-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-