- 09 Aug, 2013 3 commits
-
-
Committed by Tejun Heo
Currently, controllers have to explicitly follow the cgroup hierarchy to find the parent of a given css. cgroup is moving towards using cgroup_subsys_state as the main controller interface construct, so let's provide a way to climb the hierarchy using just csses. This patch implements css_parent() which, given a css, returns its parent. The function is guaranteed to return a valid non-NULL parent css as long as the target css is not at the top of the hierarchy. freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices are converted to use css_parent() instead of accessing cgroup->parent directly.
* __parent_ca() is dropped from cpuacct and its usage is replaced with parent_ca(). The only difference between the two was the NULL test on cgroup->parent, which is now embedded in css_parent(), making the distinction moot.
Note that eventually a css->parent field will be added to css and the NULL check in css_parent() will go away. This patch shouldn't cause any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
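The idea of climbing the hierarchy through csses alone can be sketched in user space as follows. This is only an illustration: the two structures are simplified stand-ins (a single css slot per cgroup instead of a per-subsystem array), and css_depth() is a hypothetical helper that does not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layout, not the real one). */
struct cgroup;

struct cgroup_subsys_state {
    struct cgroup *cgroup;              /* cgroup this css belongs to */
};

struct cgroup {
    struct cgroup *parent;              /* NULL at the top of the hierarchy */
    struct cgroup_subsys_state *css;    /* one controller only, for brevity */
};

/* Return the parent css, or NULL if @css is at the top of the hierarchy. */
static struct cgroup_subsys_state *css_parent(struct cgroup_subsys_state *css)
{
    struct cgroup *parent;

    assert(css && css->cgroup);
    parent = css->cgroup->parent;
    return parent ? parent->css : NULL;
}

/* Hypothetical helper: count ancestors using css_parent() only. */
static int css_depth(struct cgroup_subsys_state *css)
{
    int depth = 0;

    while ((css = css_parent(css)) != NULL)
        depth++;
    return depth;
}
```

A controller written this way never touches cgroup->parent itself, which is the property the conversion is after.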
-
Committed by Tejun Heo
css (cgroup_subsys_state) is usually embedded in a subsys-specific data structure. Subsystems either use container_of() directly to cast from css to such a data structure or have an accessor function wrapping such a cast. As cgroup as a whole is moving towards using css as the main interface handle, add and update such accessors to ease dealing with csses. All accessors explicitly handle NULL input and return NULL in those cases. While this looks like an extra branch in the code, as all controller-specific data structures have css as the first field, the casting doesn't involve any offsetting and the compiler can trivially optimize out the branch.
* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such an accessor. Added.
* memory, hugetlb and devices already had one but didn't explicitly handle NULL input. Updated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
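A NULL-tolerant accessor of the kind described above might look like the following user-space sketch. The container_of() macro is redefined locally, and the contents of struct cgroup_subsys_state and struct freezer are placeholders chosen for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Local user-space version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct cgroup_subsys_state {
    int refcnt;                         /* placeholder field */
};

/* Illustrative controller state; css is deliberately the first field. */
struct freezer {
    struct cgroup_subsys_state css;
    unsigned int state;
};

/* With css at offset 0 the cast needs no pointer adjustment. */
static_assert(offsetof(struct freezer, css) == 0, "css must be the first field");

/* Map a css back to its controller structure; NULL in, NULL out. */
static struct freezer *css_freezer(struct cgroup_subsys_state *css)
{
    return css ? container_of(css, struct freezer, css) : NULL;
}
```

Because the offset is zero, NULL maps to NULL even without the check, which is why the compiler is free to drop the branch.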
-
Committed by Tejun Heo
The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp a large portion of the cgroup API, so let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and, given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is a pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
-
- 10 Jul, 2013 3 commits
-
-
Committed by Philippe De Muyter
Graft AIX partitions enumeration into partitions/msdos.c. There is already AIX disk detection logic in msdos.c. When an AIX disk has been found, and if configured to, call the AIX partitions recognizer. This avoids removing the AIX disk protection from msdos.c, avoids code duplication, and ensures that AIX partition enumeration is called before plain msdos partition enumeration.
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Cc: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Philippe De Muyter
Add partitions/aix.h and partitions/aix.c. AIX LVM makes it possible to create "logical volumes" that are made of multiple slices of multiple disks. The new code only allows access to "logical volumes" that are made of one slice on the probed disk, a slice being a contiguous disk area. The code also detects "logical volumes" made of multiple slices on the probed disk, but cannot describe them to the partition layer, because the generic partition layer code does not support that. When such non-contiguous "logical volumes" are detected, a diagnostic message is printed. [akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Cc: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Philippe De Muyter
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Cc: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Jul, 2013 2 commits
-
-
Committed by Kees Cook
Disk names may contain arbitrary strings, so they must not be interpreted as format strings. It seems that only md allows arbitrary strings to be used for disk names, but this could allow for a local memory corruption from uid 0 into ring 0. CVE-2013-2851
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
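The general pattern behind this class of fix can be shown in user space with snprintf(). The sketch below is not the kernel change itself; format_disk_name() is a hypothetical helper that passes the name as an argument to a fixed "%s" format instead of using it as the format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Unsafe pattern: snprintf(buf, len, disk_name) would interpret any
 * conversion specifier ("%s", "%n", ...) that appears in the name.
 * Safe pattern: a constant format string, with the name as data.
 */
static int format_disk_name(char *buf, size_t len, const char *disk_name)
{
    assert(buf && disk_name);
    return snprintf(buf, len, "%s", disk_name);
}
```

With the constant format, a name such as "md%n%s" is copied byte for byte rather than parsed.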
-
Committed by Cong Wang
There is a hole in struct hd_geometry, so we have to zero the struct on the stack before copying it to user space.
Signed-off-by: Cong Wang <amwang@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
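To see why zeroing matters, consider the user-space sketch below. It declares a struct with the same four fields as struct hd_geometry; on common 64-bit ABIs there are padding bytes between cylinders and start. export_geometry() is a hypothetical function, memcpy() stands in for copy_to_user(), and the geometry values are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same fields as struct hd_geometry; padding may follow cylinders. */
struct hd_geometry {
    unsigned char heads;
    unsigned char sectors;
    unsigned short cylinders;
    unsigned long start;
};

/* Fill a geometry on the stack and copy it out, padding included. */
static void export_geometry(unsigned char *user_buf)
{
    struct hd_geometry geo;

    assert(user_buf);
    /* Clear the whole object first so padding bytes hold no stale stack data. */
    memset(&geo, 0, sizeof(geo));
    geo.heads = 16;
    geo.sectors = 63;
    geo.cylinders = 1024;
    geo.start = 2048;
    /* Stand-in for copy_to_user(): copies every byte, including the hole. */
    memcpy(user_buf, &geo, sizeof(geo));
}
```

Assigning the four fields alone would leave the hole with whatever was previously on the stack, and a whole-struct copy would then expose it.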
-
- 03 Jul, 2013 1 commit
-
-
Committed by Jianpeng Ma
There's a race between elevator switching and normal I/O operation, because the allocation of struct elevator_queue and the setup of its ->elevator_data do not happen in one atomic operation. So there is a chance of using a NULL ->elevator_data. For example:
Thread A:                        Thread B:
blk_queue_bio                    elevator_switch
spin_lock_irq(q->queue_lock)     elevator_alloc
elv_merge                        elevator_init_fn
Because elevator_alloc is called without holding queue_lock, ->elevator_data is still NULL at that point. If thread A calls elv_merge at the same time and needs some information from elevator_data, a crash happens. Move elevator_alloc into elevator_init_fn so that the two steps are done in one atomic operation.
The bug can be reproduced easily with the following method:
1: dd if=/dev/sdb of=/dev/null
2: while true;do echo noop > scheduler;echo deadline > scheduler;done
The fix was also tested with this method.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 Jul, 2013 2 commits
-
-
Committed by Hannes Reinecke
rq_timed_out_fn might have been unset while the request was in flight, so we need to check for it in blk_rq_timed_out().
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Hannes Reinecke
The DASD driver is using FASTFAIL as an equivalent to the transport errors in SCSI. And the 'steal lock' function maps roughly to a reservation error. So we should be returning the appropriate error codes when completing a request.
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 29 Jun, 2013 1 commit
-
-
Committed by Jan Kara
In case a device has three tags available, we still reserve two of them for sync IO. That leaves only a single tag for async IO such as writeback from the flusher thread, which results in poor performance. Allow async IO to consume two tags in case the queue has three tags available to get decent async write performance. This patch improves streaming write performance on a machine with such a disk from ~21 MB/s to ~52 MB/s. Also, postmark throughput in the presence of a streaming writer improves from 8 to 12 transactions per second, so sync IO doesn't seem to be harmed in the presence of a heavy async writer.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
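One way to express the rule is a small function mapping the queue's tag depth to the number of tags async IO may hold. This is a sketch rather than the patched kernel function: async_tag_limit() is a hypothetical name, and only the three-tag case is stated by the message; the behavior for other depths (reserve two tags for sync IO, but always leave at least one for async) is an assumption.

```c
#include <assert.h>

/*
 * Number of tags async IO may consume on a queue with @max_depth tags.
 * Only the max_depth == 3 case comes from the commit message; the rest
 * is assumed for the sake of a complete example.
 */
static int async_tag_limit(int max_depth)
{
    assert(max_depth >= 1);

    switch (max_depth) {
    case 1:
    case 2:
        return 1;               /* always leave one tag for async IO */
    case 3:
        return 2;               /* the special case added by the patch */
    default:
        return max_depth - 2;   /* reserve two tags for sync IO */
    }
}
```

For example, a 32-tag queue would allow 30 async tags under these assumptions, while a three-tag queue now allows two instead of one.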
-
- 17 May, 2013 1 commit
-
-
Committed by Aaron Lu
In blk_post_runtime_resume, an autosuspend request will be initiated for the device. Since we are holding the queue lock, we can't sleep and thus we should use the async version to initiate an autosuspend, i.e. pm_request_autosuspend instead of pm_runtime_autosuspend, which might sleep.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 May, 2013 27 commits
-
-
Committed by Tejun Heo
With the recent updates, blk-throttle is finally ready for proper hierarchy support. Dispatching now honors service_queue->parent_sq and propagates correctly. The only thing missing is setting ->parent_sq correctly so that the throtl_grp hierarchy matches the cgroup hierarchy. This patch updates throtl_pd_init() such that service_queues form the same hierarchy as the cgroup hierarchy if sane_behavior is enabled. As this concludes proper hierarchy support for blkcg, the shameful .broken_hierarchy tag is removed from blkio_subsys.
v2: Updated blkio-controller.txt as suggested by Vivek.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
-
Committed by Tejun Heo
blk_throtl_bio() has a quick exit path for throtl_grps without limits configured. It looks at the bps and iops limits and if both are not configured, the bio is issued immediately. While this is correct in the current flat hierarchy as each throtl_grp behaves completely independently, it would become wrong in proper hierarchy mode. A group without any limits could still be limited by one of its ancestors, and bios queued for such a group should not bypass blk-throtl. As having a quick bypass mechanism is beneficial, this patch reimplements the mechanism such that it's correct even with proper hierarchy. throtl_grp->has_rules[] is added. These booleans are updated for the whole subtree whenever a config is updated so that has_rules[] of the whole subtree stays synchronized. They're also updated when a new throtl_grp comes online so that it can't escape the limits of its ancestors. As no throtl_grp has another throtl_grp as parent now, this patch doesn't yet make any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
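The has_rules[] propagation can be sketched as below. The structure is a reduced, hypothetical model of throtl_grp: NO_BPS and NO_IOPS are invented "unlimited" sentinels, and update_subtree() assumes the caller supplies the groups in pre-order (parents before children), where the kernel would walk the blkg hierarchy instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_BPS  (~0ULL)     /* assumed sentinel for "no bps limit" */
#define NO_IOPS (~0U)       /* assumed sentinel for "no iops limit" */

/* Reduced model of a throttle group; index 0 is READ, 1 is WRITE. */
struct tg {
    struct tg *parent;
    unsigned long long bps[2];
    unsigned int iops[2];
    bool has_rules[2];
};

/* A group has rules if it, or any ancestor, has a limit configured. */
static void tg_update_has_rules(struct tg *tg)
{
    assert(tg);
    for (int rw = 0; rw < 2; rw++)
        tg->has_rules[rw] =
            (tg->parent && tg->parent->has_rules[rw]) ||
            tg->bps[rw] != NO_BPS || tg->iops[rw] != NO_IOPS;
}

/* Refresh a subtree; @tgs must list parents before their children. */
static void update_subtree(struct tg **tgs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tg_update_has_rules(tgs[i]);
}
```

The quick exit path then only needs to test has_rules[rw] of the bio's own group, since the flag already accounts for every ancestor.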
-
Committed by Vivek Goyal
With the planned proper hierarchy support, a bio will climb up the tree before actually being dispatched. This makes sure the bio is also subjected to the parent's throttling limits, if any. It might happen that the parent is idle and when the bio is transferred to the parent, a new slice starts fresh. But that is incorrect, as the parent's wait time should have started when the bio was queued in the child group, and it causes IOs to be throttled more than configured as they climb the hierarchy. Given the fact that we have not written the hierarchical algorithm in a way where the child's and parent's time slices are synchronized, we transfer the child's start time to the parent if the parent was idling. If the parent was busy dispatching other bios all this while, this is not an issue. The child's slice start time is passed to the parent. The parent looks at its last expired slice start time. If the child's start time is after the parent's old start time, that means the parent had been idle and, after the parent went idle, the child had an IO queued. So use the child's start time as the parent's start time. If the parent's start time is after the child's start time, that means that when the IO got queued in the child group, the parent was not idle. But later it dispatched some IO, its slice got trimmed and then it went idle. After a while the child's request got shifted to the parent group. In this case use the parent's old start time as the new start time, as that's the duration of the slice we did not use. This logic is far from perfect: if there are multiple children, the first child transferring a bio decides the start time, while a bio might have been queued even earlier in another child which is yet to be transferred up to the parent. In that case we will lose time and bandwidth in the parent. This patch is just an approximation to make the situation somewhat better.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
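The start-time rule described above amounts to taking the later of the two timestamps. The sketch below uses a hypothetical one-field structure and plain integer comparison; kernel code working on jiffies would need a wraparound-safe comparison, which is left out here.

```c
#include <assert.h>

/* Reduced model: only the slice start time of one direction. */
struct tg_slice {
    unsigned long start;    /* start of the last (possibly expired) slice */
};

/*
 * Begin the parent's new slice when a bio arrives from a child.
 * Child started later   -> parent was idle, adopt the child's start.
 * Parent started later  -> keep the parent's old start, since that part
 *                          of the slice went unused.
 */
static void start_parent_slice(struct tg_slice *parent, unsigned long child_start)
{
    assert(parent);
    if (child_start > parent->start)
        parent->start = child_start;
}
```

Either way the parent does not restart its clock at the moment of transfer, which is what caused the extra throttling.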
-
Committed by Tejun Heo
With flat hierarchy, there's only a single level of dispatching happening and fairness beyond that point is the responsibility of the rest of the block layer and driver, which usually works out okay; however, with the planned hierarchy support, service_queue->bio_lists[] can be filled up by bios from a single source. While the limits would still be honored, it'd be very easy to starve IOs from siblings or children. To avoid such starvation, this patch implements throtl_qnode and converts service_queue->bio_lists[] to lists of per-source qnodes which in turn contain the bios. For example, when a bio is dispatched from a child group, the bio doesn't get queued on ->bio_lists[] directly but first gets queued on the group's qnode, which in turn gets queued on service_queue->queued[]. When dispatching for the upper level, the ->queued[] list is consumed in round-robin order so that the dispatch window is consumed fairly by all IO sources. There are two ways a bio can come to a throtl_grp: directly queued to the group or dispatched from a child. For the former, throtl_grp->qnode_on_self[rw] is used. For the latter, the child's ->qnode_on_parent[rw]. Note that this means that the child which is contributing a bio to its parent should stay pinned until all its bios are dispatched to its grand-parent. This patch moves blkg refcnting from bio add/remove spots to qnode activation/deactivation so that the blkg containing an active qnode is always pinned. As the child pins the parent, this is sufficient for keeping the relevant sub-tree pinned while bios are in flight. The starvation issue was spotted by Vivek Goyal.
v2: The original patch used the same throtl_grp->qnode_on_self/parent for reads and writes, causing RWs to be queued incorrectly if there already are outstanding IOs in the other direction. They should be throtl_grp->qnode_on_self/parent[2] so that READs and WRITEs can use different qnodes. Spotted by Vivek Goyal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
throtl_pending_timer_fn() currently assumes that the parent_sq is the top-level one and the bios dispatched are ready to be issued; however, this assumption will be wrong with proper hierarchy support. This patch makes the following changes to make throtl_pending_timer_fn() ready for hierarchy.
* If the parent_sq isn't the top-level one, update the parent throtl_grp's dispatch time and schedule the next dispatch as necessary. If the parent's dispatch time is now, repeat the function for the parent throtl_grp.
* If the parent_sq is the top-level one, kick issue work_item as before.
* The debug message printed by throtl_log() now prints out the service_queue's nr_queued[] instead of the total nr_queued as the latter becomes uninteresting and misleading with hierarchical dispatch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
tg_dispatch_one_bio() currently assumes that the parent_sq is the top-level one and the bio being dispatched is ready to be issued; however, this assumption will be wrong with proper hierarchy support. This patch makes the following changes to make tg_dispatch_one_bio() ready for hierarchy.
* throtl_data->nr_queued[] is incremented in blk_throtl_bio() instead of throtl_add_bio_tg() so that throtl_add_bio_tg() can be used to transfer a bio from a child tg to its parent.
* tg_dispatch_one_bio() is updated to distinguish whether its parent is another throtl_grp or the throtl_data. If the former, the bio is transferred to the parent throtl_grp using throtl_add_bio_tg(). If the latter, the bio is ready to be issued and is put on the top-level service_queue's bio_lists[], and throtl_data->nr_queued is decremented.
As all throtl_grps currently have the top-level service_queue as their ->parent_sq, this patch in itself doesn't make any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
Currently, blk_throtl_bio() issues the passed-in bio directly if it's within the limits of its associated tg (throtl_grp). This behavior becomes incorrect with hierarchy support as the bio should be accounted to and throttled by the ancestor throtl_grps too. This patch makes the direct issue path of blk_throtl_bio() loop until it reaches the top-level service_queue or gets throttled. If the former, the bio can be issued directly; otherwise, it gets queued at the first layer where it was above limits. As tg->parent_sq is always the top-level service queue currently, this patch in itself doesn't make any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
The current blk_throtl_drain() assumes that all active throtl_grps are queued on throtl_data->service_queue, which won't be true once hierarchy support is implemented. This patch makes blk_throtl_drain() perform a post-order walk of the blkg hierarchy draining each associated throtl_grp, which guarantees that all bios will eventually be pushed to the top-level service_queue in throtl_data.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
Currently, blk_throtl_dispatch_work_fn() is responsible for both dispatching bios from throtl_grps according to their limits and then issuing the dispatched bios. This patch moves the dispatch part to throtl_pending_timer_fn() so that the work item is kicked iff there are bios to issue. This is to avoid work item execution at each step when hierarchy support is enabled. bios will be dispatched towards the top-level service_queue from the timers at each layer and the work item will only be used to issue the bios which reached the top-level service_queue. While fetching bios to issue from bio_lists[], blk_throtl_dispatch_work_fn() fetches all READs before WRITEs. While the original code also dispatched READs first, if multiple throtl_grps are dispatched on the same run, WRITEs from a throtl_grp which is dispatched first would precede READs from throtl_grps which are dispatched later. While this is a behavior change, given that the previous code already prioritized READs and the block layer generally prioritizes and segregates READs from WRITEs, this isn't likely to make any noticeable differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
throtl_select_dispatch() only dispatches throtl_quantum bios on each invocation. blk_throtl_dispatch_work_fn() in turn depends on throtl_schedule_next_dispatch() scheduling the next dispatch window immediately so that undue delays aren't incurred. This effectively chains multiple dispatch work item executions back-to-back when there are more than throtl_quantum bios to dispatch on a given tick. There is no reason to finish the current work item just to repeat it immediately. This patch makes throtl_schedule_next_dispatch() return %false without doing anything if the current dispatch window is still open and updates blk_throtl_dispatch_work_fn() to repeat dispatching after cpu_relax() on %false return. This change will help implementing hierarchy support as dispatching will be done from pending_timer, and immediate reschedule of the timer function isn't supported and doesn't make much sense. While this patch changes how dispatch behaves when there are more than throtl_quantum bios to dispatch on a single tick, the behavior change is immaterial.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
Currently, throtl_data->dispatch_work is a delayed_work item which handles both delayed dispatch and issuing bios. The two tasks will be separated to support proper hierarchy. To prepare for that, this patch separates out the timer into throtl_service_queue->pending_timer from throtl_data->dispatch_work and makes the latter a work_struct.
* As the timer is now per-service_queue, it's initialized and del_sync'd as its corresponding service_queue is created and destroyed. The timer, when triggered, simply schedules throtl_data->dispatch_work for execution.
* throtl_schedule_delayed_work() is renamed to throtl_schedule_pending_timer() and takes @sq and @expires now.
* Similarly, throtl_schedule_next_dispatch() now takes @sq, which should be the parent_sq of the service_queue which just got a new bio or updated. As the parent_sq is always the top-level service_queue now, this doesn't change anything at this point.
This patch doesn't introduce any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
With proper hierarchy support, a bio can be dispatched multiple times until it reaches the top-level service_queue and we don't want to update dispatch stats at each step. They are local stats and will be kept local. If recursive stats are necessary, they should be implemented separately and definitely not by updating counters recursively on each dispatch. This patch moves REQ_THROTTLED setting to throtl_charge_bio() and gates the stats update with it so that dispatch stats are updated only the first time the bio is charged to a throtl_grp, which will always be the throtl_grp the bio was originally queued to. This means that REQ_THROTTLED would be set even for bios which don't get throttled. As we don't want bios to leave blk-throtl with the flag set, move REQ_THROTTLED clearing to the end of blk_throtl_bio() and clear it if the bio is being issued directly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
Now that both throtl_data and throtl_grp embed throtl_service_queue, we can unify throtl_log() and throtl_log_tg().
* sq_to_tg() is added. This returns the throtl_grp a service_queue is embedded in. If the service_queue is the top-level one embedded in throtl_data, NULL is returned.
* sq_to_td() is added. A service_queue is always associated with a throtl_data. This function finds the associated td and returns it.
* throtl_log() is updated to take throtl_service_queue instead of throtl_data. If the service_queue is one embedded in throtl_grp, it prints the same header as throtl_log_tg() did. If it's one embedded in throtl_data, it behaves the same as before. This renders throtl_log_tg() unnecessary. Removed.
This change is necessary for hierarchy support as we're gonna be using the same code paths to dispatch bios to intermediate service_queues embedded in throtl_grps and the top-level service_queue embedded in throtl_data. This patch doesn't make any behavior changes.
v2: throtl_log() didn't print a space after blkg path. Updated so that it prints a space after throtl_grp path. Spotted by Vivek.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
To prepare for hierarchy support, this patch adds throtl_service_queue->parent_sq which points to the parent service_queue. Currently, for all service_queues embedded in throtl_grps, it points to throtl_data->service_queue. As throtl_data->service_queue doesn't have a parent, its parent_sq is set to NULL. There are a number of functions which take both throtl_grp *tg and throtl_service_queue *parent_sq. With this patch, the parent service_queue can be determined from @tg and the @parent_sq arguments are removed. This patch doesn't make any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
When blk_throtl_bio() wants to queue a bio to a tg (throtl_grp), it avoids invoking tg_update_disptime() and throtl_schedule_next_dispatch() if the tg already has bios queued in that direction. As a new bio is appended after the existing ones, it can't change the tg's next dispatch time or the parent's dispatch schedule. This optimization is currently open coded in blk_throtl_bio(). Whether the target biolist was occupied was recorded in a local variable and later used to skip the disptime update. This patch generalizes it so that throtl_add_bio_tg() sets a new flag THROTL_TG_WAS_EMPTY if the biolist was empty before the new bio was added. tg_update_disptime() clears the flag automatically. blk_throtl_bio() is updated to simply test the flag before updating disptime. This patch doesn't make any functional differences now but will enable using the same optimization for recursive dispatch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
throtl_service_queues will eventually form a tree which is anchored at throtl_data->service_queue and queue bios will climb the tree to the top service_queue to be executed. This patch makes the dispatch paths in blk_throtl_dispatch_work_fn() and blk_throtl_drain() dispatch bios to throtl_data->service_queue.bio_lists[] instead of the on-stack bio_lists. This makes the final dispatch to the top-level service_queue share the same mechanism as dispatches through the rest of the hierarchy. As bios should be issued in a sleepable context, blk_throtl_dispatch_work_fn() transfers all dispatched bios from the service_queue bio_lists[] into an on-stack one before dropping queue_lock and issuing the bios.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
throtl_service_queues will eventually form a tree which is anchored at throtl_data->service_queue and queue bios will climb the tree to the top service_queue to be executed. This patch moves bio_lists[] and nr_queued[] from throtl_grp to its service_queue to prepare for that. As currently only the throtl_data->service_queue is in use, this patch just ends up moving throtl_grp->bio_lists[] and ->nr_queued[] to throtl_grp->service_queue.bio_lists[] and ->nr_queued[] without making any functional differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
Currently, there's a single service_queue per queue - throtl_data->service_queue. All active throtl_grps are queued on the queue and dispatched according to their limits. To support hierarchy, this will be expanded such that active throtl_grps form a tree anchored at throtl_data->service_queue and chained through each intermediate throtl_grp's service_queue. This patch adds throtl_grp->service_queue to prepare for hierarchy support. The initialization function - throtl_service_queue_init() - is added and replaces the macro initializer. The newly added tg->service_queue isn't used yet. Following patches will use it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
throtl_service_queue will be the building block of hierarchy support and will form a tree. This patch updates its usages as arguments to reduce confusion.
* When a service queue is used in the parent role - the host of the rbtree - use @parent_sq instead of @sq.
* For functions taking both @tg and @parent_sq, reorder them so that the order is (@tg, @parent_sq), not the other way around. This makes the code follow the usual convention of specifying the primary target of the operation as the first argument.
This patch doesn't make any functional differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
throtl_service_queue will be used as the basic block to implement hierarchy support. Pass around throtl_service_queue *sq instead of throtl_data *td in the following functions which will be used across multiple levels of hierarchy.
* [__]throtl_enqueue/dequeue_tg()
* throtl_add_bio_tg()
* tg_update_disptime()
* throtl_select_dispatch()
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
Add throtl_grp->td so that the td (throtl_data) a given tg (throtl_grp) belongs to can be determined, and remove the @td argument from functions which take both @td and @tg, as the former can now be determined from the latter. This generally simplifies the code and removes a number of cases where @td is passed as an argument without actually being used. This will also help hierarchy support implementation. While at it, in multi-line conditions, move the logical operators leading broken lines to the end of the previous line.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
blk-throttle is still using function-defining macros to define flag handling functions, which went out of style at least a decade ago. Just define the flag as a bitmask and use direct bit operations. This patch doesn't make any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
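The style being adopted can be illustrated as follows: the flag is an enum bit and is tested, set and cleared with plain bit operations where it is needed. The structure and the two helper functions are reduced, hypothetical versions written for this example; only the flag name follows blk-throttle's naming.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag defined as a bitmask instead of through function-generating macros. */
enum tg_state_flags {
    THROTL_TG_PENDING = 1 << 0,     /* group is on the pending tree */
};

/* Reduced model of a throttle group. */
struct throtl_grp {
    unsigned int flags;
};

/* Mark the group pending; returns true if it was newly enqueued. */
static bool tg_enqueue(struct throtl_grp *tg)
{
    assert(tg);
    if (tg->flags & THROTL_TG_PENDING)
        return false;               /* already queued, nothing to do */
    tg->flags |= THROTL_TG_PENDING;
    return true;
}

/* Clear the pending state; returns true if the group was queued. */
static bool tg_dequeue(struct throtl_grp *tg)
{
    assert(tg);
    if (!(tg->flags & THROTL_TG_PENDING))
        return false;
    tg->flags &= ~THROTL_TG_PENDING;
    return true;
}
```

Adding a second flag later is then a one-line change to the enum rather than another macro-generated set of helpers.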
-
Committed by Tejun Heo
throtl_rb_root will be expanded to cover more roles for hierarchy support. Rename it to throtl_service_queue and make its fields more descriptive.
* rb -> pending_tree
* left -> first_pending
* count -> nr_pending
* min_disptime -> first_pending_disptime
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
throtl_nr_queued() is used in several places to avoid performing certain operations when the throtl_data is empty. This usually is useless as those paths usually aren't traveled if there's no bio queued.
* throtl_schedule_delayed_work() skips scheduling the dispatch work item if @td doesn't have any bios queued; however, the only case where it can be called when @td is empty is from tg_set_conf(), which isn't something we should be optimizing for.
* throtl_schedule_next_dispatch() takes a quick exit if @td is empty; however, right after that it triggers BUG if the service tree is empty. The two conditions are equivalent and it can just test @st->count for the quick exit.
* blk_throtl_dispatch_work_fn() skips dispatch if @td is empty. This work function isn't usually invoked when @td is empty. The only possibility is from tg_set_conf() and when it happens the normal dispatching path can handle empty @td fine. No need to add a special skip path.
This patch removes the above three unnecessary optimizations, which leaves the throtl_log() call in blk_throtl_dispatch_work_fn() as the only user of throtl_nr_queued(). Remove throtl_nr_queued() and open code it in throtl_log(). I don't think we need td->nr_queued[] at all. Maybe we can remove it later.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
Move throtl_schedule_delayed_work() above its first user so that the forward declaration can be removed. This patch is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
blk-throttle is about to go through major restructuring to support hierarchy. Do cosmetic updates in preparation.
* s/throtl_data->throtl_work/throtl_data->dispatch_work/
* s/blk_throtl_work()/blk_throtl_dispatch_work_fn()/
* Collapse throtl_dispatch() into blk_throtl_dispatch_work_fn()
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-
Committed by Tejun Heo
When bps or iops configuration changes, blk-throttle records the new configuration and sets a flag indicating that the config has changed. The flag is checked in the bio dispatch path and applied. This deferred config application was necessary due to limitations in the blkcg framework, which haven't existed for quite a while now. This patch removes the deferred config application mechanism and applies new configurations directly from tg_set_conf(), which is simpler.
v2: Dropped unnecessary throtl_schedule_delayed_work() call from tg_set_conf() as suggested by Vivek Goyal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-