- 02 8月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
FQ keeps track of number of groups which are linked on blkcg->blkg_list. This is useful to avoid races between queue exit and cgroup exit code paths. So if at the request queue exit time linked group count is not zero, that means there are some group out there which is yet to be deleted under rcu read period and queue exit code should wait for on rcu period. In my previous patch I forgot to decrease the number of group count. So in current form, we nr_blkcg_linked_grps is always non-zero and we will always wait one rcu period (if BLK_CGROUP=y). The side effect of this is that it can increase boot time. I am surprised, nobody complained so far. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 12 7月, 2011 4 次提交
-
-
由 Shaohua Li 提交于
Currently when the last queue of a group has no request, we don't expire the queue to hope request from the group comes soon, so the group doesn't miss its share. But if the think time is big, the assumption isn't correct and we just waste bandwidth. In such case, we don't do idle. [global] runtime=30 direct=1 [test1] cgroup=test1 cgroup_weight=1000 rw=randread ioengine=libaio size=500m runtime=30 directory=/mnt filename=file1 thinktime=9000 [test2] cgroup=test2 cgroup_weight=1000 rw=randread ioengine=libaio size=500m runtime=30 directory=/mnt filename=file2 patched base test1 64k 39k test2 548k 540k total 604k 578k group1 gets much better throughput because it waits less time. To check if the patch changes behavior of queue without think time. I also tried to give test1 2ms think time or no think time. The test result is stable. The thoughput doesn't change with/without the patch. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Shaohua Li 提交于
Currently when the last queue of a service tree has no request, we don't expire the queue to hope request from the service tree comes soon, so the service tree doesn't miss its share. But if the think time is big, the assumption isn't correct and we just waste bandwidth. In such case, we don't do idle. [global] runtime=10 direct=1 [test1] rw=randread ioengine=libaio size=500m directory=/mnt filename=file1 thinktime=9000 [test2] rw=read ioengine=libaio size=1G directory=/mnt filename=file2 patched base test1 41k/s 33k/s test2 15868k/s 15789k/s total 15902k/s 15817k/s A slightly better To check if the patch changes behavior of queue without think time. I also tried to give test1 2ms think time or no think time. The test has variation even without the patch, but the average throughput doesn't change with/without the patch. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Shaohua Li 提交于
Move the variables to do think time check to a sepatate struct. This is to prepare adding think time check for service tree and group. No functional change. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Justin TerAvest 提交于
fs_excl is a poor man's priority inheritance for filesystems to hint to the block layer that an operation is important. It was never clearly specified, not widely adopted, and will not prevent starvation in many cases (like across cgroups). fs_excl was introduced with the time sliced CFQ IO scheduler, to indicate when a process held FS exclusive resources and thus needed a boost. It doesn't cover all file systems, and it was never fully complete. Lets kill it. Signed-off-by: NJustin TerAvest <teravest@google.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 11 7月, 2011 1 次提交
-
-
由 Justin TerAvest 提交于
There is no consistency among filesystems from what bios (or requests) are marked as being metadata. It's interesting to expose this in traces, but we shouldn't schedule the requests differently based on whether or not they're marked as being metadata. Signed-off-by: NJustin TerAvest <teravest@google.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 27 6月, 2011 2 次提交
-
-
由 Shaohua Li 提交于
ioc->ioc_data is rcu protectd, so uses correct API to access it. This doesn't change any behavior, but just make code consistent. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Cc: stable@kernel.org # after ab4bd22dSigned-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Shaohua Li 提交于
I got a rcu warnning at boot. the ioc->ioc_data is rcu_deferenced, but doesn't hold rcu_read_lock. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Cc: stable@kernel.org # after ab4bd22dSigned-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 14 6月, 2011 1 次提交
-
-
由 Joe Perches 提交于
Use the compiler to verify format strings and arguments. Fix fallout. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 13 6月, 2011 1 次提交
-
-
由 Joe Perches 提交于
Use the compiler to verify format strings and arguments. Fix fallout. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 06 6月, 2011 3 次提交
-
-
由 Paul Bolle 提交于
Correctly suggested by sparse. Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jens Axboe 提交于
Since we are modifying this RCU pointer, we need to hold the lock protecting it around it. This fixes a potential reuse and double free of a cfq io_context structure. The bug has been in CFQ for a long time, it hit very few people but those it did hit seemed to see it a lot. Tracked in RH bugzilla here: https://bugzilla.redhat.com/show_bug.cgi?id=577968 Credit goes to Paul Bolle for figuring out that the issue was around the one-hit ioc->ioc_data cache. Thanks to his hard work the issue is now fixed. Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jens Axboe 提交于
Since we are modifying this RCU pointer, we need to hold the lock protecting it around it. This fixes a potential reuse and double free of a cfq io_context structure. The bug has been in CFQ for a long time, it hit very few people but those it did hit seemed to see it a lot. Tracked in RH bugzilla here: https://bugzilla.redhat.com/show_bug.cgi?id=577968 Credit goes to Paul Bolle for figuring out that the issue was around the one-hit ioc->ioc_data cache. Thanks to his hard work the issue is now fixed. Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 03 6月, 2011 1 次提交
-
-
由 Jeff Moyer 提交于
Hi, Jens, If you recall, I posted an RFC patch for this back in July of last year: http://lkml.org/lkml/2010/7/13/279 The basic problem is that a process can issue a never-ending stream of async direct I/Os to the same sector on a device, thus starving out other I/O in the system (due to the way the alias handling works in both cfq and deadline). The solution I proposed back then was to start dispatching from the fifo after a certain number of aliases had been dispatched. Vivek asked why we had to treat aliases differently at all, and I never had a good answer. So, I put together a simple patch which allows aliases to be added to the rb tree (it adds them to the right, though that doesn't matter as the order isn't guaranteed anyway). I think this is the preferred solution, as it doesn't break up time slices in CFQ or batches in deadline. I've tested it, and it does solve the starvation issue. Let me know what you think. Cheers, Jeff Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 02 6月, 2011 1 次提交
-
-
由 Paul Bolle 提交于
queue_fail can only be reached if cic is NULL, so its check for cic must be bogus. Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 01 6月, 2011 1 次提交
-
-
由 Kyungmin Park 提交于
Fix comment typo and remove unnecessary semicolon at macro Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 24 5月, 2011 4 次提交
-
-
由 Namhyung Kim 提交于
When struct cfq_data allocation fails, cic_index need to be freed. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Namhyung Kim 提交于
The 'group_changed' variable is initialized to 0 and never changed, so checking the variable is meaningless. It is a leftover from 0bbfeb83 ("cfq-iosched: Always provide group iosolation."). Let's get rid of it. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Cc: Justin TerAvest <teravest@google.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Namhyung Kim 提交于
Reduce the number of bit operations in cfq_choose_req() on average (and worst) cases. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Namhyung Kim 提交于
Simplify the calculation in cfq_prio_to_maxrq(), plus replace CFQ_PRIO_LISTS to IOPRIO_BE_NR since they are the same and IOPRIO_BE_NR looks more reasonable in this context IMHO. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 23 5月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
We allocated per cpu stats struct for root group but did not free it. Fix it. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 21 5月, 2011 4 次提交
-
-
由 Vivek Goyal 提交于
Currently we take blkg_stat lock for even updating the stats. So even if a group has no throttling rules (common case for root group), we end up taking blkg_lock, for updating the stats. Make dispatch stats per cpu so that these can be updated without taking blkg lock. If cpu goes offline, these stats simply disappear. No protection has been provided for that yet. Do we really need anything for that? Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Currently, all the cfq_group or throtl_group allocations happen while we are holding ->queue_lock and sleeping is not allowed. Soon, we will move to per cpu stats and also need to allocate the per group stats. As one can not call alloc_percpu() from atomic context as it can sleep, we need to drop ->queue_lock, allocate the group, retake the lock and continue processing. In throttling code, I check the queue DEAD flag again to make sure that driver did not call blk_cleanup_queue() in the mean time. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
blkg->key = cfqd is an rcu protected pointer and hence we used to do call_rcu(cfqd->rcu_head) to free up cfqd after one rcu grace period. The problem here is that even though cfqd is around, there are no gurantees that associated request queue (td->queue) or q->queue_lock is still around. A driver might have called blk_cleanup_queue() and release the lock. It might happen that after freeing up the lock we call blkg->key->queue->queue_ock and crash. This is possible in following path. blkiocg_destroy() blkio_unlink_group_fn() cfq_unlink_blkio_group() Hence, wait for an rcu peirod if there are groups which have not been unlinked from blkcg->blkg_list. That way, if there are any groups which are taking cfq_unlink_blkio_group() path, can safely take queue lock. This is how we have taken care of race in throttling logic also. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
Nobody seems to be using cfq_find_alloc_cfqg() function parameter "create". Get rid of that. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 16 5月, 2011 1 次提交
-
-
由 Vivek Goyal 提交于
Currentlly we first map the task to cgroup and then cgroup to blkio_cgroup. There is a more direct way to get to blkio_cgroup from task using task_subsys_state(). Use that. The real reason for the fix is that it also avoids a race in generic cgroup code. During remount/umount rebind_subsystems() is called and it can do following with and rcu protection. cgrp->subsys[i] = NULL; That means if somebody got hold of cgroup under rcu and then it tried to do cgroup->subsys[] to get to blkio_cgroup, it would get NULL which is wrong. I was running into this race condition with ltp running on a upstream derived kernel and that lead to crash. So ideally we should also fix cgroup generic code to wait for rcu grace period before setting pointer to NULL. Li Zefan is not very keen on introducing synchronize_wait() as he thinks it will slow down moun/remount/umount operations. So for the time being atleast fix the kernel crash by taking a more direct route to blkio_cgroup. One tester had reported a crash while running LTP on a derived kernel and with this fix crash is no more seen while the test has been running for over 6 days. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 19 4月, 2011 1 次提交
-
-
由 Jens Axboe 提交于
For some configurations of CONFIG_PREEMPT that is not true. So get rid of __call_for_each_cic() and always uses the explicitly rcu_read_lock() protected call_for_each_cic() instead. This fixes a potential bug related to IO scheduler removal or online switching. Thanks to Paul McKenney for clarifying this. Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 18 4月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
Instead of overloading __blk_run_queue to force an offload to kblockd add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed as the check we do when the workqueue items runs should be enough. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 31 3月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 23 3月, 2011 3 次提交
-
-
由 Li, Shaohua 提交于
Removing think time checking. A high thinktime queue might means the queue dispatches several requests and then do away. Limitting such queue seems meaningless. And also this can simplify code. This is suggested by Vivek. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Justin TerAvest 提交于
For v2, I added back lines to cfq_preempt_queue() that were removed during updates for accounting unaccounted_time. Thanks for pointing out that I'd missed these, Vivek. Previous commit "cfq-iosched: Don't set active queue in preempt" wrongly cleared stats for preempting queues when it shouldn't have, because when we choose a queue to preempt, it still isn't necessarily scheduled next. Thanks to Vivek Goyal for figuring this out and understanding how the preemption code works. Signed-off-by: NJustin TerAvest <teravest@google.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Justin TerAvest 提交于
Commit "Add unaccounted time to timeslice_used" changed the behavior of cfq_preempt_queue to set cfqq active. Vivek pointed out that other preemption rules might get involved, so we shouldn't manually set which queue is active. This cleans up the code to just clear the queue stats at preemption time. Signed-off-by: NJustin TerAvest <teravest@google.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 17 3月, 2011 1 次提交
-
-
由 Justin TerAvest 提交于
Version 3 is updated to apply to for-2.6.39/core. For version 2, I took Vivek's advice and made sure we update the group weight from cfq_group_service_tree_add(). If a weight was updated while a group is on the service tree, the calculation for the total weight of the service tree can be adjusted improperly, which either leads to bad service tree weights, or potentially crashes (if total_weight becomes 0). This patch defers updates to the weight until a group is off the service tree. Signed-off-by: NJustin TerAvest <teravest@google.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 12 3月, 2011 1 次提交
-
-
由 Justin TerAvest 提交于
There are two kind of times that tasks are not charged for: the first seek and the extra time slice used over the allocated timeslice. Both of these exported as a new unaccounted_time stat. I think it would be good to have this reported in 'time' as well, but that is probably a separate discussion. Signed-off-by: NJustin TerAvest <teravest@google.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 10 3月, 2011 1 次提交
-
-
由 Jens Axboe 提交于
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 07 3月, 2011 3 次提交
-
-
由 Gui Jianfeng 提交于
The update_vdisktime logic is broken since commit b54ce60e, st->min_vdisktime never makes a progress. Fix it. Thanks Vivek for pointing it out. Signed-off-by: NGui Jianfeng <guijianfen@cn.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Shaohua Li 提交于
If there are a sync and an async queue and the sync queue's think time is small, we can ignore the sync queue's dispatch quantum. Because the sync queue will always preempt the async queue, we don't need to care about async's latency. This can fix a performance regression of aiostress test, which is introduced by commit f8ae6e3e. The issue should exist even without the commit, but the commit amplifies the impact. The initial post does the same optimization for RT queue too, but since I have no real workload for it, Vivek suggests to drop it. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Reviewed-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jens Axboe 提交于
We need to hold the queue lock over the reference increment, it's not atomic anymore. Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 02 3月, 2011 2 次提交
-
-
由 Tejun Heo 提交于
__blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change. stable: This is prerequisite for fixing ide oops caused by the new blk-flush implementation. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Justin TerAvest 提交于
Effectively, make group_isolation=1 the default and remove the tunable. The setting group_isolation=0 was because by default we idle on sync-noidle tree and on fast devices, this can be very harmful for throughput. However, this problem can also be addressed by tuning slice_idle and possibly group_idle on faster storage devices. This change simplifies the CFQ code by removing the feature entirely. Signed-off-by: NJustin TerAvest <teravest@google.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-