- 29 1月, 2010 1 次提交
-
-
由 Alan D. Brunelle 提交于
Updated 'nomerges' tunable to accept a value of '2' - indicating that _no_ merges at all are to be attempted (not even the simple one-hit cache). The following table illustrates the additional benefit - 5 minute runs of a random I/O load were applied to a dozen devices on a 16-way x86_64 system. nomerges Throughput %System Improvement (tput / %sys) -------- ------------ ----------- ------------------------- 0 12.45 MB/sec 0.669365609 1 12.50 MB/sec 0.641519199 0.40% / 2.71% 2 12.52 MB/sec 0.639849750 0.56% / 2.96% Signed-off-by: NAlan D. Brunelle <alan.brunelle@hp.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 1月, 2010 5 次提交
-
-
由 Martin K. Petersen 提交于
All callers of the stacking functions use 512-byte sector units rather than byte offsets. Simplify the code so the stacking functions take sectors when specifying data offsets. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Kirill Afonshin 提交于
It isn't used anymore, since AS was deleted. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Martin K. Petersen 提交于
DM does not want to know about partition offsets. Add a partition-aware wrapper that DM can use when stacking block devices. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NAlasdair G Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Martin K. Petersen 提交于
Discard alignment reporting for partitions was incorrect. Update to match the algorithm used elsewhere. The alignment can be negative (misaligned). Fix format string accordingly. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Martin K. Petersen 提交于
The top device misalignment flag would not be set if the added bottom device was already misaligned as opposed to causing a stacking failure. Also massage the reporting so that an error is only returned if adding the bottom device caused the misalignment. I.e. don't return an error if the top is already flagged as misaligned. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 29 12月, 2009 2 次提交
-
-
由 OGAWA Hirofumi 提交于
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Martin K. Petersen 提交于
queue_sector_alignment_offset returned the wrong value which caused partitions to report an incorrect alignment_offset. Since offset alignment calculation is needed several places it has been split into a separate helper function. The topology stacking function has been updated accordingly. Furthermore, comments have been added to clarify how the stacking function works. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Tested-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 28 12月, 2009 1 次提交
-
-
由 Shaohua Li 提交于
seek_mean could be very big sometimes, using it as close criteria is meaningless as this doen't improve any performance. So if it's big, let's fallback to default value. Reviewed-by: NCorrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Shaohua Li<shaohua.li@intel.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 21 12月, 2009 1 次提交
-
-
由 Martin K. Petersen 提交于
The stacking code incorrectly scaled up the data offset in some cases causing misaligned devices to report alignment. Rewrite the stacking algorithm to remedy this and apply the same alignment principles to the discard handling. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 18 12月, 2009 3 次提交
-
-
由 Vivek Goyal 提交于
o CFQ now internally divides cfq queues in therr workload categories. sync-idle, sync-noidle and async. Which workload to run depends primarily on rb_key offset across three service trees. Which is a combination of mulitiple things including what time queue got queued on the service tree. There is one exception though. That is if we switched the prio class, say we served some RT tasks and again started serving BE class, then with-in BE class we always started with sync-noidle workload irrespective of rb_key offset in service trees. This can provide better latencies for sync-noidle workload in the presence of RT tasks. o This patch gets rid of that exception and which workload to run with-in class always depends on lowest rb_key across service trees. The reason being that now we have multiple BE class groups and if we always switch to sync-noidle workload with-in group, we can potentially starve a sync-idle workload with-in group. Same is true for async workload which will be in root group. Also the workload-switching with-in group will become very unpredictable as it now depends whether some RT workload was running in the system or not. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reviewed-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Acked-by: NCorrado Zoccolo <czoccolo@gmail.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o Currently code does not seem to be using cfqd->nr_groups. Get rid of it. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reviewed-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o allow_merge() already checks if submitting task is pointing to same cfqq as rq has been queued in. If everything is fine, we should not be having a task in one cgroup and having a pointer to cfqq in other cgroup. Well I guess in some situations it can happen and that is, when a random IO queue has been moved into root cgroup for group_isolation=0. In this case, tasks's cgroup/group is different from where actually cfqq is, but this is intentional and in this case merging should be allowed. The second situation is where due to close cooperator patches, multiple processes can be sharing a cfqq. If everything implemented right, we should not end up in a situation where tasks from different processes in different groups are sharing the same cfqq as we allow merging of cooperating queues only if they are in same group. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reviewed-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 16 12月, 2009 1 次提交
-
-
由 Jens Axboe 提交于
Commit 86b37281 adds a check for misaligned stacking offsets, but it's buggy since the defaults are 0. Hence all dm devices that pass in a non-zero starting offset will be marked as misaligned amd dm will complain. A real fix is coming, in the mean time disable the discard granularity check so that users don't worry about dm reporting about misaligned devices. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 15 12月, 2009 1 次提交
-
-
由 Gui Jianfeng 提交于
When a group is resumed, if it doesn't have workload slice left, we should set workload_expires as expired. Otherwise, we might start from where we left in previous group by error. Thanks the idea from Corrado. Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 12月, 2009 1 次提交
-
-
由 Vivek Goyal 提交于
I think my previous patch introduced a bug which can lead to CFQ hitting BUG_ON(). The offending commit in for-2.6.33 branch is. commit 7667aa06 Author: Vivek Goyal <vgoyal@redhat.com> Date: Tue Dec 8 17:52:58 2009 -0500 cfq-iosched: Take care of corner cases of group losing share due to deletion While doing some stress testing on my box, I enountered following. login: [ 3165.148841] BUG: scheduling while atomic: swapper/0/0x10000100 [ 3165.149821] Modules linked in: cfq_iosched dm_multipath qla2xxx igb scsi_transport_fc dm_snapshot [last unloaded: scsi_wait_scan] [ 3165.149821] Pid: 0, comm: swapper Not tainted 2.6.32-block-for-33-merged-new #3 [ 3165.149821] Call Trace: [ 3165.149821] <IRQ> [<ffffffff8103fab8>] __schedule_bug+0x5c/0x60 [ 3165.149821] [<ffffffff8103afd7>] ? __wake_up+0x44/0x4d [ 3165.149821] [<ffffffff8153a979>] schedule+0xe3/0x7bc [ 3165.149821] [<ffffffff8103a796>] ? cpumask_next+0x1d/0x1f [ 3165.149821] [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e [cfq_iosched] [ 3165.149821] [<ffffffff810422d8>] __cond_resched+0x2a/0x35 [ 3165.149821] [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e [cfq_iosched] [ 3165.149821] [<ffffffff8153b1ee>] _cond_resched+0x2c/0x37 [ 3165.149821] [<ffffffff8100e2db>] is_valid_bugaddr+0x16/0x2f [ 3165.149821] [<ffffffff811e4161>] report_bug+0x18/0xac [ 3165.149821] [<ffffffff8100f1fc>] die+0x39/0x63 [ 3165.149821] [<ffffffff8153cde1>] do_trap+0x11a/0x129 [ 3165.149821] [<ffffffff8100d470>] do_invalid_op+0x96/0x9f [ 3165.149821] [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e [cfq_iosched] [ 3165.149821] [<ffffffff81034b4d>] ? enqueue_task+0x5c/0x67 [ 3165.149821] [<ffffffff8103ae83>] ? task_rq_unlock+0x11/0x13 [ 3165.149821] [<ffffffff81041aae>] ? try_to_wake_up+0x292/0x2a4 [ 3165.149821] [<ffffffff8100c935>] invalid_op+0x15/0x20 [ 3165.149821] [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e [cfq_iosched] [ 3165.149821] [<ffffffff810df5a6>] ? virt_to_head_page+0xe/0x2f [ 3165.149821] [<ffffffff811d8c2a>] blk_peek_request+0x191/0x1a7 [ 3165.149821] [<ffffffff811e5b8d>] ? kobject_get+0x1a/0x21 [ 3165.149821] [<ffffffff812c8d4c>] scsi_request_fn+0x82/0x3df [ 3165.149821] [<ffffffff8110b2de>] ? bio_fs_destructor+0x15/0x17 [ 3165.149821] [<ffffffff810df5a6>] ? virt_to_head_page+0xe/0x2f [ 3165.149821] [<ffffffff811d931f>] __blk_run_queue+0x42/0x71 [ 3165.149821] [<ffffffff811d9403>] blk_run_queue+0x26/0x3a [ 3165.149821] [<ffffffff812c8761>] scsi_run_queue+0x2de/0x375 [ 3165.149821] [<ffffffff812b60ac>] ? put_device+0x17/0x19 [ 3165.149821] [<ffffffff812c92d7>] scsi_next_command+0x3b/0x4b [ 3165.149821] [<ffffffff812c9b9f>] scsi_io_completion+0x1c9/0x3f5 [ 3165.149821] [<ffffffff812c3c36>] scsi_finish_command+0xb5/0xbe I think I have hit following BUG_ON() in cfq_dispatch_request(). BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list)); Please find attached the patch to fix it. I have done some stress testing with it and have not seen it happening again. o We should wait on a queue even after slice expiry only if it is empty. If queue is not empty then continue to expire it. o If we decide to keep the queue then make cfqq=NULL. Otherwise select_queue() will return a valid cfqq and cfq_dispatch_request() can hit following BUG_ON(). BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list)) Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 10 12月, 2009 2 次提交
-
-
由 Gui Jianfeng 提交于
Remove wait_request flag when idle time is being deleted, otherwise it'll hit this path every time when a request is enqueued. Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Corrado Zoccolo 提交于
Added a comment to explain the initialization of last_delayed_sync. Signed-off-by: NCorrado Zoccolo <czoccolo@gmail.com> Acked-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 09 12月, 2009 4 次提交
-
-
由 Vivek Goyal 提交于
If there is a sequential reader running in a group, we wait for next request to come in that group after slice expiry and once new request is in, we expire the queue. Otherwise we delete the group from service tree and group looses its fair share. So far I was marking a queue as wait_busy if it had consumed its slice and it was last queue in the group. But this condition did not cover following two cases. 1.If a request completed and slice has not expired yet. Next request comes in and is dispatched to disk. Now select_queue() hits and slice has expired. This group will be deleted. Because request is still in the disk, this queue will never get a chance to wait_busy. 2.If request completed and slice has not expired yet. Before next request comes in (delay due to think time), select_queue() hits and expires the queue hence group. This queue never got a chance to wait busy. Gui was hitting the boundary condition 1 and not getting fairness numbers proportional to weight. This patch puts the checks for above two conditions and improves the fairness numbers for sequential workload on rotational media. Check in select_queue() takes care of case 1 and additional check in should_wait_busy() takes care of case 2. Reported-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o Get rid of wait_busy_done flag. This flag only tells we were doing wait busy on a queue and that queue got request so expire it. That information can easily be obtained by (cfq_cfqq_wait_busy() && queue_is_not_empty). So remove this flag and keep code simple. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Gui Jianfeng 提交于
It doesn't make any sense to try to find out a close cooperating queue if current cfqq is the only one in the group. Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Corrado Zoccolo 提交于
The introduction of ramp-up formula for async queue depths has slowed down dirty page reclaim, by reducing async write performance. This patch makes sure the formula kicks in only when sync request was recently delayed. Signed-off-by: NCorrado Zoccolo <czoccolo@gmail.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 08 12月, 2009 1 次提交
-
-
由 Vivek Goyal 提交于
Fix a crash during boot reported by Jeff Moyer. Fix the issue of accessing cfqq after freeing it. Reported-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@carl.(none)>
-
- 07 12月, 2009 1 次提交
-
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 06 12月, 2009 1 次提交
-
-
由 Jens Axboe 提交于
After the merge of the IO controller patches, booting on my megaraid box ran much slower. Vivek Goyal traced it down to megaraid discovery creating tons of devices, each suffering a grace period when they later kill that queue (if no device is found). So lets use call_rcu() to batch these deferred frees, instead of taking the grace period hit for each one. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 04 12月, 2009 15 次提交
-
-
由 Vivek Goyal 提交于
o Now issues of blkio controller and CFQ in module mode should be fixed. Enable the cfq group scheduling support in module mode. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o One of the goals of block IO controller is that it should be able to support mulitple io control policies, some of which be operational at higher level in storage hierarchy. o To begin with, we had one io controlling policy implemented by CFQ, and I hard coded the CFQ functions called by blkio. This created issues when CFQ is compiled as module. o This patch implements a basic dynamic io controlling policy registration functionality in blkio. This is similar to elevator functionality where ioschedulers register the functions dynamically. o Now in future, when more IO controlling policies are implemented, these can dynakically register with block IO controller. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o blkio controller is inside the kernel and cfq makes use of interfaces exported by blkio. CFQ can be a module too, hence export symbols used by CFQ. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Louis Rilling 提交于
With CLONE_IO, parent's io_context->nr_tasks is incremented, but never decremented whenever copy_process() fails afterwards, which prevents exit_io_context() from calling IO schedulers exit functions. Give a task_struct to exit_io_context(), and call exit_io_context() instead of put_io_context() in copy_process() cleanup path. Signed-off-by: NLouis Rilling <louis.rilling@kerlabs.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Louis Rilling 提交于
With CLONE_IO, copy_io() increments both ioc->refcount and ioc->nr_tasks. However exit_io_context() only decrements ioc->refcount if ioc->nr_tasks reaches 0. Always call put_io_context() in exit_io_context(). Signed-off-by: NLouis Rilling <louis.rilling@kerlabs.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 André Goddard Rosa 提交于
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Shaohua Li 提交于
cfq_arm_slice_timer() has logic to disable idle window for SSD device. The same thing should be done at cfq_select_queue() too, otherwise we will still see idle window. This makes the nonrot check logic consistent in cfq. Tests in a intel SSD with low_latency knob close, below patch can triple disk thoughput for muti-thread sequential read. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
It's currently not an allowed configuration, so express that in Kconfig. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
They should not be declared inside some other file that's not related to CFQ. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o rq_noidle() is supposed to tell cfq that do not expect a request after this one, hence don't idle. But this does not seem to work very well. For example for direct random readers, rq_noidle = 1 but there is next request coming after this. Not idling, leads to a group not getting its share even if group_isolation=1. o The right solution for this issue is to scan the higher layers and set right flag (WRITE_SYNC or WRITE_ODIRECT). For the time being, this single line fix helps. This should not have any significant impact when we are not using cgroups. I will later figure out IO paths in higher layer and fix it. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o If a group is running only a random reader, then it will not have enough traffic to keep disk busy and we will reduce overall throughput. This should result in better latencies for random reader though. If we don't idle on random reader service tree, then this random reader will experience large latencies if there are other groups present in system with sequential readers running in these. o One solution suggested by corrado is that by default keep the random readers or sync-noidle workload in root group so that during one dispatch round we idle only once on sync-noidle tree. This means that all the sync-idle workload queues will be in their respective group and we will see service differentiation in those but not on sync-noidle workload. o Provide a tunable group_isolation. If set, this will make sure that even sync-noidle queues go in their respective group and we wait on these. This provides stronger isolation between groups but at the expense of throughput if group does not have enough traffic to keep the disk busy. o By default group_isolation = 0 Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o Async queues are not per group. Instead these are system wide and maintained in root group. Hence their workload slice length should be calculated based on total number of queues in the system and not just queues in the root group. o As root group's default weight is 1000, make sure to charge async queue more in terms of vtime so that it does not get more time on disk because root group has higher weight. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o If a queue consumes its slice and then gets deleted from service tree, its associated group will also get deleted from service tree if this was the only queue in the group. That will make group loose its share. o For the queues on which we have idling on and if these have used their slice, wait a bit for these queues to get backlogged again and then expire these queues so that group does not loose its share. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o Propagate blkio cgroup weight updation to associated cfq groups. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-