- 08 10月, 2009 2 次提交
-
-
由 Jens Axboe 提交于
Saves 16 bytes of text, woohoo. But the more important point is that it makes the code more readable when returning bool for 0/1 cases. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Corrado Zoccolo 提交于
CFQ enables idle only for processes that think less than the allowed idle time. Since idle time is lower for seeky queues, we should use the correct value in the comparison. Signed-off-by: NCorrado Zoccolo <czoccolo@gmail.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 07 10月, 2009 2 次提交
-
-
由 Jens Axboe 提交于
We should subtract the slice residual from the rb tree key, since a negative residual count indicates that the cfqq overran its slice the last time. Hence we want to add the overrun time, to position it a bit further away in the service tree. Reported-by: NCorrado Zoccolo <czoccolo@gmail.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Makes the whole thing easier to read, cfq_dispatch_requests() was a bit messy before. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 05 10月, 2009 4 次提交
-
-
由 Jens Axboe 提交于
It was briefly introduced to allow CFQ to to delayed scheduling, but we ended up removing that feature again. So lets kill the function and export, and just switch CFQ back to the normal work schedule since it is now passing in a '0' delay from all call sites. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Corrado Zoccolo 提交于
The RR service tree is indexed by a key that is relative to current jiffies. This can cause problems on jiffies wraparound. The patch fixes it using time_before comparison, and changing the add_front path to use a relative number, too. Signed-off-by: NCorrado Zoccolo <czoccolo@gmail.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
cfq uses rq->start_time as the fifo indicator, but that field may get modified prior to cfq doing it's fifo list adjustment when a request gets merged with another request. This can cause the fifo list to become unordered. Reported-by: NCorrado Zoccolo <czoccolo@gmail.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We cannot delay for the first dispatch of the async queue if it hasn't dispatched at all, since that could present a local user DoS attack vector using an app that just did slow timed sync reads while filling memory. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 04 10月, 2009 2 次提交
-
-
由 Jens Axboe 提交于
We should use the sysfs modified slice sync value, in case it differs from the default. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Don't think that's necessarily a perfect description of what this option fiddles with, but it's probably better than 'desktop'. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 03 10月, 2009 3 次提交
-
-
由 Jens Axboe 提交于
This slowly ramps up the async queue depth based on the time passed since the sync IO, and doesn't allow async at all until a sync slice period has passed. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o Do not allow more than max_dispatch requests from an async queue, if some sync request has finished recently. This is in the hope that sync activity is still going on in the system and we might receive a sync request soon. Most likely from a sync queue which finished a request and we did not enable idling on it. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
This is basically identical to what Vivek Goyal posted, but combined into one and labelled 'desktop' instead of 'fairness'. The goal is to continue to improve on the latency side of things as it relates to interactiveness, keeping the questionable bits under this sysfs tunable so it would be easy for throughput-only people to turn off. Apart from adding the interactive sysfs knob, it also adds the behavioural change of allowing slice idling even if the hardware does tagged command queuing. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 14 9月, 2009 1 次提交
-
-
由 Jeff Moyer 提交于
This patch addresses http://bugzilla.kernel.org/show_bug.cgi?id=13401, a regression introduced in 2.6.30. From the bug report: Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 9月, 2009 5 次提交
-
-
由 Shan Wei 提交于
The blktrace tools can show process id when cfq dispatched a request, using cfq_log_cfqq() instead of cfq_log(). Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
It's not currently used, as pointed out by Gui Jianfeng <guijianfeng@cn.fujitsu.com>. We already check the wait_request flag to allow an idling queue priority allocation access, so we don't need this extra flag. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Vivek Goyal 提交于
o Get rid of busy_rt_queues infrastructure. Looks like it is redundant. o Once an RT queue gets request it will preempt any of the BE or IDLE queues immediately. Otherwise this queue will be put on service tree and scheduler will anyway select this queue before any of the BE or IDLE queue. Hence looks like there is no need to keep track of how many busy RT queues are currently on service tree. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
To lessen the impact of async IO on sync IO, let the device drain of any async IO in progress when switching to a sync cfqq that has idling enabled. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 7月, 2009 1 次提交
-
-
由 Vivek Goyal 提交于
In case memory is scarce, we now default to oom_cfqq. Once memory is available again, we should allocate a new cfqq and stop using oom_cfqq for a particular io context. Once a new request comes in, check if we are using oom_cfqq, and if yes, try to allocate a new cfqq. Tested the patch by forcing the use of oom_cfqq and upon next request thread realized that it was using oom_cfqq and it allocated a new cfqq. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 01 7月, 2009 3 次提交
-
-
由 Shan Wei 提交于
With the changes for falling back to an oom_cfqq, we never fail to find/allocate a queue in cfq_get_queue(). So remove the check. Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Setup an emergency fallback cfqq that we allocate at IO scheduler init time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue() never fails without having to ensure free memory. On cfqq lookup, always try to allocate a new cfqq if the given cfq io context has the oom_cfqq assigned. This ensures that we only temporarily punt to this shared queue. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We're going to be needing that init code outside of that function to get rid of the __GFP_NOFAIL in cfqq allocation. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 24 6月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>
-
- 16 6月, 2009 2 次提交
-
-
由 Jeff Moyer 提交于
I noticed a blank line in blktrace output. This patch fixes that. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Gui Jianfeng 提交于
Actually, last_end_request in cfq_data isn't used now. So lets just remove it. Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 6月, 2009 1 次提交
-
-
由 Nikanth Karthikesan 提交于
Currently io_context has an atomic_t(32-bit) as refcount. In the case of cfq, for each device against whcih a task does I/O, a reference to the io_context would be taken. And when there are multiple process sharing io_contexts(CLONE_IO) would also have a reference to the same io_context. Theoretically the possible maximum number of processes sharing the same io_context + the number of disks/cfq_data referring to the same io_context can overflow the 32-bit counter on a very high-end machine. Even though it is an improbable case, let us make it atomic_long_t. Signed-off-by: NNikanth Karthikesan <knikanth@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 5月, 2009 3 次提交
-
-
由 Tejun Heo 提交于
struct request has had a few different ways to represent some properties of a request. ->hard_* represent block layer's view of the request progress (completion cursor) and the ones without the prefix are supposed to represent the issue cursor and allowed to be updated as necessary by the low level drivers. The thing is that as block layer supports partial completion, the two cursors really aren't necessary and only cause confusion. In addition, manual management of request detail from low level drivers is cumbersome and error-prone at the very least. Another interesting duplicate fields are rq->[hard_]nr_sectors and rq->{hard_cur|current}_nr_sectors against rq->data_len and rq->bio->bi_size. This is more convoluted than the hard_ case. rq->[hard_]nr_sectors are initialized for requests with bio but blk_rq_bytes() uses it only for !pc requests. rq->data_len is initialized for all request but blk_rq_bytes() uses it only for pc requests. This causes good amount of confusion throughout block layer and its drivers and determining the request length has been a bit of black magic which may or may not work depending on circumstances and what the specific LLD is actually doing. rq->{hard_cur|current}_nr_sectors represent the number of sectors in the contiguous data area at the front. This is mainly used by drivers which transfers data by walking request segment-by-segment. This value always equals rq->bio->bi_size >> 9. However, data length for pc requests may not be multiple of 512 bytes and using this field becomes a bit confusing. In general, having multiple fields to represent the same property leads only to confusion and subtle bugs. With recent block low level driver cleanups, no driver is accessing or manipulating these duplicate fields directly. Drop all the duplicates. Now rq->sector means the current sector, rq->data_len the current total length and rq->bio->bi_size the current segment length. Everything else is defined in terms of these three and available only through accessors. * blk_recalc_rq_sectors() is collapsed into blk_update_request() and now handles pc and fs requests equally other than rq->sector update. This means that now pc requests can use partial completion too (no in-kernel user yet tho). * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer now uses byte count as the primary data length. * blk_rq_pos() is now guranteed to be always correct. In-block users converted. * blk_rq_bytes() is now guaranteed to be always valid as is blk_rq_sectors(). In-block users converted. * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9. More convenient one is used. * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const pointer to request. [ Impact: API cleanup, single way to represent one property of a request ] Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tejun Heo 提交于
With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: NGrant Likely <grant.likely@secretlab.ca> Acked-by: NGrant Likely <grant.likely@secretlab.ca> Tested-by: NAdrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: NAdrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: NMike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tejun Heo 提交于
Implement accessors - blk_rq_pos(), blk_rq_sectors() and blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors and rq->hard_cur_sectors respectively and convert direct references of the said fields to the accessors. This is in preparation of request data length handling cleanup. Geert : suggested adding const to struct request * parameter to accessors Sergei : spotted error in patch description [ Impact: cleanup ] Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: NStephen Rothwell <sfr@canb.auug.org.au> Tested-by: NGrant Likely <grant.likely@secretlab.ca> Acked-by: NGrant Likely <grant.likely@secretlab.ca> Ackec-by: NSergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 28 4月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
blk_start_queueing() is identical to __blk_run_queue() except that it doesn't check for recursion. None of the current users depends on blk_start_queueing() running request_fn directly. Replace usages of blk_start_queueing() with [__]blk_run_queue() and kill it. [ Impact: removal of mostly duplicate interface function ] Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 24 4月, 2009 3 次提交
-
-
由 Jens Axboe 提交于
Currently we look it up from ->ioprio, but ->ioprio can change if either the process gets its IO priority changed explicitly, or if cfq decides to temporarily boost it. So if we are unlucky, we can end up attempting to remove a node from a different rbtree root than where it was added. Fix this by using ->org_ioprio as the prio_tree index, since that will only change for explicit IO priority settings (not for a boost). Additionally cache the rbtree root inside the cfqq, then we don't have to add code to reinsert the cfqq in the prio_tree if IO priority changes. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
cfq_prio_tree_lookup() should return the direct match, yet it always returns zero. Fix that. cfq_prio_tree_add() assumes that we don't get a direct match, while it is very possible that we do. Using O_DIRECT, you can have different cfqq with matching requests, since you don't have the page cache to serialize things for you. Fix this bug by only adding the cfqq if there isn't an existing match. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Not strictly needed, but we should make it clear that we init the rbtree roots here. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 22 4月, 2009 2 次提交
-
-
由 Jeff Moyer 提交于
If the cfq io context doesn't have enough samples yet to provide a mean seek distance, then use the default threshold we have for seeky IO instead of defaulting to 0. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jeff Moyer 提交于
Right now, depending on the first sector to which a process issues I/O, the seek time may start out way out of whack. So make sure we start with 0 sectors in seek, instead of the offset of the first request issued. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 15 4月, 2009 4 次提交
-
-
由 Jens Axboe 提交于
If we have processes that are working in close proximity to each other on disk, we don't want to idle wait. Instead allow the close process to issue a request, getting better aggregate bandwidth. The anticipatory scheduler has similar checks, noop and deadline do not need it since they don't care about process <-> io mappings. The code for CFQ is a little more involved though, since we split request queues into per-process contexts. This fixes a performance problem with eg dump(8), since it uses several processes in some silly attempt to speed IO up. Even if dump(8) isn't really a valid case (it should be fixed by using CLONE_IO), there are other cases where we see close processes and where idling ends up hurting performance. Credit goes to Jeff Moyer <jmoyer@redhat.com> for writing the initial implementation. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Makes it easier to read the traces. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We only kick the dispatch for an idling queue, if we think it's a (somewhat) fully merged request. Also allow a kick if we have other busy queues in the system, since we don't want to risk waiting for a potential merge in that case. It's better to get some work done and proceed. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
It's called from the workqueue handlers from process context, so we always have irqs enabled when entered. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-