- 12 5月, 2009 1 次提交
-
-
由 Kazuhisa Ichikawa 提交于
Current bio_vec array index out-of-bounds test within __end_that_request_first() does not seem correct. It checks bio->bi_idx against bio->bi_vcnt, but the subsequent code uses idx (which is, bio->bi_idx + next_idx) as the array index into bio_vec array. This means that the test really make sense only at the first iteration of !(nr_bytes >=bio->bi_size) case (when next_idx == zero). Fix this by replacing bio->bi_idx with idx. (This patch applies to 2.6.30-rc4.) Signed-off-by: NKazuhisa Ichikawa <ki@epsilou.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 06 5月, 2009 1 次提交
-
-
由 Alan D. Brunelle 提交于
Remove redundant from-sector parameter: it's /always/ the bio's sector passed in. [ Impact: cleanup ] Signed-off-by: NAlan D. Brunelle <alan.brunelle@hp.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <49FF517C.7000503@hp.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 24 4月, 2009 5 次提交
-
-
由 Jens Axboe 提交于
Currently we look it up from ->ioprio, but ->ioprio can change if either the process gets its IO priority changed explicitly, or if cfq decides to temporarily boost it. So if we are unlucky, we can end up attempting to remove a node from a different rbtree root than where it was added. Fix this by using ->org_ioprio as the prio_tree index, since that will only change for explicit IO priority settings (not for a boost). Additionally cache the rbtree root inside the cfqq, then we don't have to add code to reinsert the cfqq in the prio_tree if IO priority changes. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
cfq_prio_tree_lookup() should return the direct match, yet it always returns zero. Fix that. cfq_prio_tree_add() assumes that we don't get a direct match, while it is very possible that we do. Using O_DIRECT, you can have different cfqq with matching requests, since you don't have the page cache to serialize things for you. Fix this bug by only adding the cfqq if there isn't an existing match. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Not strictly needed, but we should make it clear that we init the rbtree roots here. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Hannes Reinecke 提交于
Very rarely under stress testing of dm, oopses are occuring as something tampers with an old stack frame. This has been traced back to blk_abort_queue() leaving a timeout_list pointing to the stack. The reason is that sometimes blk_abort_request() won't delete the timer (if the request is marked as complete but before the timer has been removed, a small race window). Fix this by splicing back from the ususally empty list to the q->timeout_list. Signed-off-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jerome Marchand 提交于
This simplifies I/O stat accounting switching code and separates it completely from I/O scheduler switch code. Requests are accounted according to the state of their request queue at the time of the request allocation. There is no need anymore to flush the request queue when switching I/O accounting state. Signed-off-by: NJerome Marchand <jmarchan@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 22 4月, 2009 6 次提交
-
-
由 Jeff Moyer 提交于
If the cfq io context doesn't have enough samples yet to provide a mean seek distance, then use the default threshold we have for seeky IO instead of defaulting to 0. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jeff Moyer 提交于
Right now, depending on the first sector to which a process issues I/O, the seek time may start out way out of whack. So make sure we start with 0 sectors in seek, instead of the offset of the first request issued. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
There's nothing to do for those devices, since the timeout handling is based on requests. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tejun Heo 提交于
/proc/diskstats used to show stats for all disks whether they're zero-sized or not and their non-zero partitions. Commit 074a7aca accidentally changed the behavior such that it doesn't print out zero sized disks. This patch implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and uses it in diskstats_show() such that empty part0 is shown in /proc/diskstats. Reported and bisectd by Dianel Collins. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NDaniel Collins <solemnwarning@solemnwarning.no-ip.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tejun Heo 提交于
Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily All DMA address limits are expressed in terms of the last addressable unit (byte or page) instead of one plus that. However, when determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(), it compares the specified limit against 0x100000000UL to determine whether it's below 4G ending up falsely setting GFP_DMA in q->bounce_gfp. As DMA zone is very small on x86_64, this makes larger SG_IO transfers very eager to trigger OOM killer. Fix it. While at it, rename the parameter to @dma_mask for clarity and convert comment to proper winged style. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tejun Heo 提交于
Impact: fix SG_IO behavior such that it matches the documentation SG_IO howto says that if ->dxfer_len and sum of iovec disagress, the shorter one wins. However, the current implementation returns -EINVAL for such cases. Trim iovc if it's longer than ->dxfer_len. This patch uses iov_*() helpers which take struct iovec * by casting struct sg_iovec * to it. sg_iovec is always identical to iovec and this will be further cleaned up with later patches. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 16 4月, 2009 2 次提交
-
-
由 Li Zefan 提交于
Impact: allow ftrace-plugin blktrace to trace device-mapper devices To trace a single partition: # echo 1 > /sys/block/sda/sda1/enable To trace the whole sda instead: # echo 1 > /sys/block/sda/enable Thus we also fix an issue reported by Ted, that ftrace-plugin blktrace can't be used to trace device-mapper devices. Now: # echo 1 > /sys/block/dm-0/trace/enable echo: write error: No such device or address # mount -t ext4 /dev/dm-0 /mnt # echo 1 > /sys/block/dm-0/trace/enable # echo blk > /debug/tracing/current_tracer Reported-by: NTheodore Tso <tytso@mit.edu> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Shawn Du <duyuyang@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> LKML-Reference: <49E42665.6020506@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Shawn Du 提交于
Though one can specify '-d /dev/sda1' when using blktrace, it still traces the whole sda. To support per-partition tracing, when we start tracing, we initialize bt->start_lba and bt->end_lba to the start and end sector of that partition. Note some actions are per device, thus we don't filter 0-sector events. The original patch and discussion can be found here: http://marc.info/?l=linux-btrace&m=122949374214540&w=2Signed-off-by: NShawn Du <duyuyang@gmail.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> LKML-Reference: <49E42620.4050701@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 15 4月, 2009 11 次提交
-
-
由 Jens Axboe 提交于
If we have processes that are working in close proximity to each other on disk, we don't want to idle wait. Instead allow the close process to issue a request, getting better aggregate bandwidth. The anticipatory scheduler has similar checks, noop and deadline do not need it since they don't care about process <-> io mappings. The code for CFQ is a little more involved though, since we split request queues into per-process contexts. This fixes a performance problem with eg dump(8), since it uses several processes in some silly attempt to speed IO up. Even if dump(8) isn't really a valid case (it should be fixed by using CLONE_IO), there are other cases where we see close processes and where idling ends up hurting performance. Credit goes to Jeff Moyer <jmoyer@redhat.com> for writing the initial implementation. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Makes it easier to read the traces. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We only kick the dispatch for an idling queue, if we think it's a (somewhat) fully merged request. Also allow a kick if we have other busy queues in the system, since we don't want to risk waiting for a potential merge in that case. It's better to get some work done and proceed. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
It's called from the workqueue handlers from process context, so we always have irqs enabled when entered. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nikanth Karthikesan 提交于
Remove code handling bio_alloc failure with __GFP_WAIT. GFP_KERNEL implies __GFP_WAIT. Signed-off-by: NNikanth Karthikesan <knikanth@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 FUJITA Tomonori 提交于
blk_rq_unmap_user() returns -EFAULT if a program passes an invalid address to kernel. SG_IO path needs to pass the returned value to user space instead of ignoring it. Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> reports that commit b029195d introduced a regression of about 50% with sequential threaded read workloads. The test case is: tiotest -k0 -k1 -k3 -f 80 -t 32 which starts 32 threads each reading a 80MB file. Twiddle the kick queue logic so that we do start IO immediately, if it appears to be a fully merged request. We can't really detect that, so just check if the request is bigger than a page or not. The assumption is that since single bio issues will first queue a single request with just one page attached and then later do merges on that, if we already have more than a page worth of data in the request, then the request is most likely good to go. Verified that this doesn't cause a regression with the test case that commit b029195d was fixing. It does not, we still see maximum sized requests for the queue-then-merge cases. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Credit goes to Andrew Morton for spotting this one. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 07 4月, 2009 7 次提交
-
-
由 Jens Axboe 提交于
When CFQ is waiting for a new request from a process, currently it'll immediately restart queuing when it sees such a request. This doesn't work very well with streamed IO, since we then end up splitting IO that would otherwise have been merged nicely. For a simple dd test, this causes 10x as many requests to be issued as we should have. Normally this goes unnoticed due to the low overhead of requests at the device side, but some hardware is very sensitive to request sizes and there it can cause big slow downs. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
The request inherits the unplug flag from the bio, but it isn't actually used. The bio flag stops at __make_request(), which tells it to unplug after submission. Passing it on to the request doesn't make any sense. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We only manipulate the must_dispatch and queue_new flags, they are not tested anymore. So get rid of them. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
The IO scheduler core calls into the IO scheduler dispatch_request hook to move requests from the IO scheduler and into the driver dispatch list. It only does so when the dispatch list is empty. CFQ moves several requests to the dispatch list, which can cause higher latencies if we suddenly have to switch to some important sync IO. Change the logic to move one request at the time instead. This should almost be functionally equivalent to what we did before, except that we now honor 'quantum' as the maximum queue depth at the device side from any single cfqq. If there's just a single active cfqq, we allow up to 4 times the normal quantum. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jerome Marchand 提交于
This forces in_flight to be zero when turning off or on the I/O stat accounting and stops updating I/O stats in attempt_merge() when accounting is turned off. Signed-off-by: NJerome Marchand <jmarchan@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Simple helper functions to quiesce the request queue. These are currently only used for switching IO schedulers on-the-fly, but we can use them to properly switch IO accounting on and off as well. Signed-off-by: NJerome Marchand <jmarchan@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Alan Cox 提交于
Fix a typo (this was in the original patch but was not merged when the code fixes were for some reason) Signed-off-by: NAlan Cox <alan@redhat.com> Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
-
- 06 4月, 2009 3 次提交
-
-
由 Jens Axboe 提交于
By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jens Axboe 提交于
For the older SSD devices that don't do command queuing, we do want to enable plugging to get better merging. Signed-off-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jens Axboe 提交于
This makes sure that we never wait on async IO for sync requests, instead of doing the split on writes vs reads. Signed-off-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2009 1 次提交
-
-
由 Li Zefan 提交于
Impact: output all of packet commands - not just the first 4 / 8 bytes Since commit d7e3c324 ("block: add large command support"), struct request->cmd has been changed from unsinged char cmd[BLK_MAX_CDB] to unsigned char *cmd. v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> - make sure rq->cmd_len is always intialized, and then we can use rq->cmd_len instead of BLK_MAX_CDB. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> LKML-Reference: <49D4507E.2060602@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 26 3月, 2009 2 次提交
-
-
由 Boaz Harrosh 提交于
bsg submits REQ_TYPE_BLOCK_PC so the right check is max_hw_sectors. But I've removed this check because right after, bsg proceeds with calling blk_rq_map_user() which does all the right checks. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Boaz Harrosh 提交于
Put a WARN_ON in __blk_put_request if it is about to leak bio(s). This is a serious bug that can happen in error handling code paths. For this to work I have fixed a couple of places in block/ where request->bio != NULL ownership was not honored. And a small cleanup at sg_io() while at it. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 24 3月, 2009 1 次提交
-
-
由 Boaz Harrosh 提交于
Currently inherited from sg.c bsg will submit asynchronous request at the head-of-the-queue, (using "at_head" set in the call to blk_execute_rq_nowait()). This is bad in situation where the queues are full, requests will execute out of order, and can cause starvation of the first submitted requests. The sg_io_v4->flags member is used and a bit is allocated to denote the Q_AT_TAIL. Zero is to queue at_head as before, to be compatible with old code at the write/read path. SG_IO code path behavior was changed so to be the same as write/read behavior. SG_IO was very rarely used and breaking compatibility with it is OK at this stage. sg_io_hdr at sg.h also has a flags member and uses 3 bits from the first nibble and one bit from the last nibble. Even though none of these bits are supported by bsg, The second nibble is allocated for use by bsg. Just in case. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> CC: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-