- 09 10月, 2008 3 次提交
-
-
由 Jens Axboe 提交于
We really need to know about the hardware tagging support as well, since if the SSD does not do tagging then we still want to idle. Otherwise have the same dependent sync IO vs flooding async IO problem as on rotational media. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We don't want to idle in AS/CFQ if the device doesn't have a seek penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational device, low level drivers should set this flag upon discovery of an SSD or similar device type. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Preparatory patch for checking queuing affinity. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 27 7月, 2008 1 次提交
-
-
由 Arjan van de Ven 提交于
Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 7月, 2008 1 次提交
-
-
由 Jens Axboe 提交于
If we have multiple tasks freeing io contexts when as-iosched is being unloaded, we could complete() ioc_gone twice. Fix that by protecting ioc_gone complete() and clearing with a spinlock for just that purpose. Doesn't matter from a performance perspective, since it'll only enter that path when ioc_gone != NULL (when as-iosched is being rmmod'ed). Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 01 7月, 2008 1 次提交
-
-
由 Divyesh Shah 提交于
AS scheduler alternates between issuing read and write batches. It does the batch switch only after all requests from the previous batch are completed. When switching to a write batch, if there is an on-going read request, it waits for its completion and indicates its intention of switching by setting ad->changed_batch and the new direction but does not update the batch_expire_time for the new write batch which it does in the case of no previous pending requests. On completion of the read request, it sees that we were waiting for the switch and schedules work for kblockd right away and resets the ad->changed_data flag. Now when kblockd enters dispatch_request where it is expected to pick up a write request, it in turn ends the write batch because the batch_expire_timer was not updated and shows the expire timestamp for the previous batch. This results in the write starvation for all the cases where there is the intention for switching to a write batch, but there is a previous in-flight read request and the batch gets reverted to a read_batch right away. This also holds true in the reverse case (switching from a write batch to a read batch with an in-flight write request). I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and linux-2.6-block git HEAD. I've tested the fix on x86 platforms with SCSI drives where the driver asks for the next request while a current request is in-flight. This patch is based off linux-2.6-block git HEAD. Bug reproduction: A simple scenario which reproduces this bug is: - dd if=/dev/hda3 of=/dev/null & - lilo The lilo takes forever to complete. This can also be reproduced fairly easily with the earlier dd and another test program doing msync(). The example test program below should print out a message after every iteration but it simply hangs forever. With this bugfix it makes forward progress. ==== Example test program using msync() (thanks to suleiman AT google DOT com) inline uint64_t rdtsc(void) { int64_t tsc; __asm __volatile("rdtsc" : "=A" (tsc)); return (tsc); } int main(int argc, char **argv) { struct stat st; uint64_t e, s, t; char *p, q; long i; int fd; if (argc < 2) { printf("Usage: %s <file>\n", argv[0]); return (1); } if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0) err(1, "open"); if (fstat(fd, &st) < 0) err(1, "fstat"); p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); t = 0; for (i = 0; i < 1000; i++) { *p = 0; msync(p, 4096, MS_SYNC); s = rdtsc(); *p = 0; __asm __volatile(""::: "memory"); e = rdtsc(); if (argc > 2) printf("%d: %lld cycles %jd %jd\n", i, e - s, (intmax_t)s, (intmax_t)e); t += e - s; } printf("average time: %lld cycles\n", t / 1000); return (0); } Cc: <stable@kernel.org> Acked-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 01 2月, 2008 2 次提交
-
-
由 Jens Axboe 提交于
It blindly copies everything in the io_context, including the lock. That doesn't work so well for either lock ordering or lockdep. There seems zero point in swapping io contexts on a request to request merge, so the best point of action is to just remove it. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Since it's acquired from irq context, all locking must be of the irq safe variant. Most are already inside the queue lock (which already disables interrupts), but the io scheduler rmmod path always has irqs enabled and the put_io_context() path may legally be called with irqs enabled (even if it isn't usually). So fixup those two. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 30 1月, 2008 1 次提交
-
-
由 Jens Axboe 提交于
If the two requests belong to the same io context, we will attempt to lock the same lock twice. But swapping contexts is pointless in that case, so just check for rioc == nioc before doing the double lock and copy. Tested-by: NOlof Johansson <olof@lixom.net> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 28 1月, 2008 1 次提交
-
-
由 Jens Axboe 提交于
changes to anticipatory io scheduler for io_context sharing Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 18 12月, 2007 3 次提交
-
-
由 Adrian Bunk 提交于
elv_register() always returns 0, and there isn't anything it does where it should return an error (the only error condition is so grave that it's handled with a BUG_ON). Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Aaron Carroll 提交于
New write batches currently start from where the last one completed. We have no idea where the head is after switching batches, so this makes little sense. Instead, start the next batch from the request with the earliest deadline in the hope that we avoid a deadline expiry later on. Signed-off-by: NAaron Carroll <aaronc@gelato.unsw.edu.au> Acked-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Aaron Carroll 提交于
Two comments refer to deadlines applying to reads only. This is not the case. Signed-off-by: NAaron Carroll <aaronc@gelato.unsw.edu.au> Acked-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 24 7月, 2007 1 次提交
-
-
由 Jens Axboe 提交于
Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 18 7月, 2007 1 次提交
-
-
由 Christoph Lameter 提交于
kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing variant in the past. But with __GFP_ZERO it is possible now to do zeroing while allocating. Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever we can. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 5月, 2007 1 次提交
-
-
由 Andrew Morton 提交于
Switch the kblockd flushing from a global flush to a more specific flush_work(). (akpm: bypassed maintainers, sorry. There are other patches which depend on this) Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <axboe@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 5月, 2007 1 次提交
-
-
由 Nick Piggin 提交于
Fix units mismatch (jiffies vs msecs) in as-iosched.c, spotted by Xiaoning Ding <dingxn@cse.ohio-state.edu>. Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 12月, 2006 1 次提交
-
-
由 Jens Axboe 提交于
We implemented the missing bits to allow this some time ago, and they are integrated in AS. So remove the __module_get() to allow the module to be unloaded. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 01 12月, 2006 1 次提交
-
-
由 Jens Axboe 提交于
- ->init_queue() does not need the elevator passed in - ->put_request() is a hot path and need not have the queue passed in - cfq_update_io_seektime() does not need cfqd passed in Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 22 11月, 2006 1 次提交
-
-
由 David Howells 提交于
Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: NDavid Howells <dhowells@redhat.com>
-
- 01 10月, 2006 13 次提交
-
-
由 Peter Zijlstra 提交于
All on stack DECLARE_COMPLETIONs should be replaced by: DECLARE_COMPLETION_ONSTACK Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Jens Axboe 提交于
As people often look for the copyright in files to see who to mail, update the link to a neutral one. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
CFQ implements this on its own now, but it's really block layer knowledge. Tells a device queue to start dispatching requests to the driver, taking care to unplug if needed. Also fixes the issue where as/cfq will invoke a stopped queue, which we really don't want. Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Jens Axboe 提交于
Some were kmalloc_node(), some were still kmalloc(). Change them all to kmalloc_node(). Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Jens Axboe 提交于
It's ok if the read path is a lot more costly, as long as inc/dec is really cheap. The inc/dec will happen for each created/freed io context, while the reading only happens when a disk queue exits. Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Jens Axboe 提交于
It's not needed for anything, so kill the bio passing. Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Jens Axboe 提交于
Get rid of the as_rq request type. With the added elevator_private2, we have enough room in struct request to get rid of any arq allocation/free for each request. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NNick Piggin <npiggin@suse.de>
-
由 Jens Axboe 提交于
We can track this in struct request. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NNick Piggin <npiggin@suse.de>
-
由 Jens Axboe 提交于
Saves some space in arq. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NNick Piggin <npiggin@suse.de>
-
由 Jens Axboe 提交于
This removes the rbtree handling from AS. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NNick Piggin <npiggin@suse.de>
-
由 Jens Axboe 提交于
The conditions got reserved. Also make rb_next() and rb_prev() check for the empty condition. Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Jens Axboe 提交于
Right now, every IO scheduler implements its own backmerging (except for noop, which does no merging). That results in duplicated code for essentially the same operation, which is never a good thing. This patch moves the backmerging out of the io schedulers and into the elevator core. We save 1.6kb of text and as a bonus get backmerging for noop as well. Win-win! Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Jens Axboe 提交于
Right now ->flags is a bit of a mess: some are request types, and others are just modifiers. Clean this up by splitting it into ->cmd_type and ->cmd_flags. This allows introduction of generic Linux block message types, useful for sending generic Linux commands to block devices. Signed-off-by: NJens Axboe <axboe@suse.de>
-
- 01 7月, 2006 1 次提交
-
-
由 Jörn Engel 提交于
Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: NAdrian Bunk <bunk@stusta.de>
-
- 27 6月, 2006 1 次提交
-
-
由 Andreas Mohr 提交于
acquired (aquired) contiguous (contigious) successful (succesful, succesfull) surprise (suprise) whether (weather) some other misspellings Signed-off-by: NAndreas Mohr <andi@lisas.de> Signed-off-by: NAdrian Bunk <bunk@stusta.de>
-
- 23 6月, 2006 3 次提交
-
-
由 Jens Axboe 提交于
They all duplicate macros to check for empty root and/or node, and clearing a node. So put those in rbtree.h. Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Jens Axboe 提交于
A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: NJens Axboe <axboe@suse.de>
-
由 Akinobu Mita 提交于
Use hlist instead of list_head for request hashtable in deadline-iosched and as-iosched. It also can remove the flag to know hashed or unhashed. Signed-off-by: NAkinobu Mita <mita@miraclelinux.com> Signed-off-by: NJens Axboe <axboe@suse.de> block/as-iosched.c | 45 +++++++++++++++++++-------------------------- block/deadline-iosched.c | 39 ++++++++++++++++----------------------- 2 files changed, 35 insertions(+), 49 deletions(-)
-
- 09 6月, 2006 1 次提交
-
-
由 Jens Axboe 提交于
There's a race between shutting down one io scheduler and firing up the next, in which a new io could enter and cause the io scheduler to be invoked with bad or NULL data. To fix this, we need to maintain the queue lock for a bit longer. Unfortunately we cannot do that, since the elevator init requires to be run without the lock held. This isn't easily fixable, without also changing the mempool API. So split the initialization into two parts, and alloc-init operation and an attach operation. Then we can preallocate the io scheduler and related structures, and run the attach inside the lock after we detach the old one. This patch has survived 30 minutes of 1 second io scheduler switching with a very busy io load. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 21 4月, 2006 1 次提交
-
-
由 David Woodhouse 提交于
They were abusing the rb_color field to mark nodes which weren't currently on the tree. Fix that to use the same method as eventpoll did -- setting the parent pointer to point back to itself. And use the appropriate accessor macros for setting and reading the parent. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-