- 14 Jan 2011, 1 commit
-
-
Committed by NeilBrown
commit 589a594b (2.6.37-rc4) fixed a problem where md_thread would sometimes call the ->run function at a bad time. If an error is detected during array start-up after the md_thread has been started, the md_thread is killed. This resulted in the ->run function being called once even though the array may not be in a state where it is safe to call ->run. However the fix imposed meant that ->run was not called on a timeout. This means that when an array goes idle, bitmap bits do not get cleared promptly. While the array is busy the bits will still be cleared when appropriate, so this is not very serious. There is no risk to data. Change the test so that we only avoid calling ->run when the thread is being stopped. This more explicitly addresses the problem situation. This is suitable for 2.6.37-stable and any -stable kernel to which 589a594b was applied. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
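A minimal C sketch of the revised thread loop as described, using the usual md.c names (mdk_thread_t, THREAD_WAKEUP, thread->run); the surrounding code is paraphrased rather than quoted from the patch:

static int md_thread(void *arg)
{
        mdk_thread_t *thread = arg;

        while (!kthread_should_stop()) {
                wait_event_interruptible_timeout(thread->wqueue,
                        test_bit(THREAD_WAKEUP, &thread->flags) ||
                        kthread_should_stop(),
                        thread->timeout);

                clear_bit(THREAD_WAKEUP, &thread->flags);
                /* skip ->run only when the thread is being torn down,
                 * so timeout-driven wakeups still reach the handler */
                if (!kthread_should_stop())
                        thread->run(thread->mddev);
        }
        return 0;
}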
-
- 12 Jan 2011, 1 commit
-
-
Committed by NeilBrown
Commit 1a855a06 (2.6.37-rc4) fixed a problem where devices were re-added when they shouldn't be, but caused a regression in a less common case that means sometimes devices cannot be re-added when they should be. In particular, when re-adding a device to an array without metadata we should always accept the device, but after the above commit we didn't. This patch sets the In_sync flag in that case so that the re-add succeeds. This patch is suitable for any -stable kernel to which 1a855a06 was applied. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
-
- 17 Dec 2010, 2 commits
-
-
Committed by Mike Snitzer
Implement blk_limits_max_hw_sectors() and make blk_queue_max_hw_sectors() a wrapper around it. DM needs this to avoid setting queue_limits' max_hw_sectors and max_sectors directly. dm_set_device_limits() now leverages blk_limits_max_hw_sectors() logic to establish the appropriate max_hw_sectors minimum (PAGE_SIZE). Fixes issue where DM was incorrectly setting max_sectors rather than max_hw_sectors (which caused dm_merge_bvec()'s max_hw_sectors check to be ineffective). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
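A sketch of the split described above; the bodies are reconstructed from the description (PAGE_SIZE minimum, queue setter as a thin wrapper) and may differ in detail from the actual patch:

#include <linux/blkdev.h>

void blk_limits_max_hw_sectors(struct queue_limits *limits,
                               unsigned int max_hw_sectors)
{
        /* enforce a sane minimum of one page */
        if ((max_hw_sectors << 9) < PAGE_SIZE) {
                max_hw_sectors = 1 << (PAGE_SHIFT - 9);
                printk(KERN_INFO "%s: set to minimum %u\n",
                       __func__, max_hw_sectors);
        }
        limits->max_hw_sectors = limits->max_sectors = max_hw_sectors;
}

void blk_queue_max_hw_sectors(struct request_queue *q,
                              unsigned int max_hw_sectors)
{
        /* queue-based setter becomes a thin wrapper over the limits helper */
        blk_limits_max_hw_sectors(&q->limits, max_hw_sectors);
}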
-
Committed by Martin K. Petersen
When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set the queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
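As a small illustration of the direction described, once 'no_cluster' becomes a 'cluster' limit the merging code can query the limit directly instead of a queue flag; the helper below is shown from memory and should be treated as illustrative:

#include <linux/blkdev.h>

/* replaces test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags) in merge paths */
static inline unsigned int blk_queue_cluster(struct request_queue *q)
{
        return q->limits.cluster;
}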
-
- 09 Dec 2010, 5 commits
-
-
Committed by NeilBrown
When we fail to start a raid10 for some reason, we call md_unregister_thread to kill the thread that was created. Unfortunately md_thread() will then make one call into the handler (raid10d) even though md_wakeup_thread has not been called. This is not safe and, as md_unregister_thread is called after mddev->private has been set to NULL, it will definitely cause a NULL dereference. So fix this at both ends: - md_thread should only call the handler if THREAD_WAKEUP has been set. - raid10 should call md_unregister_thread before setting things to NULL, just like all the other raid modules do. This is applicable to 2.6.35 and later. Cc: stable@kernel.org Reported-by: "Citizen" <citizen_lee@thecus.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
With v0.90 metadata, a hot-spare does not become a full member of the array until recovery is complete. So if we re-add such a device to the array, we know that all of it is as up-to-date as the event count would suggest, and so a bitmap-based recovery is possible. However with v1.x metadata, the hot-spare immediately becomes a full member of the array, but it records how much of the device has been recovered. If the array is stopped and re-assembled, recovery starts from this point. When such a device is hot-added to an array we currently lose the 'how much is recovered' information and incorrectly include it as a full in-sync member (after bitmap-based fixup). This is wrong and unsafe and could corrupt data. So be more careful about setting saved_raid_disk - which is what guides the re-adding of devices back into an array. The new code matches the code in slot_store which does a similar thing, which is encouraging. This is suitable for any -stable kernel. Reported-by: "Dailey, Nate" <Nate.Dailey@stratus.com> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
As recorded in https://bugzilla.kernel.org/show_bug.cgi?id=24012 it is possible for a flush request through md to hang. This is due to an interaction between the recursion avoidance in generic_make_request, the insistence in md of only having one flush active at a time, and the possibility of dm (or md) submitting two flush requests to a device from the one generic_make_request. If a generic_make_request call into dm causes two flush requests to be queued (as happens if the dm table has two targets - they get one each), these two will be queued inside generic_make_request. Assume they are for the same md device. The first is processed and causes 1 or more flush requests to be sent to lower devices. These get queued within generic_make_request too. Then the second flush to the md device gets handled and it blocks waiting for the first flush to complete. But it won't complete until the two lower-device requests complete, and they haven't even been submitted yet as they are on the generic_make_request queue. The deadlock can be broken by using a separate thread to submit the requests to lower devices. md has a suitable workqueue readily available: md_wq. So use it to submit these requests. Reported-by: Giacomo Catenazzi <cate@cateee.net> Tested-by: Giacomo Catenazzi <cate@cateee.net> Signed-off-by: NeilBrown <neilb@suse.de>
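A hedged sketch of that hand-off; md_defer_flush() is a hypothetical wrapper shown only to illustrate queueing the flush fan-out on md_wq rather than recursing through generic_make_request (mddev_t, flush_work, md_wq and submit_flushes are md.c names):

#include <linux/workqueue.h>

/* Hypothetical helper: defer lower-device flush submission to md_wq so it
 * runs outside the generic_make_request recursion-avoidance queue. */
static void md_defer_flush(mddev_t *mddev)
{
        INIT_WORK(&mddev->flush_work, submit_flushes);
        queue_work(md_wq, &mddev->flush_work);
}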
-
Committed by NeilBrown
submit_flushes is called from exactly one place. Move the code that is before and after that call into submit_flushes. This has no functional change, but will make the next patch smaller and easier to follow. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
None of the functions called between setting flush_pending to 1 and the atomic_dec_and_test can change flush_pending, nor can anything running in any other thread (as ->flush_bio is not NULL). So the atomic_dec_and_test will always succeed. So remove the atomic_set and the atomic_dec_and_test. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 24 Nov 2010, 3 commits
-
-
Committed by Darrick J. Wong
Before 2.6.37, the md layer had a mechanism for catching I/Os with the barrier flag set, and translating the barrier into barriers for all the underlying devices. With 2.6.37, I/O barriers have become plain old flushes, and the md code was updated to reflect this. However, one piece was left out -- the md layer does not tell the block layer that it supports flushes or FUA access at all, which results in md silently dropping flush requests. Since the support already seems to be there, just add this one piece of bookkeeping. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de>
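The bookkeeping amounts to advertising the capability when the md queue is set up; roughly (the exact call site in md.c is assumed here):

#include <linux/blkdev.h>

/* Tell the block layer that this queue handles FLUSH and FUA requests,
 * so they are passed down instead of being silently dropped. */
blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);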
-
Committed by NeilBrown
Commit 4044ba58 supposedly fixed a problem where, if a raid1 with just one good device gets a read-error during recovery, the recovery would abort and immediately restart in an infinite loop. However it depended on raid1_remove_disk removing the spare device from the array. But that does not happen in this case. So add a test so that in the 'recovery_disabled' case, the device will be removed. This is suitable for any kernel since 2.6.29, which is when recovery_disabled was introduced. Cc: stable@kernel.org Reported-by: Sebastian Färber <faerber@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Justin Maggard
When trying to grow an array by enlarging component devices, rdev_size_store() expects the return value of rdev_size_change() to be in sectors, but the actual value is returned in KBs. This functionality was broken by commit dd8ac336, so this patch is suitable for any kernel since 2.6.30. Cc: stable@kernel.org Signed-off-by: Justin Maggard <jmaggard10@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 10 Nov 2010, 1 commit
-
-
Committed by Mike Snitzer
Convert direct reads of an inode's i_size to using i_size_read(). i_size_{read,write} use a seqcount to protect reads from accessing incomplete writes. Concurrent i_size_write()s require mutual exclusion to protect the seqcount that is used by i_size_{read,write}, but i_size_read() callers do not need any additional locking. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: NeilBrown <neilb@suse.de> Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
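For illustration, a conversion of this kind looks roughly like the following helper (the helper name is assumed, not taken from the patch):

#include <linux/fs.h>

/* Read the backing file's size through i_size_read(): the inode seqcount
 * guarantees readers never see a torn update from a concurrent
 * i_size_write(), and no extra locking is needed on the read side. */
static inline sector_t file_size_sectors(struct file *file)
{
        return i_size_read(file->f_mapping->host) >> 9;  /* was: ->i_size >> 9 */
}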
-
- 29 Oct 2010, 4 commits
-
-
Committed by NeilBrown
The code for searching through the device list to read-balance in raid1 is rather clumsy and hard to follow. Try to simplify it a bit. No important functionality change here. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
This structure field (flushing_bio_list) is never used, so remove it. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
When writing to an 'external' bitmap we don't currently unplug the device before waiting, so we can get a 3msec delay each time. So use REQ_UNPLUG to force an unplug. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 28 Oct 2010, 9 commits
-
-
Committed by NeilBrown
bio_clone and bio_alloc allocate from a common bio pool. If an md device is stacked with other devices that use this pool, or under something like swap which uses the pool, then the multiple calls on the pool can cause deadlocks. So allocate a local bio pool for each md array and use that rather than the common pool. This pool is used both for regular IO and metadata updates. Signed-off-by: NeilBrown <neilb@suse.de>
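A sketch of the per-array pool, assuming an mddev->bio_set field (illustrative; the actual field name and pool size used by the patch may differ):

#include <linux/bio.h>

/* Give each array its own bio_set at creation time ... */
mddev->bio_set = bioset_create(BIO_POOL_SIZE, 0);
if (!mddev->bio_set)
        return -ENOMEM;

/* ... and allocate/clone from it instead of the shared fs_bio_set,
 * so md cannot deadlock against other users of the global pool. */
bio = bio_alloc_bioset(GFP_NOIO, nr_iovecs, mddev->bio_set);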
-
Committed by NeilBrown
Currently sync_page_io takes a 'bdev'. Every caller passes 'rdev->bdev'. We will soon want another field out of the rdev in sync_page_io, so just pass the rdev instead of the bdev into it. Signed-off-by: NeilBrown <neilb@suse.de>
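The resulting prototype looks roughly like this (parameter order assumed from the description):

/* sync_page_io now receives the rdev itself rather than rdev->bdev,
 * so future callers can reach other rdev fields from inside it. */
int sync_page_io(mdk_rdev_t *rdev, sector_t sector, int size,
                 struct page *page, int rw);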
-
Committed by NeilBrown
Though this mem alloc is GFP_NOIO and so will not deadlock, it seems better to do the allocation before 'raise_barrier', which stops any IO requests while the resync proceeds. raid10 always uses this order, so it is at least consistent to do the same in raid1. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
bio_alloc can never fail (as it uses a mempool) but can block indefinitely, especially if the caller is holding a reference to a previously allocated bio. So these two places, which both handle failure and hold multiple bios, should not use bio_alloc; they should use bio_kmalloc. Signed-off-by: NeilBrown <neilb@suse.de>
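The trade-off, sketched with assumed surrounding names: bio_kmalloc() can return NULL but never blocks on a shared mempool, so a caller that already holds bios handles the failure instead of risking a deadlock:

#include <linux/bio.h>

struct bio *b = bio_kmalloc(GFP_NOIO, nr_iovecs);
if (!b)
        goto out_free_bios;     /* hypothetical cleanup label */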
-
Committed by NeilBrown
It is not safe to allocate from a mempool while holding an item previously allocated from that mempool, as that can deadlock when the mempool is close to exhaustion. So don't use a bio list to collect the bios to write to multiple devices in raid1 and raid10. Instead queue each bio as it becomes available, so an unplug will activate all previously allocated bios and so a new bio has a chance of being allocated. This means we must set the 'remaining' count to '1' before submitting any requests, then when all are submitted, decrement 'remaining' and possibly handle the write completion at that point. Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: NeilBrown <neilb@suse.de>
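The counting pattern described, as a rough sketch (the loop body and the two helpers are hypothetical placeholders, not code from the patch):

atomic_set(&r1_bio->remaining, 1);      /* bias: completion can't fire early */
for (i = 0; i < disks; i++) {
        atomic_inc(&r1_bio->remaining);
        /* allocate and submit one bio at a time; anything already
         * submitted can be unplugged while we wait on the mempool */
        submit_one_write(r1_bio, i);            /* hypothetical helper */
}
if (atomic_dec_and_test(&r1_bio->remaining))    /* drop the bias */
        raid_end_write(r1_bio);                 /* hypothetical helper */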
-
Committed by Tejun Heo
Workqueue usage in md has two problems. * Flush can be used during or depended upon by memory reclaim, but md uses the system workqueue for flush_work, which may lead to deadlock. * md depends on flush_scheduled_work() to achieve exclusion against completion of removal of previous instances. flush_scheduled_work() may incur an unexpected amount of delay and is scheduled to be removed. This patch adds two workqueues to md - md_wq and md_misc_wq. The former is guaranteed to make forward progress under memory pressure and serves flush_work. The latter serves as the flush domain for other works. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de>
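Creating the two queues looks roughly like this (flags and names as described; treat the error handling as illustrative):

#include <linux/workqueue.h>

/* md_wq must make forward progress under memory pressure (it runs the
 * flush work), so it gets a rescuer; md_misc_wq handles everything else. */
md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
if (!md_wq)
        goto err;
md_misc_wq = alloc_workqueue("md_misc", 0, 0);
if (!md_misc_wq)
        goto err;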
-
Committed by NeilBrown
bitmap_get_counter returns the number of sectors covered by the counter in a pass-by-reference variable. In some cases this can be very large, so make it a sector_t for safety. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
lock_kernel calls were recently pushed down into open/release functions. md doesn't need that protection. Then the BKL calls were changed to md_mutex. We don't need those either. So remove it all. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
A RAID1 which has no persistent metadata, whether internal or external, will hang on the first write. This is caused by commit 070dc6dd: in that case, MD_CHANGE_PENDING never gets cleared. So during md_update_sb, if the array is neither persistent nor external, clear MD_CHANGE_PENDING. This is suitable for 2.6.36-stable. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
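The clearing described amounts to something like the following inside md_update_sb() (a sketch from the description, not a quote of the patch):

/* Arrays with no superblock to write would otherwise leave the
 * pending flag set forever and stall the first write. */
if (!mddev->persistent && !mddev->external)
        clear_bit(MD_CHANGE_PENDING, &mddev->flags);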
-
- 27 Oct 2010, 1 commit
-
-
Committed by Andrew Morton
Silly though it is, completions and wait_queue_heads use foo_ONSTACK (COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK, __WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK), so I guess workqueues should do the same thing. s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/ s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/ Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Oct 2010, 1 commit
-
-
Committed by Arnd Bergmann
All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... 
}; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
-
- 07 Oct 2010, 3 commits
-
-
Committed by Vasiliy Kulikov
Function read_sb_page may return ERR_PTR(...). Check for it. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NeilBrown <neilb@suse.de>
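The check is the usual ERR_PTR idiom; roughly (the read_sb_page() argument list is abbreviated here, not the real signature):

#include <linux/err.h>

page = read_sb_page(/* ... */);
if (IS_ERR(page)) {
        /* propagate the error instead of dereferencing an ERR_PTR value */
        err = PTR_ERR(page);
        goto out;
}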
-
Committed by NeilBrown
When performing a resync we pre-allocate some bios and repeatedly use them. This requires us to re-initialise them each time. One field (bi_comp_cpu) and some flags weren't being initialised reliably. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
bitmap_start_sync returns - via a pass-by-reference variable - the number of sectors before we need to check with the bitmap again. Since commit ef425673 this number can be substantially larger; 2^27 is a common value. Unfortunately it is an 'int' and so when raid1.c:sync_request shifts it 9 places to the left it becomes 0. This results in a zero-length read which the scsi layer justifiably complains about. This patch just removes the shift so the common case becomes safe with a trivially-correct patch. In the next merge window we will convert this 'int' to a 'sector_t'. Reported-by: "George Spelvin" <linux@horizon.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 05 Oct 2010, 1 commit
-
-
Committed by Arnd Bergmann
The block device drivers have all gained new lock_kernel calls from a recent pushdown, and some of the drivers were already using the BKL before. This turns the BKL into a set of per-driver mutexes. Still need to check whether this is safe to do. file=$1 name=$2 if grep -q lock_kernel ${file} ; then if grep -q 'include.*linux.mutex.h' ${file} ; then sed -i '/include.*<linux\/smp_lock.h>/d' ${file} else sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file} fi sed -i ${file} \ -e "/^#include.*linux.mutex.h/,$ { 1,/^\(static\|int\|long\)/ { /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex); } }" \ -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \ -e '/[ ]*cycle_kernel_lock();/d' else sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \ -e '/cycle_kernel_lock()/d' fi Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
- 17 Sep 2010, 2 commits
-
-
Committed by NeilBrown
If an array with 1.x metadata is assembled with the last disk missing, md doesn't properly record the fact that the disk was missing. This is unlikely to cause a real problem as the event count will be different to the count on the missing disk, so it won't be included in the array. However it could still cause confusion. So make sure we clear all the relevant slots, not just the early ones. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Now that we depend on md_update_sb to clear variable bits in mddev->flags (rather than trying not to set them) it is important to always call md_update_sb when appropriate. md_check_recovery has this job but explicitly avoids it for ->external metadata arrays. This is no longer appropriate, or needed. However we do want to avoid taking the mddev lock if only MD_CHANGE_PENDING is set, as that is not cleared by md_update_sb for external-metadata arrays. Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 11 Sep 2010, 1 commit
-
-
Committed by Martin K. Petersen
We have several users of min_not_zero, each of them using their own definition. Move the define to kernel.h. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
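For reference, the consolidated macro is essentially the following (reproduced from memory of kernel.h, so treat it as illustrative rather than a quote of the patch):

/* Return the smaller of two values, ignoring one that is zero. */
#define min_not_zero(x, y) ({                   \
        typeof(x) __x = (x);                    \
        typeof(y) __y = (y);                    \
        __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })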
-
- 10 Sep 2010, 5 commits
-
-
Committed by Mike Snitzer
Rename __clone_and_map_flush to __clone_and_map_empty_flush for added clarity. Simplify logic associated with REQ_FLUSH conditionals. Introduce a BUG_ON() and add a few more helpful comments to the code so that it is clear that all flushes are empty. Cleanup __split_and_process_bio() so that an empty flush isn't processed by a 'sector_count' focused while loop. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Committed by Kiyoshi Ueda
Now queue_io() is called from dec_pending(), which may be called with interrupts disabled, so queue_io() must not enable interrupts unconditionally and must save/restore the current interrupt status. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
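A sketch of the save/restore pattern in a queue_io()-style helper (field names follow dm.c but the body is paraphrased, not quoted from the patch):

static void queue_io(struct mapped_device *md, struct bio *bio)
{
        unsigned long flags;

        /* irqsave/irqrestore instead of spin_lock_irq(): the caller may
         * already have interrupts disabled, so don't re-enable them here */
        spin_lock_irqsave(&md->deferred_lock, flags);
        bio_list_add(&md->deferred, bio);
        spin_unlock_irqrestore(&md->deferred_lock, flags);
        queue_work(md->wq, &md->work);
}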
-
Committed by Tejun Heo
Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA doesn't mandate any ordering against other bio's. This patch relaxes ordering around flushes. * A flush bio is no longer deferred to the workqueue directly. It's processed like other bio's but __split_and_process_bio() uses md->flush_bio as the clone source. md->flush_bio is initialized to an empty flush during md initialization and shared for all flushes. * As a flush bio now travels through the same execution path as other bio's, there's no need for a dedicated error handling path either. It can use the same error handling path in dec_pending(). Dedicated error handling removed along with md->flush_error. * When dec_pending() detects that a flush has completed, it checks whether the original bio has data. If so, the bio is queued to the deferred list w/ REQ_FLUSH cleared; otherwise, it's completed. * As flush sequencing is handled in the usual issue/completion path, dm_wq_work() no longer needs to handle flushes differently. Now its only responsibility is re-issuing deferred bio's the same way as _dm_request() would. REQ_FLUSH handling logic including process_flush() is dropped. * There's no reason for queue_io() and dm_wq_work() to write-lock dm->io_lock. queue_io() now only uses md->deferred_lock and dm_wq_work() read-locks dm->io_lock. * bio's no longer need to be queued on the deferred list while a flush is in progress, making DMF_QUEUE_IO_TO_THREAD unnecessary. Drop it. This avoids stalling the device during flushes and simplifies the implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Committed by Tejun Heo
This patch converts request-based dm to support the new REQ_FLUSH/FUA. The original request-based flush implementation depended on the request_queue blocking other requests while a barrier sequence is in progress, which is no longer true for the new REQ_FLUSH/FUA. In general, request-based dm doesn't have infrastructure for cloning one source request to multiple targets, but the original flush implementation had a special, mostly independent path which can issue flushes to multiple targets and sequence them. However, the capability isn't currently in use and adds a lot of complexity. Moreover, it's unlikely to be useful in its current form as it doesn't make sense to be able to send out flushes to multiple targets when write requests can't be. This patch rips out the special flush code path and handles REQ_FLUSH/FUA requests the same way as other requests. The only special treatment is that REQ_FLUSH requests use the block address 0 when finding the target, which is enough for now. * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as suggested by Mike Snitzer Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Committed by Tejun Heo
This patch converts bio-based dm to support REQ_FLUSH/FUA instead of the now deprecated REQ_HARDBARRIER. * -EOPNOTSUPP handling logic dropped. * Preflush is handled as before but postflush is dropped and replaced with passing down REQ_FUA to member request_queues. This replaces one array-wide cache flush w/ member specific FUA writes. * __split_and_process_bio() now calls __clone_and_map_flush() directly for flushes and guarantees all FLUSH bio's going to targets are zero length. * It's now guaranteed that all FLUSH bio's which are passed onto dm targets are zero length. bio_empty_barrier() tests are replaced with REQ_FLUSH tests. * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes. * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely enough to be marked with unlikely(). * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA capability. * Request based dm isn't converted yet. dm_init_request_based_queue() resets flush support to 0 for now. To avoid disturbing request based dm code, dm->flush_error is added for bio based dm while request based dm continues to use dm->barrier_error. Lightly tested linear, stripe, raid1, snap and crypt targets. Please proceed with caution as I'm not familiar with the code base. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-