- 12 7月, 2013 8 次提交
-
-
由 Kent Overstreet 提交于
The alloc kthread should've been using try_to_freeze() - and also there was the potential for the alloc kthread to get woken up after it had shut down, which would have been bad. Signed-off-by: NKent Overstreet <kmo@daterainc.com>
-
由 Kent Overstreet 提交于
Part of the job of garbage collection is to add up however many sectors of live data it finds in each bucket, but that doesn't work very well if it doesn't reset GC_SECTORS_USED() when it starts. Whoops. This wouldn't have broken anything horribly, but allocation tries to preferentially reclaim buckets that are mostly empty and that's not gonna work with an incorrect GC_SECTORS_USED() value. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
由 Kent Overstreet 提交于
The journal replay code starts by finding something that looks like a valid journal entry, then it does a binary search over the unchecked region of the journal for the journal entries with the highest sequence numbers. Trouble is, the logic was wrong - journal_read_bucket() returns true if it found journal entries we need, but if the range of journal entries we're looking for loops around the end of the journal - in that case journal_read_bucket() could return true when it hadn't found the highest sequence number we'd seen yet, and in that case the binary search did the wrong thing. Whoops. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
由 Kent Overstreet 提交于
Stopping a cache set is supposed to make it stop attached backing devices, but somewhere along the way that code got lost. Fixing this mainly has the effect of fixing our reboot notifier. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
由 Kent Overstreet 提交于
If we stopped a bcache device when we were already detaching (or something like that), bcache_device_unlink() would try to remove a symlink from sysfs that was already gone because the bcache dev kobject had already been removed from sysfs. So keep track of whether we've removed stuff from sysfs. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
由 Kent Overstreet 提交于
Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered out unless we say we support them... Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
由 Dan Carpenter 提交于
There is a missing NULL check after the kzalloc(). Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
-
由 Kent Overstreet 提交于
In the far-too-complicated closure code - closures can have destructors, for probably dubious reasons; they get run after the closure is no longer waiting on anything but before dropping the parent ref, intended just for freeing whatever memory the closure is embedded in. Trouble is, when remaining goes to 0 and we've got nothing more to run - we also have to unlock the closure, setting remaining to -1. If there's a destructor, that unlock isn't doing anything - nobody could be trying to lock it if we're about to free it - but if the unlock _is needed... that check for a destructor was racy. Argh. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
- 02 7月, 2013 4 次提交
-
-
由 Kent Overstreet 提交于
Some of bcache's utility code has made it into the rest of the kernel, so drop the bcache versions. Bcache used to have a workaround for allocating from a bio set under generic_make_request() (if you allocated more than once, the bios you already allocated would get stuck on current->bio_list when you submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT when allocating bios under generic_make_request() so that allocation could fail and it could retry from workqueue. But bio_alloc_bioset() has a workaround now, so we can drop this hack and the associated error handling. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
This code has rotted and it hasn't been used in ages anyways. Signed-off-by: NKent Overstreet <kmo@daterainc.com>
-
由 Kent Overstreet 提交于
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
-
由 Kent Overstreet 提交于
Journal writes need to be marked FUA, not just REQ_FLUSH. And btree node writes have... weird ordering requirements. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
- 27 6月, 2013 13 次提交
-
-
由 Gabriel de Perthuis 提交于
Signed-off-by: NGabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Gabriel de Perthuis 提交于
Signed-off-by: NGabriel de Perthuis <g2p.code@gmail.com>
-
由 Kent Overstreet 提交于
Now that we're tracking dirty data per stripe, we can add two optimizations for raid5/6: * If a stripe is already dirty, force writes to that stripe to writeback mode - to help build up full stripes of dirty data * When flushing dirty data, preferentially write out full stripes first if there are any. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
To make background writeback aware of raid5/6 stripes, we first need to track the amount of dirty data within each stripe - we do this by breaking up the existing sectors_dirty into per stripe atomic_ts Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
Previously, dirty_data wouldn't get initialized until the first garbage collection... which was a bit of a problem for background writeback (as the PD controller keys off of it) and also confusing for users. This is also prep work for making background writeback aware of raid5/6 stripes. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
The old lazy sorting code was kind of hacky - rewrite in a way that mathematically makes more sense; the idea is that the size of the sets of keys in a btree node should increase by a more or less fixed ratio from smallest to biggest. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
Old gcc doesnt like the struct hack, and it is kind of ugly. So finish off the work to convert pr_debug() statements to tracepoints, and delete pkey()/pbtree(). Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
The tracepoints were reworked to be more sensible, and fixed a null pointer deref in one of the tracepoints. Converted some of the pr_debug()s to tracepoints - this is partly a performance optimization; it used to be that with DEBUG or CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it was changed to an empty inline function. Some of the pr_debug() statements had rather expensive function calls as part of the arguments, so this code was getting run unnecessarily even on non debug kernels - in some fast paths, too. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
The most significant change is that btree reads are now done synchronously, instead of asynchronously and doing the post read stuff from a workqueue. This was originally done because we can't block on IO under generic_make_request(). But - we already have a mechanism to punt cache lookups to workqueue if needed, so if we just use that we don't have to deal with the complexity of doing things asynchronously. The main benefit is this makes the locking situation saner; we can hold our write lock on the btree node until we're finished reading it, and we don't need that btree_node_read_done() flag anymore. Also, for writes, btree_write() was broken out into btree_node_write() and btree_leaf_dirty() - the old code with the boolean argument was dumb and confusing. The prio_blocked mechanism was improved a bit too, now the only counter is in struct btree_write, we don't mess with transfering a count from struct btree anymore. This required changing garbage collection to block prios at the start and unblock when it finishes, which is cleaner than what it was doing anyways (the old code had mostly the same effect, but was doing it in a convoluted way) And the btree iter btree_node_read_done() uses was converted to a real mempool. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
Using a workqueue when we just want a single thread is a bit silly. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Gabriel de Perthuis 提交于
Signed-off-by: NGabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kent Overstreet 提交于
An old version of gcc was complaining about using a const int as the size of a stack allocated array. Which should be fine - but using ARRAY_SIZE() is better, anyways. Also, refactor the code to use scnprintf(). Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Kumar Amit Mehta 提交于
bio_alloc_bioset returns NULL on failure. This fix adds a missing check for potential NULL pointer dereferencing. Signed-off-by: NKumar Amit Mehta <gmate.amit@gmail.com> Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
- 13 6月, 2013 4 次提交
-
-
由 H. Peter Anvin 提交于
There are cases where the kernel will believe that the WRITE SAME command is supported by a block device which does not, in fact, support WRITE SAME. This currently happens for SATA drivers behind a SAS controller, but there are probably a hundred other ways that can happen, including drive firmware bugs. After receiving an error for WRITE SAME the block layer will retry the request as a plain write of zeroes, but mdraid will consider the failure as fatal and consider the drive failed. This has the effect that all the mirrors containing a specific set of data are each offlined in very rapid succession resulting in data loss. However, just bouncing the request back up to the block layer isn't ideal either, because the whole initial request-retry sequence should be inside the write bitmap fence, which probably means that md needs to do its own conversion of WRITE SAME to write zero. Until the failure scenario has been sorted out, disable WRITE SAME for raid1, raid5, and raid10. [neilb: added raid5] This patch is appropriate for any -stable since 3.7 when write_same support was added. Cc: stable@vger.kernel.org Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Various places in raid1 and raid10 are calling raise_barrier when they really should call freeze_array. The former is only intended to be called from "make_request". The later has extra checks for 'nr_queued' and makes a call to flush_pending_writes(), so it is safe to call it from within the management thread. Using raise_barrier will sometimes deadlock. Using freeze_array should not. As 'freeze_array' currently expects one request to be pending (in handle_read_error - the only previous caller), we need to pass it the number of pending requests (extra) to ignore. The deadlock was made particularly noticeable by commits 050b6615 (raid10) and 6b740b8d (raid1) which appeared in 3.4, so the fix is appropriate for any -stable kernel since then. This patch probably won't apply directly to some early kernels and will need to be applied by hand. Cc: stable@vger.kernel.org Reported-by: NAlexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Alex Lyakas 提交于
md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it. Without that fix, the following scenario could happen: - RAID1 with drives A and B; drive B was freshly-added and is rebuilding - Drive A fails - WRITE request arrives to the array. It is failed by drive A, so r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B succeeds in writing it, so the same r1_bio is marked as R1BIO_Uptodate. - r1_bio arrives to handle_write_finished, badblocks are disabled, md_error()->error() does nothing because we don't fail the last drive of raid1 - raid_end_bio_io() calls call_bio_endio() - As a result, in call_bio_endio(): if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) clear_bit(BIO_UPTODATE, &bio->bi_flags); this code doesn't clear the BIO_UPTODATE flag, and the whole master WRITE succeeds, back to the upper layer. So we returned success to the upper layer, even though we had written the data onto the rebuilding drive only. But when we want to read the data back, we would not read from the rebuilding drive, so this data is lost. [neilb - applied identical change to raid10 as well] This bug can result in lost data, so it is suitable for any -stable kernel. Cc: stable@vger.kernel.org Signed-off-by: NAlex Lyakas <alex@zadarastorage.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
__md_stop_writes() will currently sometimes freeze recovery. So any caller must be ready for that to happen, and indeed they are. However if __md_stop_writes() doesn't freeze_recovery, then a recovery could start before mddev_suspend() is called, which could be awkward. This can particularly cause problems or dm-raid. So change __md_stop_writes() to always freeze recovery. This is safe and more predicatable. Reported-by: NBrassow Jonathan <jbrassow@redhat.com> Tested-by: NBrassow Jonathan <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 30 5月, 2013 1 次提交
-
-
由 Kent Overstreet 提交于
The patch that converted raid5 to use bio_reset() forgot to initialize bi_vcnt. Signed-off-by: NKent Overstreet <koverstreet@google.com> Cc: NeilBrown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Tested-by: NIlia Mirkin <imirkin@alum.mit.edu> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 5月, 2013 1 次提交
-
-
由 Alasdair G Kergon 提交于
Fix detection of the need to resize the dm thin metadata device. The code incorrectly tried to extend the metadata device when it didn't need to due to a merging error with patch 24347e95 ("dm thin: detect metadata device resizing"). device-mapper: transaction manager: couldn't open metadata space map device-mapper: thin metadata: tm_open_with_sm failed device-mapper: thin: aborting transaction failed device-mapper: thin: switching pool to failure mode Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 15 5月, 2013 3 次提交
-
-
由 Kent Overstreet 提交于
This code appears to have rotted... fix various bugs and do some refactoring. Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
由 Paul Bolle 提交于
The Kconfig entry for BCACHE selects CLOSURES. But there's no Kconfig symbol CLOSURES. That symbol was used in development versions of bcache, but was removed when the closures code was no longer provided as a kernel library. It can safely be dropped. Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
-
由 Emil Goode 提交于
The function pointer release in struct block_device_operations should point to functions declared as void. Sparse warnings: drivers/md/bcache/super.c:656:27: warning: incorrect type in initializer (different base types) drivers/md/bcache/super.c:656:27: expected void ( *release )( ... ) drivers/md/bcache/super.c:656:27: got int ( static [toplevel] *<noident> )( ... ) drivers/md/bcache/super.c:656:2: warning: initialization from incompatible pointer type [enabled by default] drivers/md/bcache/super.c:656:2: warning: (near initialization for ‘bcache_ops.release’) [enabled by default] Signed-off-by: NEmil Goode <emilgoode@gmail.com> Signed-off-by: NKent Overstreet <koverstreet@google.com>
-
- 10 5月, 2013 6 次提交
-
-
由 Joe Thornber 提交于
Share configuration option processing code between the dm cache ctr and message functions. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Alasdair G Kergon 提交于
Move process_config_option() in dm-cache-target.c to make the next patch more readable. Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Generate a dm event when the amount of remaining thin pool metadata space falls below a certain level. The threshold is taken to be a quarter of the size of the metadata device with a minimum threshold of 4MB. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Add a threshold callback to dm persistent data space maps. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Add a threshold callback function to the persistent data space map interface for a subsequent patch to use. dm-thin and dm-cache are interested in knowing when they're getting low on metadata or data blocks. This patch introduces a new method for registering a callback against a threshold. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Allow the dm thin pool metadata device to be extended. Whenever a pool is resumed, detect whether the size of the metadata device has increased, and if so, extend the metadata to use the new space. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-