- 19 March 2014, 8 commits
-
Committed by Kent Overstreet

This changes the bucket allocation reserves to use _real_ reserves - separate freelists - instead of watermarks, which if nothing else makes the current code saner to reason about, and is going to be important in the future when we add support for multiple btrees.

It also adds btree_check_reserve(), which checks (and locks) the reserves for both bucket allocation and memory allocation for btree nodes. The old code just kinda sorta assumed that since (e.g. for btree node splits) it had the root locked, no other threads could try to make use of the same reserve. This technically should have been ok for memory allocation (we should always have a reserve for memory allocation, since the btree node cache is used as a reserve and we preallocate it), but multiple btrees will mean that locking the root won't be sufficient anymore, and for the bucket allocation reserve it was technically possible for the old code to deadlock.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
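A minimal sketch of the resulting shape, assuming the reserve names and fifo helpers from the bcache tree around this series; bucket_alloc_wait() is a hypothetical stand-in for the real wait loop:

    /* Each reserve gets its own freelist, rather than one freelist
     * guarded by watermarks: */
    enum alloc_reserve {
        RESERVE_BTREE,
        RESERVE_PRIO,
        RESERVE_MOVINGGC,
        RESERVE_NONE,
        RESERVE_NR,
    };

    long bch_bucket_alloc(struct cache *ca, unsigned reserve, bool wait)
    {
        long r;

        /* An empty fifo means only *this* reserve is exhausted,
         * so one consumer can no longer starve the others: */
        if (fifo_pop(&ca->free[reserve], r))
            return r;

        /* bucket_alloc_wait() is hypothetical: sleep until the
         * allocator thread refills this reserve, or fail: */
        return wait ? bucket_alloc_wait(ca, reserve) : -1;
    }

With separate freelists, exhausting (say) the moving-gc reserve can only block moving gc, not btree node allocation.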
-
Committed by Kent Overstreet

With the locking rework in the last patch, this shouldn't be needed anymore - btree_node_write_work() only takes b->write_lock, which is never held for very long.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Add a new lock, b->write_lock, which is required to actually modify - or write - a btree node; this lock is only held for short durations. This means we can write out a btree node without taking b->lock, which _is_ held for long durations - solving a deadlock when btree_flush_write() (from the journalling code) is called with a btree node locked. Right now this only occurs in bch_btree_set_root(), but with an upcoming journalling rework it's going to happen a lot more.

This also makes b->lock more of a read/intent lock than a read/write lock - but not completely, since it still blocks readers. It may be turned into a real intent lock at some point in the future.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
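Roughly, the deferred write path after this patch looks like the following sketch (modeled on the bcache sources around this series; treat details as illustrative):

    /* Writing a node out now needs only the short-lived write_lock,
     * not the long-held b->lock: */
    static void btree_node_write_work(struct work_struct *w)
    {
        struct btree *b = container_of(to_delayed_work(w),
                                       struct btree, work);

        mutex_lock(&b->write_lock);
        if (btree_node_dirty(b))
            __bch_btree_node_write(b, NULL);
        mutex_unlock(&b->write_lock);
    }

So btree_flush_write() can flush a dirty node even while another thread holds b->lock across a long traversal.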
-
Committed by Kent Overstreet

This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the bucket on the unused freelist, where it can be reused right away without any ordering requirements. It would be better to wait on at least a journal write to go down before reusing the bucket. bch_btree_set_root() does this, and inserting into non-leaf nodes is completely synchronous, so we should be ok - but future patches are just going to get rid of the unused freelist; it was needed in the past for various reasons but shouldn't be anymore.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

This means the garbage collection code can better check for data and metadata pointers to the same buckets.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

This hasn't been used or even enabled in ages.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

The on-disk bucket gens are allowed to be out of date when we reuse buckets that didn't have any live data in them. To deal with this, the initial gc has to update the bucket gen when we find a pointer gen newer than the bucket's gen. Unfortunately, we weren't doing this for pointers in the journal that we're about to replay.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
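The fix amounts to running the same gen fixup over journal keys that the initial mark pass runs over btree keys - roughly (helper names like PTR_BUCKET(), PTR_GEN() and gen_after() follow the bcache sources; the exact call site is illustrative):

    unsigned i;

    for (i = 0; i < KEY_PTRS(k); i++) {
        struct bucket *g = PTR_BUCKET(c, k, i);

        /* A pointer gen newer than the bucket's gen means the
         * on-disk gen is stale; catch the bucket up: */
        if (gen_after(PTR_GEN(k, i), g->gen))
            g->gen = PTR_GEN(k, i);
    }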
-
Committed by Kent Overstreet

The code to fix up incorrect bucket prios incorrectly did not skip btree node freeing keys.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
- 30 January 2014, 2 commits
-
Committed by Kent Overstreet

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Darrick J. Wong

The BUG_ON at the end of __bch_btree_mark_key can be triggered due to an integer overflow error:

    BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13);
    ...
    SET_GC_SECTORS_USED(g, min_t(unsigned,
                 GC_SECTORS_USED(g) + KEY_SIZE(k),
                 (1 << 14) - 1));
    BUG_ON(!GC_SECTORS_USED(g));

In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide. While the SET_ code tries to ensure that the field doesn't overflow by clamping it to (1 << 14) - 1 == 16383, this is incorrect because 16383 requires 14 bits. Therefore, if GC_SECTORS_USED() + KEY_SIZE() equals 8192, the SET_ statement tries to store 8192 into a 13-bit field. In a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON.

Therefore, create a field width constant and a max value constant, and use those to create the bitfield and check the inputs to SET_GC_SECTORS_USED. Arguably the BITMASK() template ought to have BUG_ON checks for too-large values, but that's a separate patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
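The shape of the fix, per the description above (constant names modeled on the patch; treat them as from the bcache tree around this commit):

    #define GC_SECTORS_USED_SIZE  13
    #define MAX_GC_SECTORS_USED   (~(~0ULL << GC_SECTORS_USED_SIZE))

    BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2,
            GC_SECTORS_USED_SIZE);

    /* Clamp to the real field maximum (8191), not (1 << 14) - 1: */
    SET_GC_SECTORS_USED(g, min_t(unsigned,
                 GC_SECTORS_USED(g) + KEY_SIZE(k),
                 MAX_GC_SECTORS_USED));

Note that ~(~0ULL << 13) is 8191, the largest value a 13-bit field can hold, so the clamped value can no longer wrap to zero.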
-
- 09 January 2014, 19 commits
-
Committed by Kent Overstreet

We need to return -EINTR after a split because we invalidated iterators (and freed the btree node) - but if we were finished inserting, we don't want to redo the traversal.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
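In control-flow terms, something like this sketch; split_occurred is a stand-in for the real logic in bch_btree_insert_node():

    /* A split frees the node and invalidates iterators, so the
     * caller must retraverse from the parent - but only if there
     * are still keys left to insert: */
    if (split_occurred)
        return bch_keylist_empty(insert_keys) ? 0 : -EINTR;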
-
Committed by Kent Overstreet

Handling overlapping extents/keys is now a method that's specific to what the btree node contains.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

More work to disentangle various code from struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

More work to disentangle various code from struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

More work to disentangle bset.c from struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Helper function to explicitly check how much space is free in a btree node.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
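A simplified sketch of such a helper, assuming the btree_keys/bset layout from this series (the real helper in the bcache tree also special-cases a node whose last bset is already written out):

    /* u64s of key space left in the node's buffer: */
    static inline size_t btree_keys_u64s_remaining(struct btree_keys *b)
    {
        struct bset_tree *t = bset_tree_last(b);
        size_t used = bset_byte_offset(b, t->data) +
                  set_bytes(t->data);

        return ((PAGE_SIZE << b->page_order) - used) / sizeof(u64);
    }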
-
Committed by Kent Overstreet

Soon, bset.c won't need to depend on struct btree.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

More work to disentangle bset.c from the rest of the code.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

More disentangling of bset.c from the rest of the bcache code - soon, the sorting routines won't have any dependencies on any outside structs.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Only use extent comparison for comparing extents, so we're not using START_KEY() on other key types (i.e. btree pointers).

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
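A sketch of the resulting split - one sort comparator per key type, with START_KEY() confined to the extent version (names modeled on the bcache sources; treat as illustrative):

    /* Btree-pointer keys sort by the key itself: */
    static bool bch_key_sort_cmp(struct btree_iter_set l,
                     struct btree_iter_set r)
    {
        int64_t c = bkey_cmp(l.k, r.k);

        return c ? c > 0 : l.k < r.k;
    }

    /* Extents sort by their start, since a bkey encodes the
     * extent's *end* offset: */
    static bool bch_extent_sort_cmp(struct btree_iter_set l,
                    struct btree_iter_set r)
    {
        int64_t c = bkey_cmp(&START_KEY(l.k), &START_KEY(r.k));

        return c ? c > 0 : l.k < r.k;
    }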
-
Committed by Kent Overstreet

More refactoring:

    node() -> bset_bkey_idx()
    end()  -> bset_bkey_last()

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
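The renamed helpers are thin index arithmetic over a bset - roughly (per the bcache headers; exact definitions may differ slightly):

    /* Step nr_keys u64s past k: */
    static inline struct bkey *bkey_idx(const struct bkey *k,
                        unsigned nr_keys)
    {
        __u64 *d = (void *) k;

        return (struct bkey *) (d + nr_keys);
    }

    #define bset_bkey_last(i)  bkey_idx((i)->start, (i)->keys)

The new names say what is being indexed (a bset, by u64s), where node() and end() said nothing.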
-
Committed by Kent Overstreet

Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid of magic 2s.

Also, split out the part that checks against journal entry size, so as to avoid a dependency on struct cache_set in bset.c.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
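The "magic 2" is the size of the fixed bkey header in u64s; deriving it from the struct makes the size arithmetic self-documenting. A sketch, following the bcache uapi header:

    /* Size of the fixed bkey header, in u64s (the magic 2): */
    #define BKEY_U64s  (sizeof(struct bkey) / sizeof(__u64))

    /* Total size of a key: header plus one u64 per pointer: */
    static inline unsigned long bkey_u64s(const struct bkey *k)
    {
        return BKEY_U64s + KEY_PTRS(k);
    }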
-
Committed by Kent Overstreet

Used this fixed code to find and fix the bug fixed by a4d885097b0ac0cd1337f171f2d4b83e946094d4.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

That was a terrible name for a macro; add some better helpers to replace it.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

This error path shouldn't have been hit in practice... and we've got reworked reserve code coming soon, so that it shouldn't _ever_ be hit... but if we've got code for this error path, it should be correct.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

We need a reserve for allocating buckets for new btree nodes - and now that we've got multiple btrees, it really needs to be per btree. This reworks the reserves so we've got separate freelists for each reserve instead of watermarks, which seems to make things a bit cleaner, and it adds some code so that btree_split() can make sure the reserve is available before it starts.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
- 17 December 2013, 2 commits
-
Committed by Nicholas Swenson

The garbage collector needs to check keys in the writeback keybuf, to make sure it's not invalidating buckets to which the writeback keys point.

Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
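Conceptually, the gc finish pass walks each cached device's writeback keybuf and marks the referenced buckets before deciding what is reclaimable - along these lines (a sketch using keybuf names from the bcache sources; locking and iteration details are illustrative):

    unsigned j;
    struct keybuf_key *w, *n;

    spin_lock(&dc->writeback_keys.lock);
    rbtree_postorder_for_each_entry_safe(w, n,
                &dc->writeback_keys.keys, node)
        for (j = 0; j < KEY_PTRS(&w->key); j++)
            SET_GC_MARK(PTR_BUCKET(c, &w->key, j),
                    GC_MARK_DIRTY);
    spin_unlock(&dc->writeback_keys.lock);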
-
Committed by Kent Overstreet

Dirty data accounting wasn't quite right - firstly, we were adding the key we're inserting after it could have merged with another dirty key already in the btree, and secondly we could sometimes pass the wrong offset to bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is important when tracking dirty data by stripe.

NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there's other accounting fixes necessary that got squashed in with other patches; the full patch against 3.10 is 408cc2f47eeac93a, available at:
git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10

    diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
    index 2a46036..4a12b2f 100644
    --- a/drivers/md/bcache/btree.c
    +++ b/drivers/md/bcache/btree.c
    @@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree *b, struct bkey *insert,
             if (KEY_START(k) > KEY_START(insert) + sectors_found)
                 goto check_failed;

    -        if (KEY_PTRS(replace_key) != KEY_PTRS(k))
    +        if (KEY_PTRS(k) != KEY_PTRS(replace_key) ||
    +            KEY_DIRTY(k) != KEY_DIRTY(replace_key))
                 goto check_failed;

             /* skip past gen */
-
- 29 November 2013, 1 commit
-
Committed by Wei Yongjun

Fixes the following sparse warning:

    drivers/md/bcache/btree.c:2220:5: warning: symbol 'btree_insert_fn' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
- 24 November 2013, 2 commits
-
Committed by Kent Overstreet

More prep work for immutable biovecs - with immutable bvecs, drivers won't be able to use the biovec directly; they'll need to use helpers that take into account bio->bi_iter.bi_bvec_done. This updates callers for the new usage without changing the implementation yet.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Jim Paris <jim@jtan.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Cc: support@lsi.com Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Cc: linux-m68k@lists.linux-m68k.org Cc: linuxppc-dev@lists.ozlabs.org Cc: drbd-user@lists.linbit.com Cc: nbd-general@lists.sourceforge.net Cc: cbe-oss-dev@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Cc: virtualization@lists.linux-foundation.org Cc: linux-raid@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: DL-MPTFusionLinux@lsi.com Cc: linux-scsi@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: linux-fsdevel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: linux-mm@kvack.org
Acked-by: Geoff Levand <geoff@infradead.org>
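The pattern being adopted, roughly (using the block-layer iterator API this series introduces; summing segment lengths is just an example payload):

    struct bio_vec bv;
    struct bvec_iter iter;
    unsigned bytes = 0;

    /* The helper walks the bio through its iterator - including any
     * partially-completed bvec (bi_bvec_done) - instead of the driver
     * indexing bi_io_vec directly: */
    bio_for_each_segment(bv, bio, iter)
        bytes += bv.bv_len;    /* sees exactly the unprocessed bytes */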
-
Committed by Kent Overstreet

Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>
-
- 11 November 2013, 6 commits
-
Committed by Kent Overstreet

The old scanning-by-stripe code burned too much CPU; this should be better.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

The flow control in btree_insert_node() was... fragile... before; this'll use more stack (but since our btrees are never more than depth 1, that shouldn't matter), and it should be significantly clearer and less fragile.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Minor cleanup.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

This dates from before the btree iterator, and now it's finally gone.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Not a complete fix - we could still deadlock if btree_insert_node() has to split...

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-
Committed by Kent Overstreet

Big garbage collection rewrite; now garbage collection uses the same mechanisms as used elsewhere for inserting/updating btree node pointers, instead of rewriting interior btree nodes in place.

This makes the code significantly cleaner and less fragile, and means we can now make garbage collection incremental - it doesn't have to hold a write lock on the root of the btree for the entire duration of garbage collection.

This means there's less of a latency hit for doing garbage collection, which means we can gc more frequently (and do a better job of reclaiming from the cache), and we can coalesce across more btree nodes (improving our space efficiency).

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
-