1. 19 Mar, 2014 8 commits
    • bcache: Rework btree cache reserve handling · 0a63b66d
      Committed by Kent Overstreet
      This changes the bucket allocation reserves to use _real_ reserves - separate
      freelists - instead of watermarks, which if nothing else makes the current code
      saner to reason about and is going to be important in the future when we add
      support for multiple btrees.
      
      It also adds btree_check_reserve(), which checks (and locks) the reserves for
      both bucket allocation and memory allocation for btree nodes. The old code just
      kinda sorta assumed that since (e.g. for btree node splits) it had the root
      locked, no other threads could try to make use of the same reserve. This
      technically should have been ok for memory allocation (we should always have a
      reserve for memory allocation, since the btree node cache is used as a reserve
      and we preallocate it), but multiple btrees will mean that locking the root
      won't be sufficient anymore, and for the bucket allocation reserve it was
      technically possible for the old code to deadlock.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
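      To make the "separate freelists instead of watermarks" idea concrete, here is a
      minimal userspace sketch. It is not the bcache code: the type and function names
      (cache_sketch, freelist_push, freelist_pop, bucket_alloc), the reserve class
      names, and the use of a small LIFO instead of a kernel FIFO are all assumptions
      made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserve classes; the names are illustrative only. */
enum reserve {
	RESERVE_BTREE,
	RESERVE_PRIO,
	RESERVE_MOVINGGC,
	RESERVE_NONE,
	RESERVE_NR
};

#define FREELIST_MAX 8

/* One small LIFO of free bucket numbers per reserve class. */
struct freelist {
	long   buckets[FREELIST_MAX];
	size_t used;
};

struct cache_sketch {
	struct freelist free[RESERVE_NR];
};

static bool freelist_push(struct freelist *f, long bucket)
{
	if (f->used == FREELIST_MAX)
		return false;
	f->buckets[f->used++] = bucket;
	return true;
}

static bool freelist_pop(struct freelist *f, long *bucket)
{
	if (f->used == 0)
		return false;
	*bucket = f->buckets[--f->used];
	return true;
}

/*
 * Allocate on behalf of reserve class r: try the general pool first,
 * then fall back to the class's own list. A caller asking for
 * RESERVE_NONE can therefore never drain the btree reserve.
 */
static bool bucket_alloc(struct cache_sketch *ca, enum reserve r, long *bucket)
{
	return freelist_pop(&ca->free[RESERVE_NONE], bucket) ||
	       freelist_pop(&ca->free[r], bucket);
}
```

      With a watermark scheme every caller shares one list and only a counter guards
      the reserve; with separate lists the reserve is a distinct pool, which is the
      property the commit message says is easier to reason about.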
    • bcache: Kill btree_io_wq · 56b30770
      Committed by Kent Overstreet
      With the locking rework in the last patch, this shouldn't be needed anymore -
      btree_node_write_work() only takes b->write_lock which is never held for very
      long.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: btree locking rework · 2a285686
      Committed by Kent Overstreet
      Add a new lock, b->write_lock, which is required to actually modify - or write -
      a btree node; this lock is only held for short durations.
      
      This means we can write out a btree node without taking b->lock, which _is_ held
      for long durations - solving a deadlock when btree_flush_write() (from the
      journalling code) is called with a btree node locked.
      
      Right now this only occurs in bch_btree_set_root(), but with an upcoming
      journalling rework it is going to happen a lot more.
      
      This also turns b->lock into more of a read/intent lock instead of a
      read/write lock - but not completely, since it still blocks readers. May turn it
      into a real intent lock at some point in the future.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
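      The two-lock split described above can be sketched in userspace as follows. This
      is only an illustration under stated assumptions: node_sketch, try_lock, unlock,
      node_init and node_write are hypothetical names, and C11 atomic_flag try-locks
      stand in for the kernel's sleeping locks so that the example stays
      single-threaded and deterministic.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct node_sketch {
	atomic_flag lock;        /* held for long durations, e.g. across a split */
	atomic_flag write_lock;  /* held only while modifying or writing the node */
	int         writes;      /* number of completed write-outs */
};

/* Returns true if the lock was free and is now held by the caller. */
static bool try_lock(atomic_flag *l)
{
	return !atomic_flag_test_and_set(l);
}

static void unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

static void node_init(struct node_sketch *b)
{
	atomic_flag_clear(&b->lock);
	atomic_flag_clear(&b->write_lock);
	b->writes = 0;
}

/*
 * Write-out path: takes only write_lock, never the long-held lock, so
 * it can make progress while another context still holds b->lock.
 * (A real implementation would block here instead of giving up.)
 */
static bool node_write(struct node_sketch *b)
{
	if (!try_lock(&b->write_lock))
		return false;
	b->writes++;
	unlock(&b->write_lock);
	return true;
}
```

      If a caller holds lock and then invokes node_write(), the write still succeeds,
      which mirrors the case of flushing a btree node that is currently locked.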
    • bcache: Fix a race when freeing btree nodes · 05335cff
      Committed by Kent Overstreet
      This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
      bucket on the unused freelist, where it can be reused right away without any
      ordering requirements. It would be better to wait on at least a journal write to
      go down before reusing the bucket. bch_btree_set_root() does this, and inserting
      into non leaf nodes is completely synchronous so we should be ok, but future
      patches are just going to get rid of the unused freelist - it was needed in the
      past for various reasons but shouldn't be anymore.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Add a real GC_MARK_RECLAIMABLE · 4fe6a816
      Committed by Kent Overstreet
      This means the garbage collection code can better check for data and metadata
      pointers to the same buckets.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Kill dead cgroup code · 3f5e0a34
      Committed by Kent Overstreet
      This hasn't been used or even enabled in ages.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Fix another bug recovering from unclean shutdown · 487dded8
      Committed by Kent Overstreet
      The on-disk bucket gens are allowed to be out of date when we reuse buckets
      that didn't have any live data in them. To deal with this, the initial gc has to
      update the bucket gen when we find a pointer gen newer than the bucket's gen.
      
      Unfortunately we weren't doing this for pointers in the journal that we're about
      to replay.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
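      The gen fixup rule described above can be sketched as below. This is a
      simplified illustration, not the bcache implementation: bucket_sketch,
      ptr_sketch and fixup_bucket_gens are hypothetical names, and the wraparound
      comparison in gen_after() assumes 8-bit generation counters. The point of the
      fix is that the same pass has to see pointers from journal entries that are
      about to be replayed, not only pointers found in the btree.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * How far generation a is ahead of b, or 0 if it is not ahead.
 * Assumes 8-bit generations that wrap around.
 */
static uint8_t gen_after(uint8_t a, uint8_t b)
{
	uint8_t r = (uint8_t)(a - b);

	return r > 128U ? 0 : r;
}

struct bucket_sketch {
	uint8_t gen;     /* possibly stale generation read from disk */
};

struct ptr_sketch {
	size_t  bucket;  /* index of the bucket this pointer refers to */
	uint8_t gen;     /* generation recorded in the pointer */
};

/*
 * Bump each bucket's gen to the newest pointer gen seen. Call this for
 * pointers from the btree and for pointers in journal keys awaiting
 * replay. Returns the number of buckets that were updated.
 */
static size_t fixup_bucket_gens(struct bucket_sketch *buckets,
				const struct ptr_sketch *ptrs, size_t nr_ptrs)
{
	size_t fixed = 0;

	for (size_t i = 0; i < nr_ptrs; i++) {
		struct bucket_sketch *g = &buckets[ptrs[i].bucket];

		if (gen_after(ptrs[i].gen, g->gen)) {
			g->gen = ptrs[i].gen;
			fixed++;
		}
	}
	return fixed;
}
```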
    • bcache: Fix a bug recovering from unclean shutdown · 0bd143fd
      Committed by Kent Overstreet
      The code to fix up incorrect bucket prios incorrectly did not skip btree node
      freeing keys.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
  2. 30 Jan, 2014 2 commits
    • bcache: Minor fixes from kbuild robot · 3572324a
      Committed by Kent Overstreet
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: fix BUG_ON due to integer overflow with GC_SECTORS_USED · 94717447
      Committed by Darrick J. Wong
      The BUG_ON at the end of __bch_btree_mark_key can be triggered due to
      an integer overflow error:
      
      BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13);
      ...
      SET_GC_SECTORS_USED(g, min_t(unsigned,
      	     GC_SECTORS_USED(g) + KEY_SIZE(k),
      	     (1 << 14) - 1));
      BUG_ON(!GC_SECTORS_USED(g));
      
      In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide.
      While the SET_ code tries to ensure that the field doesn't overflow by
      clamping it to (1<<14)-1 == 16383, this is incorrect because 16383
      requires 14 bits.  Therefore, if GC_SECTORS_USED() + KEY_SIZE() =
      8192, the SET_ statement tries to store 8192 into a 13-bit field.  In
      a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON.
      
      Therefore, create a field width constant and a max value constant, and
      use those to create the bitfield and check the inputs to
      SET_GC_SECTORS_USED.  Arguably the BITMASK() template ought to have
      BUG_ON checks for too-large values, but that's a separate patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
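      The arithmetic behind this bug can be reproduced in a few lines of userspace C.
      The sketch below is an assumption-laden model, not the kernel code: store_field
      imitates what a 13-bit bitfield setter does to an oversized value, and the
      constant and function names (FIELD_BITS, FIELD_MAX, add_sectors_buggy,
      add_sectors_fixed) are invented for illustration.

```c
/* Width and maximum value of the field; the names are illustrative. */
#define FIELD_BITS 13
#define FIELD_MAX  ((1U << FIELD_BITS) - 1)   /* 8191 */

/* Model of a bitfield setter: bits above the field width are dropped. */
static unsigned store_field(unsigned v)
{
	return v & FIELD_MAX;
}

static unsigned min_u(unsigned a, unsigned b)
{
	return a < b ? a : b;
}

/* Old behaviour: clamps to (1 << 14) - 1, which needs 14 bits. */
static unsigned add_sectors_buggy(unsigned used, unsigned key_size)
{
	return store_field(min_u(used + key_size, (1U << 14) - 1));
}

/* Fixed behaviour: clamps to the largest value the field can hold. */
static unsigned add_sectors_fixed(unsigned used, unsigned key_size)
{
	return store_field(min_u(used + key_size, FIELD_MAX));
}
```

      With used = 8000 and key_size = 192 the sum is 8192; the buggy variant stores 0
      (which is what trips the BUG_ON), while the fixed variant saturates at 8191.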
  3. 09 Jan, 2014 19 commits
  4. 17 Dec, 2013 2 commits
    • bcache: fix for gc and writeback race · bf0a628a
      Committed by Nicholas Swenson
      The garbage collector needs to check keys in the writeback keybuf to
      make sure it's not invalidating buckets that the writeback keys
      point to.
      Signed-off-by: Nicholas Swenson <nks@daterainc.com>
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Fix dirty_data accounting · d24a6e10
      Committed by Kent Overstreet
      Dirty data accounting wasn't quite right - firstly, we were adding the key we're
      inserting after it could have merged with another dirty key already in the
      btree, and secondly we could sometimes pass the wrong offset to
      bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is
      important when tracking dirty data by stripe.
      
      NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there are other accounting fixes
      necessary that got squashed in with other patches; the full patch against 3.10
      is 408cc2f47eeac93a, available at:
        git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      
      diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
      index 2a46036..4a12b2f 100644
      --- a/drivers/md/bcache/btree.c
      +++ b/drivers/md/bcache/btree.c
      @@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree *b, struct bkey *insert,
       			if (KEY_START(k) > KEY_START(insert) + sectors_found)
       				goto check_failed;
      
      -			if (KEY_PTRS(replace_key) != KEY_PTRS(k))
      +			if (KEY_PTRS(k) != KEY_PTRS(replace_key) ||
      +			    KEY_DIRTY(k) != KEY_DIRTY(replace_key))
       				goto check_failed;
      
       			/* skip past gen */
  5. 29 Nov, 2013 1 commit
  6. 24 Nov, 2013 2 commits
    • block: Convert bio_for_each_segment() to bvec_iter · 7988613b
      Committed by Kent Overstreet
      More prep work for immutable biovecs - with immutable bvecs drivers
      won't be able to use the biovec directly, they'll need to use helpers
      that take into account bio->bi_iter.bi_bvec_done.
      
      This updates callers for the new usage without changing the
      implementation yet.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: support@lsi.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      Cc: cbe-oss-dev@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: DL-MPTFusionLinux@lsi.com
      Cc: linux-scsi@vger.kernel.org
      Cc: devel@driverdev.osuosl.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: cluster-devel@redhat.com
      Cc: linux-mm@kvack.org
      Acked-by: Geoff Levand <geoff@infradead.org>
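      The reason helpers must account for bi_bvec_done can be shown with a small
      userspace model of an iterator over an immutable segment array. Everything here
      is a hypothetical sketch: bvec_sketch, iter_sketch, iter_bvec and iter_advance
      are invented names, and only the idea of a separate "bytes done in the current
      segment" counter is taken from the commit message.

```c
#include <stddef.h>

/* One segment of an I/O; the array of these is never modified. */
struct bvec_sketch {
	const char *base;
	size_t      len;
};

/* All iteration state lives here, separate from the segment array. */
struct iter_sketch {
	size_t size;       /* bytes remaining in the whole I/O */
	size_t idx;        /* index of the current segment */
	size_t bvec_done;  /* bytes already consumed in segment idx */
};

/* Current segment as seen through the iterator. */
static struct bvec_sketch iter_bvec(const struct bvec_sketch *bv,
				    struct iter_sketch it)
{
	struct bvec_sketch cur;
	size_t left = bv[it.idx].len - it.bvec_done;

	cur.base = bv[it.idx].base + it.bvec_done;
	cur.len = left < it.size ? left : it.size;
	return cur;
}

/* Consume bytes by updating only the iterator, never the segments. */
static void iter_advance(const struct bvec_sketch *bv,
			 struct iter_sketch *it, size_t bytes)
{
	while (bytes && it->size) {
		size_t left = bv[it->idx].len - it->bvec_done;
		size_t step = bytes < left ? bytes : left;

		if (step > it->size)
			step = it->size;
		bytes -= step;
		it->size -= step;
		it->bvec_done += step;
		if (it->bvec_done == bv[it->idx].len) {
			it->idx++;
			it->bvec_done = 0;
		}
	}
}
```

      Code that indexes the segment array directly would see the full original
      segment after a partial advance; reading through iter_bvec() yields only the
      unconsumed remainder, which is why callers had to be converted.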
    • block: Abstract out bvec iterator · 4f024f37
      Committed by Kent Overstreet
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
  7. 11 Nov, 2013 6 commits