1. 31 Dec 2015, 1 commit
    • bcache: fix a livelock when we cause a huge number of cache misses · 2ef9ccbf
      Committed by Zheng Liu
      Subject :	[PATCH v2] bcache: fix a livelock in btree lock
      Date :	Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)
      
      This commit tries to fix a livelock in bcache.  This livelock might
      happen when we cause a huge number of cache misses simultaneously.
      
      When we get a cache miss, bcache will execute the following path.
      
      ->cached_dev_make_request()
        ->cached_dev_read()
          ->cached_lookup()
            ->bch_btree_map_keys()
              ->btree_root()  <------------------------
                ->bch_btree_map_keys_recurse()        |
                  ->cache_lookup_fn()                 |
                    ->cached_dev_cache_miss()         |
                      ->bch_btree_insert_check_key() -|
                        [If btree->seq is not equal to seq + 1, we return
                         -EINTR and traverse the btree again.]
      
      In bch_btree_insert_check_key() we first check the upgrade flag
      (op->lock == -1); when it is set, we release the read lock on
      btree->lock and try to take the write lock.  btree->seq is incremented
      each time this write lock is taken and released, so that a thread
      handling a cache miss does not modify a node that another thread changed
      in the meantime (see btree.h:74).  But if many requests cause cache
      misses at the same time, we can hit a livelock: btree->seq keeps being
      changed by other threads, so none of them can make progress.
      
      With this commit, a traversal that encounters such a race retries while
      taking the write lock on btree->lock directly, instead of upgrading from
      a read lock.  This sacrifices some scalability, but it ensures that only
      one thread can modify the btree at a time.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Tested-by: Joshua Schmid <jschmid@suse.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Joshua Schmid <jschmid@suse.com>
      Cc: Zhu Yanhai <zhu.yanhai@gmail.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 29 Jul 2015, 1 commit
    • block: add a bi_error field to struct bio · 4246a0b6
      Committed by Christoph Hellwig
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
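To make the single-field approach concrete, here is a toy C model. struct toy_bio, toy_bio_chain(), toy_bio_endio(), completed_error and record_completion() are hypothetical and only loosely follow the kernel's bio chaining; this is not the kernel's struct bio. The point it illustrates is the one in the message: the errno lives in the bio itself, so it persists while the bio waits, it is handed from child to parent when a chained child completes, and the completion callback needs no error argument.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for struct bio, NOT the kernel definition: just enough to
 * show an errno stored in the bio itself surviving a chain. */
struct toy_bio {
	int bi_error;                          /* 0 or a negative errno */
	int remaining;                         /* pending completions */
	struct toy_bio *parent;                /* set by toy_bio_chain() */
	void (*bi_end_io)(struct toy_bio *);   /* no error argument */
};

/* Chain child to parent: the parent completes only after the child. */
static void toy_bio_chain(struct toy_bio *child, struct toy_bio *parent)
{
	child->parent = parent;
	parent->remaining++;
}

/* Complete a bio. A chained child hands its bi_error up to the parent;
 * the final callback reads bio->bi_error instead of taking an argument. */
static void toy_bio_endio(struct toy_bio *bio)
{
	while (bio) {
		if (--bio->remaining > 0)
			return;                        /* still waiting on others */

		struct toy_bio *parent = bio->parent;
		if (parent) {
			/* Keep the first error the parent sees (a choice of this
			 * sketch, not a claim about the kernel). */
			if (bio->bi_error && !parent->bi_error)
				parent->bi_error = bio->bi_error;
		} else if (bio->bi_end_io) {
			bio->bi_end_io(bio);
		}
		bio = parent;
	}
}

/* Example callback: records the error it observed (1 = not called yet). */
static int completed_error = 1;

static void record_completion(struct toy_bio *bio)
{
	completed_error = bio->bi_error;
}
```

If a child chained to a parent fails with -EIO, the parent's callback, once the parent's own completion arrives, sees bi_error == -EIO even though the error happened in the child.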
  3. 05 Aug 2014, 6 commits
  4. 19 Mar 2014, 10 commits
    • bcache: Kill bucket->gc_gen · 3a2fd9d5
      Committed by Kent Overstreet
      gc_gen was a temporary variable used to recalculate last_gc, but since we only need
      bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
      last_gc directly.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Kill unused freelist · 2531d9ee
      Committed by Kent Overstreet
      This was originally added as an optimization that for various reasons isn't
      needed anymore, but it does add a lot of nasty corner cases (and it was
      responsible for some recently fixed bugs). Just get rid of it now.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Rework btree cache reserve handling · 0a63b66d
      Committed by Kent Overstreet
      This changes the bucket allocation reserves to use _real_ reserves - separate
      freelists - instead of watermarks, which if nothing else makes the current code
      saner to reason about and is going to be important in the future when we add
      support for multiple btrees.
      
      It also adds btree_check_reserve(), which checks (and locks) the reserves for
      both bucket allocation and memory allocation for btree nodes.  The old code
      more or less assumed that, since it had the root locked (e.g. for btree node
      splits), no other thread could try to use the same reserve.  This technically
      should have been OK for memory allocation, since we should always have a
      reserve for memory allocation (the btree node cache is preallocated and used
      as a reserve).  But multiple btrees will mean that locking the root is no
      longer sufficient, and for the bucket allocation reserve it was technically
      possible for the old code to deadlock.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
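A minimal sketch of "real reserves" as separate freelists, assuming just two classes: a btree reserve and a general list. The names enum reserve, struct freelist, refill(), bucket_alloc() and check_reserve() are hypothetical, and both the memory-allocation side of btree_check_reserve() and its locking are left out. Because each reserve is its own list, ordinary allocations cannot starve btree allocations by construction, rather than by reasoning about watermark arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserve classes: each gets its own freelist rather than
 * sharing one list guarded by watermarks. */
enum reserve { RESERVE_BTREE, RESERVE_NONE, RESERVE_NR };

#define FREELIST_MAX 8

struct freelist {
	long buckets[FREELIST_MAX];
	size_t nr;                 /* buckets currently on the list */
	size_t cap;                /* configured size, <= FREELIST_MAX */
};

struct cache {
	struct freelist free[RESERVE_NR];
};

static bool freelist_push(struct freelist *fl, long b)
{
	if (fl->nr >= fl->cap)
		return false;
	fl->buckets[fl->nr++] = b;
	return true;
}

static bool freelist_pop(struct freelist *fl, long *out)
{
	if (!fl->nr)
		return false;
	*out = fl->buckets[--fl->nr];
	return true;
}

/* Allocator refill: top up the reserves before the general list. */
static bool refill(struct cache *ca, long b)
{
	for (int r = 0; r < RESERVE_NR; r++)
		if (freelist_push(&ca->free[r], b))
			return true;
	return false;
}

/* Try the general list, then only the caller's own reserve, so ordinary
 * allocations can never drain the btree reserve. Returns -1 if empty. */
static long bucket_alloc(struct cache *ca, enum reserve r)
{
	long b;

	if (freelist_pop(&ca->free[RESERVE_NONE], &b))
		return b;
	if (r != RESERVE_NONE && freelist_pop(&ca->free[r], &b))
		return b;
	return -1;
}

/* Loosely analogous to btree_check_reserve(): before e.g. a node split,
 * confirm enough btree buckets are set aside (locking omitted here). */
static bool check_reserve(const struct cache *ca, size_t needed)
{
	return ca->free[RESERVE_BTREE].nr >= needed;
}
```

Refilling four buckets into a cache whose btree reserve holds two leaves two on the general list. Ordinary allocations take those and then fail, while a btree allocation still finds its reserve intact.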
    • bcache: Kill btree_io_wq · 56b30770
      Committed by Kent Overstreet
      With the locking rework in the last patch, this shouldn't be needed anymore -
      btree_node_write_work() only takes b->write_lock which is never held for very
      long.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: btree locking rework · 2a285686
      Committed by Kent Overstreet
      Add a new lock, b->write_lock, which is required to actually modify - or write -
      a btree node; this lock is only held for short durations.
      
      This means we can write out a btree node without taking b->lock, which _is_ held
      for long durations - solving a deadlock when btree_flush_write() (from the
      journalling code) is called with a btree node locked.
      
      Right now this only occurs in bch_btree_set_root(), but with an upcoming
      journalling rework it is going to happen a lot more.
      
      This also turns b->lock into more of a read/intent lock than a read/write
      lock, but not completely, since it still blocks readers.  We may turn it
      into a real intent lock at some point in the future.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Fix a race when freeing btree nodes · 05335cff
      Committed by Kent Overstreet
      This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
      bucket on the unused freelist, where it can be reused right away without any
      ordering requirements. It would be better to wait on at least a journal write to
      go down before reusing the bucket. bch_btree_set_root() does this, and inserting
      into non-leaf nodes is completely synchronous, so we should be OK, but future
      patches are just going to get rid of the unused freelist - it was needed in the
      past for various reasons but shouldn't be anymore.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Add a real GC_MARK_RECLAIMABLE · 4fe6a816
      Committed by Kent Overstreet
      This means the garbage collection code can better check for data and metadata
      pointers to the same buckets.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Kill dead cgroup code · 3f5e0a34
      Committed by Kent Overstreet
      This hasn't been used or even enabled in ages.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Fix another bug recovering from unclean shutdown · 487dded8
      Committed by Kent Overstreet
      The on-disk bucket gens are allowed to be out of date when we reuse buckets
      that didn't have any live data in them. To deal with this, the initial gc has to
      update the bucket gen when we find a pointer gen newer than the bucket's gen.
      
      Unfortunately we weren't doing this for pointers in the journal that we're about
      to replay.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
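A sketch of the gen fix-up described in the commit above, assuming 8-bit bucket generations that wrap modulo 256 and a gen_after() helper that returns how far gen a is ahead of gen b (0 if it is not ahead). struct bucket, struct ptr, initial_gc_mark() and initial_gc() are hypothetical simplifications. The commit's fix corresponds to the second initial_gc_mark() call: pointers in journal entries that are about to be replayed must also be allowed to bring a stale on-disk bucket gen forward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bucket { uint8_t gen; };              /* hypothetical: gen only */
struct ptr    { size_t bucket; uint8_t gen; };

/* How far gen a is ahead of gen b, modulo 256; 0 if a is not ahead. */
static uint8_t gen_after(uint8_t a, uint8_t b)
{
	uint8_t r = (uint8_t)(a - b);
	return r > 128U ? 0 : r;
}

/* Bring a possibly stale bucket gen forward when a pointer carries a
 * newer gen, as the initial gc does after an unclean shutdown. */
static void initial_gc_mark(struct bucket *buckets,
			    const struct ptr *ptrs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct bucket *g = &buckets[ptrs[i].bucket];
		if (gen_after(ptrs[i].gen, g->gen))
			g->gen = ptrs[i].gen;
	}
}

static void initial_gc(struct bucket *buckets,
		       const struct ptr *btree_ptrs, size_t nr_btree,
		       const struct ptr *journal_ptrs, size_t nr_journal)
{
	initial_gc_mark(buckets, btree_ptrs, nr_btree);
	/* The step that was missing: keys in the journal to be replayed. */
	initial_gc_mark(buckets, journal_ptrs, nr_journal);
}
```

Comparing modulo 256 lets a pointer gen of 1 count as newer than a bucket gen of 255, so wraparound does not stop the bucket gen from being updated.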
    • bcache: Fix a bug recovering from unclean shutdown · 0bd143fd
      Committed by Kent Overstreet
      The code to fix up incorrect bucket prios did not skip btree node freeing
      keys, as it should have.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
  5. 30 Jan 2014, 2 commits
    • bcache: Minor fixes from kbuild robot · 3572324a
      Committed by Kent Overstreet
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: fix BUG_ON due to integer overflow with GC_SECTORS_USED · 94717447
      Committed by Darrick J. Wong
      The BUG_ON at the end of __bch_btree_mark_key can be triggered due to
      an integer overflow error:
      
      BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13);
      ...
      SET_GC_SECTORS_USED(g, min_t(unsigned,
      	     GC_SECTORS_USED(g) + KEY_SIZE(k),
      	     (1 << 14) - 1));
      BUG_ON(!GC_SECTORS_USED(g));
      
      In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide.
      While the SET_ code tries to ensure that the field doesn't overflow by
      clamping it to (1<<14)-1 == 16383, this is incorrect because 16383
      requires 14 bits.  Therefore, if GC_SECTORS_USED() + KEY_SIZE() =
      8192, the SET_ statement tries to store 8192 into a 13-bit field.  In
      a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON.
      
      Therefore, create a field width constant and a max value constant, and
      use those to create the bitfield and check the inputs to
      SET_GC_SECTORS_USED.  Arguably the BITMASK() template ought to have
      BUG_ON checks for too-large values, but that's a separate patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
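The overflow and the fix described in the commit above can be reproduced in userspace. The accessors below are hand-written stand-ins for what BITMASK() generates, assuming a 16-bit gc_mark word with the field at offset 2. The constant names GC_SECTORS_USED_SIZE and MAX_GC_SECTORS_USED follow the message's idea of "a field width constant and a max value constant" but are assumptions here, as are GC_SECTORS_USED_OFFSET, add_sectors_buggy() and add_sectors_fixed().

```c
#include <assert.h>
#include <stdint.h>

/* Field layout: 13 bits starting at bit 2 of a 16-bit gc_mark word. */
#define GC_SECTORS_USED_OFFSET 2
#define GC_SECTORS_USED_SIZE   13
#define MAX_GC_SECTORS_USED    ((1U << GC_SECTORS_USED_SIZE) - 1)   /* 8191 */

struct bucket { uint16_t gc_mark; };

/* Getter: extract the 13-bit field. */
static unsigned GC_SECTORS_USED(const struct bucket *g)
{
	return (g->gc_mark >> GC_SECTORS_USED_OFFSET) & MAX_GC_SECTORS_USED;
}

/* Setter: bits of v beyond the field width are silently dropped. */
static void SET_GC_SECTORS_USED(struct bucket *g, unsigned v)
{
	uint16_t mask = (uint16_t)(MAX_GC_SECTORS_USED << GC_SECTORS_USED_OFFSET);
	g->gc_mark = (uint16_t)((g->gc_mark & ~mask) |
				((v << GC_SECTORS_USED_OFFSET) & mask));
}

/* Buggy clamp: (1 << 14) - 1 needs 14 bits, so 8192 is stored as 0. */
static void add_sectors_buggy(struct bucket *g, unsigned key_size)
{
	unsigned v = GC_SECTORS_USED(g) + key_size;
	unsigned lim = (1U << 14) - 1;
	SET_GC_SECTORS_USED(g, v < lim ? v : lim);
}

/* Fixed clamp: the limit is derived from the same width as the field. */
static void add_sectors_fixed(struct bucket *g, unsigned key_size)
{
	unsigned v = GC_SECTORS_USED(g) + key_size;
	SET_GC_SECTORS_USED(g, v < MAX_GC_SECTORS_USED ? v : MAX_GC_SECTORS_USED);
}
```

Adding 8192 sectors to an empty bucket with the buggy clamp stores 8192, whose only set bit lies outside the 13-bit field, so GC_SECTORS_USED() reads back 0 and the BUG_ON would fire. With the clamp derived from the field width, the value saturates at 8191.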
  6. 09 Jan 2014, 19 commits
  7. 17 Dec 2013, 1 commit