1. 19 Mar, 2014 5 commits
    • bcache: Kill unused freelist · 2531d9ee
      Committed by Kent Overstreet
      This was originally added as an optimization that for various reasons isn't
      needed anymore, but it does add a lot of nasty corner cases (and it was
      responsible for some recently fixed bugs). Just get rid of it now.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Rework btree cache reserve handling · 0a63b66d
      Committed by Kent Overstreet
      This changes the bucket allocation reserves to use _real_ reserves - separate
      freelists - instead of watermarks, which if nothing else makes the current code
      saner to reason about and is going to be important in the future when we add
      support for multiple btrees.
      
      It also adds btree_check_reserve(), which checks (and locks) the reserves for
      both bucket allocation and memory allocation for btree nodes. The old code just
      kinda sorta assumed that since (e.g. for btree node splits) it had the root
      locked, no other threads could try to make use of the same reserve. This
      technically should have been ok for memory allocation (we should always have a
      reserve for memory allocation, since the btree node cache is used as a reserve
      and we preallocate it), but multiple btrees will mean that locking the root
      won't be sufficient anymore, and for the bucket allocation reserve it was
      technically possible for the old code to deadlock.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
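The difference between watermarks and separate freelists described in the commit above can be sketched in a small userspace model. This is only an illustration, not the bcache code: the names `RES_BTREE`, `RES_MOVINGGC`, `RES_NONE`, `wm_alloc()` and `rp_alloc()` are hypothetical, buckets are reduced to counters, and the policy of trying the shared list before the caller's own reserve is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserve classes, for illustration only. */
enum reserve { RES_BTREE, RES_MOVINGGC, RES_NONE, RES_NR };

/* Watermark scheme: one shared pool; each class may only allocate
 * while the pool holds more than that class's watermark. */
struct watermark_pool {
    size_t nfree;
    size_t watermark[RES_NR];
};

static bool wm_alloc(struct watermark_pool *p, enum reserve r)
{
    if (p->nfree <= p->watermark[r])
        return false;
    p->nfree--;
    return true;
}

/* Real reserves: one free count per class. nfree[RES_NONE] is the
 * shared list; the other entries are private to their class. */
struct reserve_pool {
    size_t nfree[RES_NR];
};

static bool rp_alloc(struct reserve_pool *p, enum reserve r)
{
    /* Assumed policy: use the shared list first, then the caller's reserve. */
    if (p->nfree[RES_NONE]) {
        p->nfree[RES_NONE]--;
        return true;
    }
    if (p->nfree[r]) {
        p->nfree[r]--;
        return true;
    }
    return false;
}
```

In the second model, what remains for a class is visible directly in `nfree[r]`, rather than being implied by a comparison against a shared count.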
    • bcache: Kill btree_io_wq · 56b30770
      Committed by Kent Overstreet
      With the locking rework in the last patch, this shouldn't be needed anymore -
      btree_node_write_work() only takes b->write_lock which is never held for very
      long.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Add a real GC_MARK_RECLAIMABLE · 4fe6a816
      Committed by Kent Overstreet
      This means the garbage collection code can better check for data and metadata
      pointers to the same buckets.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Fix moving_gc deadlocking with a foreground write · da415a09
      Committed by Nicholas Swenson
      Deadlock happened because a foreground write slept, waiting for a bucket
      to be allocated. Normally the gc would mark buckets available for invalidation.
      But the moving_gc was stuck waiting for outstanding writes to complete.
      These writes used the bcache_wq, the same queue foreground writes used.
      
      This fix gives moving_gc its own work queue, so it can still finish moving
      even if foreground writes are stuck waiting for allocation. It also makes
      the work queue a parameter to the data_insert path, so moving_gc can use its
      own workqueue for writes.
      Signed-off-by: Nicholas Swenson <nks@daterainc.com>
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
  2. 30 Jan, 2014 1 commit
    • bcache: fix BUG_ON due to integer overflow with GC_SECTORS_USED · 94717447
      Committed by Darrick J. Wong
      The BUG_ON at the end of __bch_btree_mark_key can be triggered due to
      an integer overflow error:
      
      BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13);
      ...
      SET_GC_SECTORS_USED(g, min_t(unsigned,
      	     GC_SECTORS_USED(g) + KEY_SIZE(k),
      	     (1 << 14) - 1));
      BUG_ON(!GC_SECTORS_USED(g));
      
      In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide.
      While the SET_ code tries to ensure that the field doesn't overflow by
      clamping it to (1<<14)-1 == 16383, this is incorrect because 16383
      requires 14 bits.  Therefore, if GC_SECTORS_USED() + KEY_SIZE() =
      8192, the SET_ statement tries to store 8192 into a 13-bit field.  In
      a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON.
      
      Therefore, create a field width constant and a max value constant, and
      use those to create the bitfield and check the inputs to
      SET_GC_SECTORS_USED.  Arguably the BITMASK() template ought to have
      BUG_ON checks for too-large values, but that's a separate patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
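The arithmetic in the commit above can be checked with a small userspace model. This is a sketch only: `SECTORS_USED_BITS`, `SECTORS_USED_MAX`, `store_13bit()` and the two `add_sectors_*()` helpers are illustrative names, not the constants or macros the actual patch introduces, and the 13-bit field is modelled by masking rather than by bcache's BITMASK() template.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; the real definitions live in bcache.h. */
#define SECTORS_USED_BITS 13u
#define SECTORS_USED_MAX  ((1u << SECTORS_USED_BITS) - 1u)  /* 8191 */

/* Model a 13-bit field: bits above the field width are dropped. */
static unsigned store_13bit(unsigned v)
{
    return v & SECTORS_USED_MAX;
}

/* Clamp as in the quoted snippet: the limit (16383) needs 14 bits,
 * so sums in 8192..16383 pass the clamp and then get truncated. */
static unsigned add_sectors_buggy(unsigned used, unsigned key_size)
{
    unsigned sum = used + key_size;
    unsigned limit = (1u << 14) - 1u;

    return store_13bit(sum < limit ? sum : limit);
}

/* Clamp derived from the field width: the stored value always fits. */
static unsigned add_sectors_fixed(unsigned used, unsigned key_size)
{
    unsigned sum = used + key_size;

    return store_13bit(sum < SECTORS_USED_MAX ? sum : SECTORS_USED_MAX);
}
```

With `used = 8000` and `key_size = 192`, the sum is 8192: the first helper stores 0, which is the value that trips the BUG_ON, while the second saturates at 8191.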
  3. 09 Jan, 2014 11 commits
  4. 17 Dec, 2013 2 commits
  5. 24 Nov, 2013 2 commits
    • block: Introduce new bio_split() · 20d0189b
      Committed by Kent Overstreet
      The new bio_split() can split arbitrary bios - it's not restricted to
      single page bios, like the old bio_split() (previously renamed to
      bio_pair_split()). It also has different semantics - it doesn't allocate
      a struct bio_pair, leaving it up to the caller to handle completions.
      
      Then convert the existing bio_pair_split() users to the new bio_split()
      - and also nvme, which was open coding bio splitting.
      
      (We have to take that BUG_ON() out of bio_integrity_trim() because this
      bio_split() needs to use it, and there's no reason it has to be used on
      bios marked as cloned; BIO_CLONED doesn't seem to have clearly
      documented semantics anyways.)
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Neil Brown <neilb@suse.de>
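As a rough picture of what splitting without a struct bio_pair means for the caller, here is a toy userspace model. It is not the kernel API: `struct toy_bio` and `toy_bio_split()` are hypothetical, a bio is reduced to a start sector and a length, and the sketch assumes the split yields a front piece while the original is advanced to cover the remainder. Completion handling, which the commit leaves to the caller, is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a bio: just a sector range, no pages or completion. */
struct toy_bio {
    unsigned long long sector;   /* first sector of the range */
    unsigned nr_sectors;         /* length of the range in sectors */
};

/*
 * Carve the first `sectors` sectors of *bio into *front and advance
 * *bio past them. Returns 0 on success, or -1 if the split point is
 * not strictly inside the range (nothing is modified in that case).
 */
static int toy_bio_split(struct toy_bio *bio, unsigned sectors,
                         struct toy_bio *front)
{
    if (sectors == 0 || sectors >= bio->nr_sectors)
        return -1;

    front->sector = bio->sector;
    front->nr_sectors = sectors;

    bio->sector += sectors;
    bio->nr_sectors -= sectors;
    return 0;
}
```

After a successful call the caller holds two independent ranges and has to track both until they finish, since no pair object ties them together.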
    • bcache: Kill unaligned bvec hack · ed9c47be
      Committed by Kent Overstreet
      Bcache has a hack to avoid cloning the biovec if it's all full pages -
      but with immutable biovecs coming this won't be necessary anymore.
      
      For now, we remove the special case and always clone the bvec array so
      that the immutable biovec patches are simpler.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
  6. 11 Nov, 2013 18 commits
  7. 25 Sep, 2013 1 commit
    • bcache: Fix a writeback performance regression · c2a4f318
      Committed by Kent Overstreet
      Background writeback works by scanning the btree for dirty data and
      adding those keys into a fixed size buffer, then for each dirty key in
      the keybuf writing it to the backing device.
      
      When read_dirty() finishes and it's time to scan for more dirty data, we
      need to wait for the outstanding writeback IO to finish - they still
      take up slots in the keybuf (so that foreground writes can check for
      them to avoid races) - without that wait, we'll continually rescan when
      we'll be able to add at most a key or two to the keybuf, and that takes
      locks that starve foreground IO.  Doh.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>