1. 31 12月, 2015 1 次提交
    • Z
      bcache: fix a livelock when we cause a huge number of cache misses · 2ef9ccbf
      Zheng Liu 提交于
      Subject :	[PATCH v2] bcache: fix a livelock in btree lock
      Date :	Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)
      
      This commit tries to fix a livelock in bcache.  This livelock might
      happen when we causes a huge number of cache misses simultaneously.
      
      When we get a cache miss, bcache will execute the following path.
      
      ->cached_dev_make_request()
        ->cached_dev_read()
          ->cached_lookup()
            ->bch->btree_map_keys()
              ->btree_root()  <------------------------
                ->bch_btree_map_keys_recurse()        |
                  ->cache_lookup_fn()                 |
                    ->cached_dev_cache_miss()         |
                      ->bch_btree_insert_check_key() -|
                        [If btree->seq is not equal to seq + 1, we should return
                         EINTR and traverse btree again.]
      
      In bch_btree_insert_check_key() function we first need to check upgrade
      flag (op->lock == -1), and when this flag is true we need to release
      read btree->lock and try to take write btree->lock.  During taking and
      releasing this write lock, btree->seq will be monotone increased in
      order to prevent other threads modify this in cache miss (see btree.h:74).
      But if there are some cache misses caused by some requested, we could
      meet a livelock because btree->seq is always changed by others.  Thus no
      one can make progress.
      
      This commit will try to take write btree->lock if it encounters a race
      when we traverse btree.  Although it sacrifice the scalability but we
      can ensure that only one can modify the btree.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Tested-by: NJoshua Schmid <jschmid@suse.com>
      Tested-by: NEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Joshua Schmid <jschmid@suse.com>
      Cc: Zhu Yanhai <zhu.yanhai@gmail.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2ef9ccbf
  2. 08 11月, 2015 1 次提交
  3. 06 11月, 2015 1 次提交
  4. 14 8月, 2015 1 次提交
  5. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  6. 17 7月, 2015 1 次提交
  7. 11 7月, 2015 1 次提交
  8. 01 7月, 2015 2 次提交
  9. 02 6月, 2015 1 次提交
    • T
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo 提交于
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      66114cad
  10. 22 5月, 2015 1 次提交
    • M
      block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb
      Mike Snitzer 提交于
      Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
      non-chains") regressed all existing callers that followed this pattern:
       1) saving a bio's original bi_end_io
       2) wiring up an intermediate bi_end_io
       3) restoring the original bi_end_io from intermediate bi_end_io
       4) calling bio_endio() to execute the restored original bi_end_io
      
      The regression was due to BIO_CHAIN only ever getting set if
      bio_inc_remaining() is called.  For the above pattern it isn't set until
      step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
      the first bio_endio(), in step 2 above, never decremented __bi_remaining
      before calling the intermediate bi_end_io -- leaving __bi_remaining with
      the value 1 instead of 0.  When bio_inc_remaining() occurred during step
      3 it brought it to a value of 2.  When the second bio_endio() was
      called, in step 4 above, it should've called the original bi_end_io but
      it didn't because there was an extra reference that wasn't dropped (due
      to atomic operations being optimized away since BIO_CHAIN wasn't set
      upfront).
      
      Fix this issue by removing the __bi_remaining management complexity for
      all callers that use the above pattern -- bio_chain() is the only
      interface that _needs_ to be concerned with __bi_remaining.  For the
      above pattern callers just expect the bi_end_io they set to get called!
      Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
      that aren't associated with the bio_chain() interface.
      
      Also, the bio_inc_remaining() interface has been moved local to bio.c.
      
      Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      326e1dbb
  11. 06 5月, 2015 1 次提交
    • J
      bio: skip atomic inc/dec of ->bi_cnt for most use cases · dac56212
      Jens Axboe 提交于
      Struct bio has a reference count that controls when it can be freed.
      Most uses cases is allocating the bio, which then returns with a
      single reference to it, doing IO, and then dropping that single
      reference. We can remove this atomic_dec_and_test() in the completion
      path, if nobody else is holding a reference to the bio.
      
      If someone does call bio_get() on the bio, then we flag the bio as
      now having valid count and that we must properly honor the reference
      count when it's being put.
      Tested-by: NRobert Elliott <elliott@hp.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      dac56212
  12. 24 11月, 2014 1 次提交
  13. 05 10月, 2014 1 次提交
    • M
      block: disable entropy contributions for nonrot devices · b277da0a
      Mike Snitzer 提交于
      Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
      QUEUE_FLAG_NONROT.
      
      Historically, all block devices have automatically made entropy
      contributions.  But as previously stated in commit e2e1a148 ("block: add
      sysfs knob for turning off disk entropy contributions"):
          - On SSD disks, the completion times aren't as random as they
            are for rotational drives. So it's questionable whether they
            should contribute to the random pool in the first place.
          - Calling add_disk_randomness() has a lot of overhead.
      
      There are more reliable sources for randomness than non-rotational block
      devices.  From a security perspective it is better to err on the side of
      caution than to allow entropy contributions from unreliable "random"
      sources.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b277da0a
  14. 05 8月, 2014 22 次提交
  15. 18 4月, 2014 1 次提交
  16. 19 3月, 2014 3 次提交