1. 16 10月, 2017 1 次提交
  2. 08 9月, 2017 1 次提交
    • T
      bcache: initialize dirty stripes in flash_dev_run() · 175206cf
      Tang Junhui 提交于
      bcache uses a Proportion-Differentiation Controller algorithm to control
      writeback rate to cached devices. In the PD controller algorithm, dirty
      stripes of thin flash device should not be counted in, because flash only
      volumes never write back dirty data.
      
      Currently dirty stripe counter for thin flash device is not initialized
      when the thin flash device starts. Which means the following calculation
      in PD controller will reference an undefined dirty stripes number, and
      all cached devices attached to the same cache set where the thin flash
      device lies on may have an inaccurate writeback rate.
      
      This patch calles bch_sectors_dirty_init() in flash_dev_run(), to
      correctly initialize dirty stripe counter when the thin flash device
      starts to run. This patch also does following parameter data type change,
       -void bch_sectors_dirty_init(struct cached_dev *dc);
       +void bch_sectors_dirty_init(struct bcache_device *);
      to call this function conveniently in flash_dev_run().
      
      (Commit log is composed by Coly Li)
      Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      175206cf
  3. 06 9月, 2017 2 次提交
    • T
      bcache: fix for gc and write-back race · 9baf3097
      Tang Junhui 提交于
      gc and write-back get raced (see the email "bcache get stucked" I sended
      before):
      gc thread                               write-back thread
      |                                       |bch_writeback_thread()
      |bch_gc_thread()                        |
      |                                       |==>read_dirty()
      |==>bch_btree_gc()                      |
      |==>btree_root() //get btree root       |
      |                //node write locker    |
      |==>bch_btree_gc_root()                 |
      |                                       |==>read_dirty_submit()
      |                                       |==>write_dirty()
      |                                       |==>continue_at(cl,
      |                                       |               write_dirty_finish,
      |                                       |               system_wq);
      |                                       |==>write_dirty_finish()//excute
      |                                       |               //in system_wq
      |                                       |==>bch_btree_insert()
      |                                       |==>bch_btree_map_leaf_nodes()
      |                                       |==>__bch_btree_map_nodes()
      |                                       |==>btree_root //try to get btree
      |                                       |              //root node read
      |                                       |              //lock
      |                                       |-----stuck here
      |==>bch_btree_set_root()
      |==>bch_journal_meta()
      |==>bch_journal()
      |==>journal_try_write()
      |==>journal_write_unlocked() //journal_full(&c->journal)
      |                            //condition satisfied
      |==>continue_at(cl, journal_write, system_wq); //try to excute
      |                               //journal_write in system_wq
      |                               //but work queue is excuting
      |                               //write_dirty_finish()
      |==>closure_sync(); //wait journal_write execute
      |                   //over and wake up gc,
      |-------------stuck here
      |==>release root node write locker
      
      This patch alloc a separate work-queue for write-back thread to avoid such
      race.
      
      (Commit log re-organized by Coly Li to pass checkpatch.pl checking)
      Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn>
      Acked-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9baf3097
    • T
      bcache: correct cache_dirty_target in __update_writeback_rate() · a8394090
      Tang Junhui 提交于
      __update_write_rate() uses a Proportion-Differentiation Controller
      algorithm to control writeback rate. A dirty target number is used in
      this PD controller to control writeback rate. A larger target number
      will make the writeback rate smaller, on the versus, a smaller target
      number will make the writeback rate larger.
      
      bcache uses the following steps to calculate the target number,
      1) cache_sectors = all-buckets-of-cache-set * buckets-size
      2) cache_dirty_target = cache_sectors * cached-device-writeback_percent
      3) target = cache_dirty_target *
      (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set)
      
      The calculation at step 1) for cache_sectors is incorrect, which does
      not consider dirty blocks occupied by flash only volume.
      
      A flash only volume can be took as a bcache device without cached
      device. All data sectors allocated for it are persistent on cache device
      and marked dirty, they are not touched by bcache writeback and garbage
      collection code. So data blocks of flash only volume should be ignore
      when calculating cache_sectors of cache set.
      
      Current code does not subtract dirty sectors of flash only volume, which
      results a larger target number from the above 3 steps. And in sequence
      the cache device's writeback rate is smaller then a correct value,
      writeback speed is slower on all cached devices.
      
      This patch fixes the incorrect slower writeback rate by subtracting
      dirty sectors of flash only volumes in __update_writeback_rate().
      
      (Commit log composed by Coly Li to pass checkpatch.pl checking)
      Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a8394090
  4. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  5. 09 6月, 2017 1 次提交
  6. 02 3月, 2017 1 次提交
  7. 22 11月, 2016 1 次提交
  8. 22 9月, 2016 1 次提交
  9. 08 6月, 2016 1 次提交
  10. 24 5月, 2016 1 次提交
  11. 31 12月, 2015 1 次提交
  12. 14 8月, 2015 1 次提交
  13. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  14. 05 8月, 2014 1 次提交
    • S
      bcache: fix uninterruptible sleep in writeback thread · 9e5c3535
      Slava Pestov 提交于
      There were two issues here:
      
      - writeback thread did not start until the device first became dirty
      - writeback thread used uninterruptible sleep once running
      
      Without this patch I see kernel warnings printed and a load average of
      1.52 after booting my test VM. With this patch the warnings are gone and
      the load average is near 0.00 as expected.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      9e5c3535
  15. 17 12月, 2013 3 次提交
  16. 24 11月, 2013 1 次提交
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  17. 11 11月, 2013 12 次提交
  18. 25 9月, 2013 2 次提交
  19. 02 7月, 2013 1 次提交
    • K
      bcache: Use standard utility code · 8e51e414
      Kent Overstreet 提交于
      Some of bcache's utility code has made it into the rest of the kernel,
      so drop the bcache versions.
      
      Bcache used to have a workaround for allocating from a bio set under
      generic_make_request() (if you allocated more than once, the bios you
      already allocated would get stuck on current->bio_list when you
      submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT
      when allocating bios under generic_make_request() so that allocation
      could fail and it could retry from workqueue. But bio_alloc_bioset() has
      a workaround now, so we can drop this hack and the associated error
      handling.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      8e51e414
  20. 27 6月, 2013 4 次提交
    • K
      bcache: Write out full stripes · 72c27061
      Kent Overstreet 提交于
      Now that we're tracking dirty data per stripe, we can add two
      optimizations for raid5/6:
      
       * If a stripe is already dirty, force writes to that stripe to
         writeback mode - to help build up full stripes of dirty data
      
       * When flushing dirty data, preferentially write out full stripes first
         if there are any.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      72c27061
    • K
      bcache: Track dirty data by stripe · 279afbad
      Kent Overstreet 提交于
      To make background writeback aware of raid5/6 stripes, we first need to
      track the amount of dirty data within each stripe - we do this by
      breaking up the existing sectors_dirty into per stripe atomic_ts
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      279afbad
    • K
      bcache: Initialize sectors_dirty when attaching · 444fc0b6
      Kent Overstreet 提交于
      Previously, dirty_data wouldn't get initialized until the first garbage
      collection... which was a bit of a problem for background writeback (as
      the PD controller keys off of it) and also confusing for users.
      
      This is also prep work for making background writeback aware of raid5/6
      stripes.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      444fc0b6
    • K
      bcache: Fix/revamp tracepoints · c37511b8
      Kent Overstreet 提交于
      The tracepoints were reworked to be more sensible, and fixed a null
      pointer deref in one of the tracepoints.
      
      Converted some of the pr_debug()s to tracepoints - this is partly a
      performance optimization; it used to be that with DEBUG or
      CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it
      was changed to an empty inline function.
      
      Some of the pr_debug() statements had rather expensive function calls as
      part of the arguments, so this code was getting run unnecessarily even
      on non debug kernels - in some fast paths, too.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      c37511b8
  21. 15 5月, 2013 1 次提交
  22. 29 3月, 2013 1 次提交