1. 15 Jul, 2022 2 commits
  2. 28 Jun, 2022 1 commit
  3. 24 May, 2022 1 commit
    • C
      bcache: avoid journal no-space deadlock by reserving 1 journal bucket · 32feee36
      Coly Li committed
      The journal no-space deadlock has been reported from time to time.
      Such a deadlock can happen in the following situation.

      When all journal buckets are completely filled by active jsets under
      heavy write I/O load, the cache set registration (after a reboot)
      loads all active jsets and inserts them into the btree again (this is
      called journal replay). If a journaled bkey is inserted into a btree
      node and causes a btree node split, a new journal request might be
      triggered. For example, if the btree grows one more level after the
      node split, the root node record in the cache device super block is
      updated by bch_journal_meta() from bch_btree_set_root(). But there is
      no space left in the journal buckets, so the journal replay has to
      wait for a new journal bucket to be reclaimed, which only happens
      after at least one journal bucket has been replayed. This is one
      example of how the journal no-space deadlock happens.
      
      The solution to avoid the deadlock is to reserve 1 journal bucket at
      run time, and only permit the reserved journal bucket to be used
      during the cache set registration procedure for things like journal
      replay. Then the journal space is never completely filled, and the
      journal no-space deadlock can no longer happen.

      This patch adds a new member "bool do_reserve" to struct journal. It
      is initialized to 0 (false) when struct journal is allocated, and set
      to 1 (true) by bch_journal_space_reserve() when all initialization is
      done in run_cache_set(). At run time, when journal_reclaim() tries to
      allocate a new journal bucket, free_journal_buckets() is called to
      check whether there are enough free journal buckets to use. If there
      is only 1 free journal bucket and journal->do_reserve is 1 (true),
      the last bucket is reserved and free_journal_buckets() returns 0 to
      indicate that no journal bucket is free. Then journal_reclaim() gives
      up and tries again next time to see whether there is a free journal
      bucket to allocate. With this method, there is always 1 journal
      bucket reserved at run time.
      
      During the cache set registration, journal->do_reserve is 0 (false), so
      the reserved journal bucket can be used to avoid the no-space deadlock.
      Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220524102336.10684-5-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      32feee36
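The reservation rule described in this commit message can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct journal_model`, its fields other than `do_reserve`, and `free_journal_buckets_model()` are hypothetical names, and the real free_journal_buckets() works on the journal's on-disk ring layout rather than two counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the journal state (not struct journal). */
struct journal_model {
    unsigned int nr_buckets;  /* total journal buckets */
    unsigned int nr_in_use;   /* buckets still holding active jsets */
    bool do_reserve;          /* false during registration, true at run time */
};

/* Number of journal buckets the allocator may hand out right now. */
static unsigned int free_journal_buckets_model(const struct journal_model *j)
{
    unsigned int n;

    assert(j->nr_in_use <= j->nr_buckets);
    n = j->nr_buckets - j->nr_in_use;

    /* At run time the last free bucket is held back for a later replay. */
    if (j->do_reserve && n <= 1)
        return 0;
    return n;
}
```

With `do_reserve` false (registration and journal replay) the last bucket is reported as free, so replay can still make progress; once it is set to true the same state reports no free bucket.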
  4. 18 Apr, 2022 1 commit
  5. 02 Feb, 2022 2 commits
  6. 15 Dec, 2021 1 commit
    • L
      bcache: fix NULL pointer reference in cached_dev_detach_finish · aa97f6cd
      Lin Feng committed
      Commit 0259d449 ("bcache: move calc_cached_dev_sectors to proper
      place on backing device detach") tries to fix calc_cached_dev_sectors
      when bcache device detaches, but now we have:
      
      cached_dev_detach_finish
          ...
          bcache_device_detach(&dc->disk);
              ...
              closure_put(&d->c->caching);
              d->c = NULL; [*explicitly set dc->disk.c to NULL*]
          list_move(&dc->list, &uncached_devices);
          calc_cached_dev_sectors(dc->disk.c); [*passing a NULL pointer*]
          ...
      
      The code flow above shows how the bug happens. This patch fixes the
      problem by caching dc->disk.c beforehand. The cache_set won't be freed
      under us because the c->caching closure holds at least one reference
      count, and the closure callback __cache_set_unregister is only called
      by bch_cache_set_stop, which uses closure_queue(&c->caching). That
      means the c->caching closure callback for destroying the cache_set
      won't be triggered by the previous closure_put(&d->c->caching).
      So at this stage (while cached_dev_detach_finish is running) it is
      safe to access the cache_set dc->disk.c.
      
      Fixes: 0259d449 ("bcache: move calc_cached_dev_sectors to proper place on backing device detach")
      Signed-off-by: Lin Feng <linf@wangsu.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20211112053629.3437-2-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      aa97f6cd
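The fix pattern in this commit (save the pointer before the call that clears it) can be illustrated with a small user-space model. All type and function names below are hypothetical stand-ins; the closure reference counting that keeps the real cache_set alive is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal types; the real structs live in the bcache driver. */
struct cache_set_model { int recalculated; };
struct bcache_device_model { struct cache_set_model *c; };

/* Mirrors the step in the trace that clears the back-pointer. */
static void device_detach_model(struct bcache_device_model *d)
{
    d->c = NULL;
}

/* Stand-in for the sector recalculation that needs a valid cache set. */
static void calc_sectors_model(struct cache_set_model *c)
{
    assert(c != NULL);  /* the original bug passed NULL here */
    c->recalculated++;
}

static void detach_finish_model(struct bcache_device_model *d)
{
    struct cache_set_model *c = d->c;  /* cache the pointer first */

    device_detach_model(d);
    calc_sectors_model(c);             /* use the saved pointer, not d->c */
}
```

Calling `detach_finish_model()` leaves the device's pointer cleared while the recalculation still runs against the saved cache set.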
  7. 03 Nov, 2021 1 commit
  8. 21 Oct, 2021 1 commit
  9. 20 Oct, 2021 5 commits
  10. 19 Oct, 2021 1 commit
  11. 13 Aug, 2021 2 commits
  12. 01 Jun, 2021 1 commit
  13. 07 May, 2021 1 commit
  14. 11 Apr, 2021 1 commit
  15. 11 Mar, 2021 1 commit
  16. 10 Feb, 2021 3 commits
    • K
      bcache: Move journal work to new flush wq · afe78ab4
      Kai Krakow committed
      This is potentially long running and not latency sensitive, let's get
      it out of the way of other latency sensitive events.
      
      As observed in the previous commit, the `system_wq` easily becomes
      congested by bcache, and this fixes a few more stalls I was observing
      every once in a while.

      Let's not make this `WQ_MEM_RECLAIM` as it was shown to reduce
      performance of boot and file system operations in my tests. Also,
      without `WQ_MEM_RECLAIM`, I no longer see desktop stalls. This matches
      the previous behavior as `system_wq` also does no memory reclaim:
      
      > // workqueue.c:
      > system_wq = alloc_workqueue("events", 0, 0);
      
      Cc: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org # 5.4+
      Signed-off-by: Kai Krakow <kai@kaishome.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      afe78ab4
    • K
      Revert "bcache: Kill btree_io_wq" · 9f233ffe
      Kai Krakow committed
      This reverts commit 56b30770.
      
      With the btree using the `system_wq`, I seem to see a lot more desktop
      latency than I should.
      
      After some more investigation, it looks like the original assumption
      of 56b30770 no longer is true, and bcache has a very high potential of
      congesting the `system_wq`. In turn, this introduces laggy desktop
      performance, IO stalls (at least with btrfs), and input events may be
      delayed.
      
      So let's revert this. It's important to note that the semantics of
      using `system_wq` previously mean that `btree_io_wq` should be created
      before and destroyed after other bcache wqs to keep the same
      assumptions.
      
      Cc: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org # 5.4+
      Signed-off-by: Kai Krakow <kai@kaishome.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9f233ffe
    • K
      bcache: Fix register_device_aync typo · d7fae7b4
      Kai Krakow committed
      Should be `register_device_async`.
      
      Cc: Coly Li <colyli@suse.de>
      Signed-off-by: Kai Krakow <kai@kaishome.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d7fae7b4
  17. 25 Jan, 2021 1 commit
  18. 10 Jan, 2021 4 commits
    • C
      bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET · 5342fd42
      Coly Li committed
      If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in the incompat
      feature set, it means the cache device was created with the obsoleted
      layout that uses obso_bucket_size_hi. bcache no longer supports this
      feature bit; a new BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat
      feature bit has been added for a better layout to support large bucket
      sizes.

      For legacy compatibility, if a cache device was created with the
      obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all
      bcache devices attached to this cache set should be set to read-only.
      Then the dirty data can be written back to the backing device before
      the cache device is re-created with the
      BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit by the latest
      bcache-tools.
      
      This patch checks the BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature
      bit when running a cache set and when attaching a bcache device to the
      cache set. If this bit is set,
      - When running a cache set, print an error kernel message to indicate
        that all bcache devices attached afterwards will be read-only.
      - When attaching a bcache device, print an error kernel message to
        indicate that the attached bcache device will be read-only, and ask
        users to update to the latest bcache-tools.

      This change only affects cache devices whose bucket size is >= 32MB.
      Such sizes are intended for zoned SSDs and almost nobody uses such a
      large bucket size at this moment. If you don't explicitly set a large
      bucket size for a zoned SSD, this change is totally transparent to
      your bcache device.
      
      Fixes: ffa47032 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5342fd42
    • C
      bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket · b16671e8
      Coly Li committed
      When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
      was introduced into the incompat feature set. It used bucket_size_hi
      (which was added at the tail of struct cache_sb_disk) to extend current
      16bit bucket size to 32bit with existing bucket_size in struct
      cache_sb_disk.
      
      This is not a good idea; there are two obvious problems,
      - Bucket size is always a power of 2. If log2(bucket size) is stored
        in the existing bucket_size of struct cache_sb_disk, it is
        unnecessary to add bucket_size_hi.
      - Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
        struct cache_sb_disk. bucket_size_hi was added after d[], which makes
        csum_set() calculate an unexpected super block checksum.

      To fix the above problems, this patch introduces a new incompat feature
      bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE. When this bit is set,
      bucket_size in struct cache_sb_disk stores the order of the power-of-2
      bucket size value. When the user specifies a bucket size larger than
      32768 sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE is set in the
      incompat feature set, and bucket_size stores log2(bucket size) rather
      than the real bucket size value.

      The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore.
      It is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and is still
      only recognized by the kernel driver for legacy compatibility. The
      previous bucket_size_hi is renamed to obso_bucket_size_hi in struct
      cache_sb_disk and is not used in bcache-tools anymore.
      
      For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
      bcache-tools and kernel driver still recognize the feature string and
      display it as "obso_large_bucket".
      
      With this change, the unnecessary extra space extension of the bcache
      on-disk super block can be avoided, and csum_set() can generate the
      expected checksum as well.
      
      Fixes: ffa47032 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b16671e8
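The log2 encoding described in this commit message can be sketched as a pair of helpers. This is an illustration only: the function names and the flag value `FEATURE_LOG_LARGE_BUCKET_SIZE_MODEL` are assumptions, not the kernel's or bcache-tools' definitions, and the 32768-sector threshold is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value; the real bit is defined in the bcache headers. */
#define FEATURE_LOG_LARGE_BUCKET_SIZE_MODEL 0x1u

/* Encode a power-of-2 bucket size (in sectors) into a 16-bit on-disk field. */
static uint16_t encode_bucket_size(uint32_t sectors, uint32_t *incompat)
{
    assert(sectors != 0 && (sectors & (sectors - 1)) == 0);

    if (sectors > 32768) {
        uint16_t order = 0;

        /* Find log2(sectors); valid because sectors is a power of 2. */
        while ((sectors >> order) != 1)
            order++;
        *incompat |= FEATURE_LOG_LARGE_BUCKET_SIZE_MODEL;
        return order;
    }
    return (uint16_t)sectors;  /* small sizes are stored as the real value */
}

/* Decode the on-disk field back into a bucket size in sectors. */
static uint32_t decode_bucket_size(uint16_t field, uint32_t incompat)
{
    if (incompat & FEATURE_LOG_LARGE_BUCKET_SIZE_MODEL) {
        assert(field < 32);  /* the order must fit a 32-bit size */
        return (uint32_t)1 << field;
    }
    return field;
}
```

Because the field holds only the order when the feature bit is set, no extra high-bits member is needed after d[], which is the layout problem the message attributes to bucket_size_hi.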
    • C
      bcache: check unsupported feature sets for bcache register · 1dfc0686
      Coly Li committed
      This patch adds a check for features that are incompatible with the
      currently supported feature sets.

      Now if the bcache device created by bcache-tools has features that the
      current kernel doesn't support, read_super() will fail with an error
      message. E.g. if an unsupported incompatible feature is detected,
      bcache register will fail with dmesg "bcache: register_bcache() error :
      Unsupported incompatible feature found".
      
      Fixes: d721a43f ("bcache: increase super block version for cache device and backing device")
      Fixes: ffa47032 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1dfc0686
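A feature-set check of this kind usually reduces to a mask test: any bit outside the supported mask is rejected. The sketch below assumes a hypothetical supported mask and helper name; only the error string comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask: bits 0 and 1 stand in for the incompat features we know. */
#define SUPPORTED_INCOMPAT_MODEL 0x3u

/* Bit 2 is used below as an example of a feature this model does not know. */
static_assert((SUPPORTED_INCOMPAT_MODEL & 0x4u) == 0,
              "bit 2 must stay outside the supported mask");

/* Returns NULL when registration may proceed, otherwise an error string. */
static const char *check_incompat_model(uint64_t incompat)
{
    /* Any bit set outside the supported mask is an unknown feature. */
    if (incompat & ~(uint64_t)SUPPORTED_INCOMPAT_MODEL)
        return "Unsupported incompatible feature found";
    return NULL;
}
```

A super block carrying only known bits passes, while one with an extra bit is refused before any of its layout is interpreted.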
    • Y
      bcache: set pdev_set_uuid before scond loop iteration · e8092707
      Yi Li committed
      There is no need to reassign pdev_set_uuid in the second loop iteration,
      so move it to the place before second loop.
      Signed-off-by: Yi Li <yili@winhong.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e8092707
  19. 24 Dec, 2020 1 commit
    • Y
      bcache:remove a superfluous check in register_bcache · 117ae250
      Yi Li committed
      The bdev is not reassigned after it has been checked with IS_ERR(),
      so the second !IS_ERR(bdev) check is superfluous.
      
      After commit 4e7b5671 ("block: remove i_bdev"),
      "Switch the block device lookup interfaces to directly work with a dev_t
      so that struct block_device references are only acquired by the
      blkdev_get variants (and the blk-cgroup special case).  This means that
      we now don't need an extra reference in the inode and can generally
      simplify handling of struct block_device to keep the lookups contained
      in the core block layer code."
      
      So after the lookup_bdev() call there is no need to call bdput().

      Remove the superfluous check of the bdev, and don't call bdput() after
      lookup_bdev().

      Fixes: 4e7b5671 ("block: remove i_bdev")
      Signed-off-by: Yi Li <yili@winhong.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      117ae250
  20. 08 Dec, 2020 1 commit
    • D
      bcache: fix race between setting bdev state to none and new write request direct to backing · df4ad532
      Dongsheng Yang committed
      There is a race condition in detaching as below:
      A. detaching			B. Write request
      (1) writing back
      (2) write back done, set bdev
          state to clean.
      (3) cached_dev_put() and
          schedule_work(&dc->detach);
      				(4) write data [0 - 4K] directly
      				    into backing and ack to user.
      (5) power-failure...
      
      When we restart this bcache device, this bdev is clean but not
      detached, so a read of [0 - 4K] returns unexpected old data from the
      cache device.

      To fix this problem, set the bdev state to none when writeback is done
      during detaching. Then, if a power failure happens as above, the data
      in the cache will not be used the next time the bcache device starts:
      the bdev is detached, and we read the correct data directly from the
      backing device.
      Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      df4ad532
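The state change in this fix can be modelled with two small functions. This is a deliberately simplified sketch based only on the commit text: the enum and function names are hypothetical, and the real driver stores the state in the backing device super block and has more states and checks than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names modelled on the commit text, not kernel macros. */
enum bdev_state_model { STATE_NONE, STATE_CLEAN, STATE_DIRTY };

/* State recorded for the backing device once writeback has finished. */
static enum bdev_state_model state_after_writeback(bool detaching)
{
    /* With the fix, a detaching device is marked "none" instead of "clean". */
    return detaching ? STATE_NONE : STATE_CLEAN;
}

/* Whether a restart would still consult the cache for this backing device. */
static bool cache_used_on_restart(enum bdev_state_model s)
{
    assert(s == STATE_NONE || s == STATE_CLEAN || s == STATE_DIRTY);
    return s != STATE_NONE;
}
```

In this model a power failure after step (4) finds the state already set to none, so the stale cached copy of [0 - 4K] is not consulted on the next start.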
  21. 02 Dec, 2020 3 commits
  22. 03 Oct, 2020 5 commits
    • C
      bcache: remove embedded struct cache_sb from struct cache_set · 4a784266
      Coly Li committed
      Since bcache code was merged into the mainline kernel, each cache set
      has only had one single cache in it. The multiple caches framework is
      there, but the code is far from complete. Considering that multiple
      copies of cached data can also be stored on e.g. md raid1 devices, it
      is unnecessary to support multiple caches in one cache set.
      
      The previous preparation patches fix the dependencies of explicitly
      making a cache set only have single cache. Now we don't have to maintain
      an embedded partial super block in struct cache_set, the in-memory super
      block can be directly referenced from struct cache.
      
      This patch removes the embedded struct cache_sb from struct cache_set,
      and fixes all locations where the super block was referenced from this
      removed super block by referencing the in-memory super block of struct
      cache.
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4a784266
    • C
      bcache: check and set sync status on cache's in-memory super block · 6f9414e0
      Coly Li committed
      Currently the cache's sync status is checked and set on the cache
      set's in-memory partial super block. After removing the embedded
      struct cache_sb from the cache set and referencing the cache's
      in-memory super block from struct cache_set, the sync status can be
      set and checked directly on the cache's super block.
      
      This patch checks and sets the cache sync status directly on cache's
      in-memory super block. This is a preparation for later removing embedded
      struct cache_sb from struct cache_set.
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6f9414e0
    • C
      bcache: remove can_attach_cache() · ebaa1ac1
      Coly Li committed
      After removing the embedded struct cache_sb from struct cache_set, cache
      set will directly reference the in-memory super block of struct cache.
      It is unnecessary to compare block_size, bucket_size and nr_in_set from
      the identical in-memory super block in can_attach_cache().
      
      This is a preparation patch for later removing cache_set->sb from
      struct cache_set.
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ebaa1ac1
    • C
      bcache: don't check seq numbers in register_cache_set() · 08a17828
      Coly Li committed
      In order to update the partial super block of cache set, the seq numbers
      of cache and cache set are checked in register_cache_set(). If cache's
      seq number is larger than cache set's seq number, cache set must update
      its partial super block from cache's super block. It is unnecessary
      when the embedded struct cache_sb is removed from struct cache set.

      This patch removes the seq number checking from register_cache_set(),
      because later there will be no such partial super block in struct
      cache set; the cache set will directly reference the in-memory super
      block from struct cache. This is a preparation patch for removing the
      embedded struct cache_sb from struct cache_set.
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      08a17828
    • C
      bcache: remove useless alloc_bucket_pages() · 421cf1c5
      Coly Li committed
      Now no one uses alloc_bucket_pages() anymore, remove it from bcache.h.
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      421cf1c5