1. 25 Mar, 2020 9 commits
  2. 24 Mar, 2020 6 commits
  3. 04 Mar, 2020 2 commits
    • dm: bump version of core and various targets · 636be424
      Mike Snitzer committed
      Changes made during the 5.6 cycle warrant bumping the version number
      for DM core and the targets modified by this commit.
      
      It should be noted that dm-thin, dm-crypt and dm-raid already had
      their target version bumped during the 5.6 merge window.
      
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix congested_fn for request-based device · 974f51e8
      Hou Tao committed
      We neither assign congested_fn for request-based blk-mq devices nor
      implement it correctly. So fix both.
      
      Also, remove the incorrect comment from dm_init_normal_md_queue and rename
      it to dm_init_congested_fn.
      
      Fixes: 4aa9c692 ("bdi: separate out congested state into a separate struct")
      Cc: stable@vger.kernel.org
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 03 Mar, 2020 3 commits
  5. 28 Feb, 2020 5 commits
    • dm zoned: Fix reference counter initial value of chunk works · ee63634b
      Shin'ichiro Kawasaki committed
      Dm-zoned initializes reference counters of new chunk works with zero
      value and refcount_inc() is called to increment the counter. However, the
      refcount_inc() function handles the addition to zero value as an error
      and triggers the warning as follows:
      
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0
      ...
      CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134
      ...
      Call Trace:
       dmz_map+0x2d2/0x350 [dm_zoned]
       __map_bio+0x42/0x1a0
       __split_and_process_non_flush+0x14a/0x1b0
       __split_and_process_bio+0x83/0x240
       ? kmem_cache_alloc+0x165/0x220
       dm_process_bio+0x90/0x230
       ? generic_make_request_checks+0x2e7/0x680
       dm_make_request+0x3e/0xb0
       generic_make_request+0xcf/0x320
       ? memcg_drain_all_list_lrus+0x1c0/0x1c0
       submit_bio+0x3c/0x160
       ? guard_bio_eod+0x2c/0x130
       mpage_readpages+0x182/0x1d0
       ? bdev_evict_inode+0xf0/0xf0
       read_pages+0x6b/0x1b0
       __do_page_cache_readahead+0x1ba/0x1d0
       force_page_cache_readahead+0x93/0x100
       generic_file_read_iter+0x83a/0xe40
       ? __seccomp_filter+0x7b/0x670
       new_sync_read+0x12a/0x1c0
       vfs_read+0x9d/0x150
       ksys_read+0x5f/0xe0
       do_syscall_64+0x5b/0x180
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ...
      
      After this warning, all subsequent refcount API calls on the counter fail
      to change its value.
      
      Fix this by initializing the reference counter of new chunk works to one
      instead of zero. Accordingly, do not call refcount_inc() via
      dmz_get_chunk_work() for new chunk works.
      
      The failure was observed with linux version 5.4 with CONFIG_REFCOUNT_FULL
      enabled. Refcount rework was merged to linux version 5.5 by the
      commit 168829ad ("Merge branch 'locking-core-for-linus' of
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this
      commit, CONFIG_REFCOUNT_FULL was removed and the failure was observed
      regardless of kernel configuration.
      
      Linux version 4.20 merged the commit 092b5648 ("dm zoned: target: use
      refcount_t for dm zoned reference counters"). Before this commit, dm
      zoned used the atomic_t APIs, which do not check for addition to zero,
      so this fix is not necessary for those kernels.
      
      Fixes: 092b5648 ("dm zoned: target: use refcount_t for dm zoned reference counters")
      Cc: stable@vger.kernel.org # 5.4+
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
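The counting rule behind the dm zoned fix can be sketched in user space. This is only a toy model, not the kernel's refcount_t implementation: `toy_refcount`, `toy_refcount_inc()`, `toy_refcount_dec_and_test()` and `toy_refcount_init_new()` are hypothetical names, and the model assumes nothing more than "an increment from zero is refused", which is a simplification of what the kernel does.

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's refcount_t; hypothetical, for illustration. */
struct toy_refcount {
	unsigned int refs;
};

/*
 * Mimic the checked increment: adding to a counter that is already zero
 * is treated as a use-after-free, so the increment is refused.
 */
bool toy_refcount_inc(struct toy_refcount *r)
{
	if (r->refs == 0)
		return false;	/* the kernel would emit a warning here */
	r->refs++;
	return true;
}

/* Drop one reference; returns true when the last reference is gone. */
bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	if (r->refs == 0)
		return false;	/* nothing left to drop */
	r->refs--;
	return r->refs == 0;
}

/* A new object starts with one reference, held by its creator. */
void toy_refcount_init_new(struct toy_refcount *r)
{
	r->refs = 1;
}
```

In this model, a counter created with zero references can never be taken with `toy_refcount_inc()`, whereas a counter created through `toy_refcount_init_new()` already holds the creator's reference and needs no extra increment.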
    • dm writecache: verify watermark during resume · 41c526c5
      Mikulas Patocka committed
      Verify the watermark upon resume, so that if the target is reloaded
      with a lower watermark, it will start the cleanup process immediately.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: report suspended device during destroy · adc0daad
      Mikulas Patocka committed
      The function dm_suspended returns true if the target is suspended.
      However, when the target is being suspended during unload, it returns
      false.
      
      An example where this is a problem: the test "!dm_suspended(wc->ti)" in
      writecache_writeback is not sufficient, because dm_suspended returns
      zero while writecache_suspend is in progress.  As is, without an
      enhanced dm_suspended, simply switching from flush_workqueue to
      drain_workqueue still emits warnings:
      workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries
      
      writecache_suspend calls flush_workqueue(wc->writeback_wq), which
      flushes the current work. However, the work may re-queue itself and
      flush_workqueue doesn't wait for re-queued works to finish. Because of
      this, the function writecache_writeback continues execution after the
      device was suspended and then runs concurrently with writecache_dtr,
      causing a crash in writecache_writeback.
      
      We must use drain_workqueue, which waits until the work and all re-queued
      works finish.
      
      As a prereq for switching to drain_workqueue, this commit fixes
      dm_suspended to return true after the presuspend hook and before the
      postsuspend hook - just like during a normal suspend. It allows
      simplifying the dm-integrity and dm-writecache targets so that they
      don't have to maintain suspended flags on their own.
      
      With this change, drain_workqueue() can be used effectively.  This
      change was tested with the lvm2 testsuite and the cryptsetup testsuite and
      there are no regressions.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Reported-by: Corey Marthaler <cmarthal@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
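The difference between flushing and draining a queue whose work re-queues itself can be sketched with a single-threaded toy queue. Everything here is hypothetical (`toy_wq`, `toy_queue_work()`, `toy_writeback()`, `toy_flush()`, `toy_drain()`, `toy_pending()`); it is not the kernel workqueue API and it ignores concurrency entirely. It only models the one property discussed above: a flush waits for the work that was queued when it started, while a drain keeps going until nothing is left.

```c
#include <stddef.h>

#define TOY_WQ_MAX 16

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

/* Toy single-threaded queue; hypothetical, not the kernel workqueue API. */
struct toy_wq {
	toy_work_fn pending[TOY_WQ_MAX];
	size_t head, tail;	/* run from head, queue at tail */
	int requeues_left;	/* how many more times the work re-queues itself */
	int runs;		/* how many times the work function has run */
};

int toy_queue_work(struct toy_wq *wq, toy_work_fn fn)
{
	if (wq->tail - wq->head == TOY_WQ_MAX)
		return -1;	/* ring is full */
	wq->pending[wq->tail % TOY_WQ_MAX] = fn;
	wq->tail++;
	return 0;
}

/* A work item that re-queues itself, like a periodic writeback worker. */
void toy_writeback(struct toy_wq *wq)
{
	wq->runs++;
	if (wq->requeues_left > 0) {
		wq->requeues_left--;
		toy_queue_work(wq, toy_writeback);
	}
}

/* Pop the oldest pending item and run it. */
static void toy_run_one(struct toy_wq *wq)
{
	toy_work_fn fn = wq->pending[wq->head % TOY_WQ_MAX];

	wq->head++;
	fn(wq);
}

/* Flush: run only the items that were queued when the call started. */
void toy_flush(struct toy_wq *wq)
{
	size_t stop = wq->tail;

	while (wq->head != stop)
		toy_run_one(wq);
}

/* Drain: keep running until the queue is empty, re-queued items included. */
void toy_drain(struct toy_wq *wq)
{
	while (wq->head != wq->tail)
		toy_run_one(wq);
}

size_t toy_pending(const struct toy_wq *wq)
{
	return wq->tail - wq->head;
}
```

With a work item that re-queues itself three times, `toy_flush()` returns after a single run and leaves one item pending, while `toy_drain()` returns only after all four runs have completed.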
    • dm thin metadata: fix lockdep complaint · 3918e066
      Theodore Ts'o committed
      [ 3934.173244] ======================================================
      [ 3934.179572] WARNING: possible circular locking dependency detected
      [ 3934.185884] 5.4.21-xfstests #1 Not tainted
      [ 3934.190151] ------------------------------------------------------
      [ 3934.196673] dmsetup/8897 is trying to acquire lock:
      [ 3934.201688] ffffffffbce82b18 (shrinker_rwsem){++++}, at: unregister_shrinker+0x22/0x80
      [ 3934.210268]
                     but task is already holding lock:
      [ 3934.216489] ffff92a10cc5e1d0 (&pmd->root_lock){++++}, at: dm_pool_metadata_close+0xba/0x120
      [ 3934.225083]
                     which lock already depends on the new lock.
      
      [ 3934.564165] Chain exists of:
                       shrinker_rwsem --> &journal->j_checkpoint_mutex --> &pmd->root_lock
      
      For a more detailed lockdep report, please see:
      
      	https://lore.kernel.org/r/20200220234519.GA620489@mit.edu
      
      We shouldn't need to hold the lock while we are just tearing down and
      freeing the whole metadata pool structure.
      
      Fixes: 44d8ebf4 ("dm thin metadata: use pool locking at end of dm_pool_metadata_close")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: fix a crash due to incorrect work item cancelling · 7cdf6a0a
      Mikulas Patocka committed
      The crash can be reproduced by running the lvm2 testsuite test
      lvconvert-thin-external-cache.sh for several minutes, e.g.:
        while :; do make check T=shell/lvconvert-thin-external-cache.sh; done
      
      The crash happens in this call chain:
      do_waker -> policy_tick -> smq_tick -> end_hotspot_period -> clear_bitset
      -> memset -> __memset -- which accesses an invalid pointer in the vmalloc
      area.
      
      The work entry on the workqueue is executed even after the bitmap was
      freed. The problem is that cancel_delayed_work doesn't wait for the
      running work item to finish, so the work item can continue running and
      re-submitting itself even after cache_postsuspend. In order to make sure
      that the work item won't be running, we must use cancel_delayed_work_sync.
      
      Also, change flush_workqueue to drain_workqueue, so that if some work item
      submits itself or another work item, we are properly waiting for both of
      them.
      
      Fixes: c6b4fcba ("dm: add cache target")
      Cc: stable@vger.kernel.org # v3.9
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  6. 26 Feb, 2020 3 commits
  7. 13 Feb, 2020 3 commits
    • bcache: remove macro nr_to_fifo_front() · 4ec31cb6
      Coly Li committed
      Macro nr_to_fifo_front() is only used once in btree_flush_write(),
      so it is unnecessary. This patch removes the macro and does the
      calculation directly in place.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: Revert "bcache: shrink btree node cache after bch_btree_check()" · 309cc719
      Coly Li committed
      This reverts commit 1df3877f.
      
      In my testing, even when all the cached btree nodes are freed,
      creating the gc and allocator kernel threads may still fail. It
      turns out that kthread_run() may fail if there is a pending signal for
      the current task, and the pending signal is sent by the OOM killer, which
      is triggered by memory consumption in bch_btree_check().
      
      Therefore explicitly shrinking the bcache btree node cache here does not
      help. Now that the shrinker callback is improved and pending signals
      are ignored before creating kernel threads, such an operation is no
      longer necessary.
      
      This patch reverts the commit 1df3877f ("bcache: shrink btree node
      cache after bch_btree_check()") because we have a better fix now.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: ignore pending signals when creating gc and allocator thread · 0b96da63
      Coly Li committed
      When running a cache set, all the bcache btree nodes of this cache set
      are checked by bch_btree_check(). If the bcache btree is very large,
      iterating over all the btree nodes occupies too much system memory and
      the bcache registering process might be selected and killed by the system
      OOM killer. kthread_run() will fail if the current process has a pending
      signal, therefore creating the gc and allocator kernel threads in
      run_cache_set() is very likely to fail for a very large bcache btree.
      
      Such an OOM kill is indeed safe, and the registering process will exit
      after the registration is done. Therefore this patch flushes pending signals
      during cache set start up, specifically in bch_cache_allocator_start()
      and bch_gc_thread_start(), to make sure run_cache_set() won't fail for
      a large cached data set.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 04 Feb, 2020 1 commit
  9. 03 Feb, 2020 1 commit
    • fs: Enable bmap() function to properly return errors · 30460e1e
      Carlos Maiolino committed
      Until now, bmap() has returned either the physical block number related to
      the requested file offset, or 0 if an error occurred or the requested
      offset mapped into a hole.
      This patch makes the changes needed for bmap() to properly return
      errors: the return value is now used only as an error code, and a pointer
      must be passed to bmap() to be filled with the mapped physical block.
      
      It will change the behavior of bmap() on return:
      
      - negative value in case of error
      - zero on success or if the map fell into a hole
      
      In case of a hole, *block will be zero too.
      
      Since this is a prep patch, for now the only error return is -EINVAL if
      ->bmap doesn't exist.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
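The calling convention described in the bmap() commit can be sketched in user space as follows. This is a hedged illustration, not the kernel function: `toy_inode`, `toy_sector_t`, `toy_bmap()` and `toy_example_map()` are hypothetical names, and using `*block` as both the input (logical block) and the output (physical block) is an assumption made for the sketch.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t toy_sector_t;

/* Toy inode; hypothetical, for illustration only. */
struct toy_inode {
	/* Optional mapping hook; NULL models a filesystem without ->bmap. */
	toy_sector_t (*bmap)(const struct toy_inode *inode, toy_sector_t block);
};

/*
 * The return value carries only the error. The mapping result is passed
 * back through *block: on entry the logical block (an assumption of this
 * sketch), on return the physical block, or 0 for a hole.
 */
int toy_bmap(const struct toy_inode *inode, toy_sector_t *block)
{
	if (!inode->bmap)
		return -EINVAL;

	*block = inode->bmap(inode, *block);
	return 0;
}

/* Example hook: logical blocks below 10 are holes, others map to block + 1000. */
toy_sector_t toy_example_map(const struct toy_inode *inode, toy_sector_t block)
{
	(void)inode;
	return block < 10 ? 0 : block + 1000;
}
```

A caller following this convention checks the return value first and only then looks at `*block`; a zero in `*block` after a successful call means the offset fell into a hole.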
  10. 01 Feb, 2020 5 commits
    • bcache: check return value of prio_read() · 49d08d59
      Coly Li committed
      Now if prio_read() fails while starting a cache set, we can print
      an error message in run_cache_set() and handle the failure properly.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix incorrect data type usage in btree_flush_write() · d1c3cc34
      Coly Li committed
      Dan Carpenter points out that since commit 2aa8c529 ("bcache: avoid
      unnecessary btree nodes flushing in btree_flush_write()"), there is an
      incorrect data type usage which leads to the following static checker
      warning:
      	drivers/md/bcache/journal.c:444 btree_flush_write()
      	warn: 'ref_nr' unsigned <= 0
      
      drivers/md/bcache/journal.c
         422  static void btree_flush_write(struct cache_set *c)
         423  {
         424          struct btree *b, *t, *btree_nodes[BTREE_FLUSH_NR];
         425          unsigned int i, nr, ref_nr;
                                          ^^^^^^
      
         426          atomic_t *fifo_front_p, *now_fifo_front_p;
         427          size_t mask;
         428
         429          if (c->journal.btree_flushing)
         430                  return;
         431
         432          spin_lock(&c->journal.flush_write_lock);
         433          if (c->journal.btree_flushing) {
         434                  spin_unlock(&c->journal.flush_write_lock);
         435                  return;
         436          }
         437          c->journal.btree_flushing = true;
         438          spin_unlock(&c->journal.flush_write_lock);
         439
         440          /* get the oldest journal entry and check its refcount */
         441          spin_lock(&c->journal.lock);
         442          fifo_front_p = &fifo_front(&c->journal.pin);
         443          ref_nr = atomic_read(fifo_front_p);
         444          if (ref_nr <= 0) {
                          ^^^^^^^^^^^
      Unsigned can't be less than zero.
      
         445                  /*
         446                   * do nothing if no btree node references
         447                   * the oldest journal entry
         448                   */
         449                  spin_unlock(&c->journal.lock);
         450                  goto out;
         451          }
         452          spin_unlock(&c->journal.lock);
      
      As the warning indicates, declaring the local variable ref_nr as unsigned
      int is wrong: it matches neither the return type of atomic_read() nor the
      "<= 0" check.
      
      This patch fixes the above error by defining local variable ref_nr as
      int type.
      
      Fixes: 2aa8c529 ("bcache: avoid unnecessary btree nodes flushing in btree_flush_write()")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
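Why the "<= 0" check misbehaves with an unsigned variable can be shown with a small stand-alone sketch. The function names `toy_no_refs_unsigned()` and `toy_no_refs_signed()` are hypothetical, and the `counter` parameter stands in for the int value that atomic_read() returns.

```c
#include <stdbool.h>

/* Buggy shape: the value is stored in an unsigned int before the check. */
bool toy_no_refs_unsigned(int counter)
{
	unsigned int ref_nr = counter;	/* a negative value wraps to a huge one */

	return ref_nr <= 0;		/* effectively "ref_nr == 0" */
}

/* Fixed shape: keep the signed type of the value that was read. */
bool toy_no_refs_signed(int counter)
{
	int ref_nr = counter;

	return ref_nr <= 0;
}
```

For a counter value of -1, the unsigned variant reports that references still exist, while the signed variant correctly treats the value as "no references".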
    • bcache: add readahead cache policy options via sysfs interface · 038ba8cc
      Coly Li committed
      Back in 2007, high performance SSDs were still expensive. In order to
      save more space for the real workload or metadata, readahead I/Os
      for non-metadata were bypassed and not cached on SSD.
      
      Nowadays, SSD prices have dropped a lot and people can find larger
      SSDs at more comfortable prices. It is no longer necessary to always
      bypass normal readahead I/Os to save SSD space.
      
      This patch adds options for readahead data cache policies via the sysfs
      file /sys/block/bcache<N>/readahead_cache_policy. The options are:
      - "all": cache all readahead data I/Os.
      - "meta-only": only cache metadata, and bypass other regular I/Os.
      
      If users want bcache to continue to cache readahead requests only
      for metadata and bypass regular data readahead, they should write
      "meta-only" to this sysfs file. By default, bcache now goes back to
      caching all readahead requests.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Coly Li <colyli@suse.de>
      Acked-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: explicity type cast in bset_bkey_last() · 7c02b005
      Coly Li committed
      In bset.h, macro bset_bkey_last() is defined as,
          bkey_idx((struct bkey *) (i)->d, (i)->keys)
      
      Parameter i can be a variable of any data structure type; the macro
      works as long as the struct type of i has the members 'd' and 'keys'.
      
      bset_bkey_last() is also used in macro csum_set() to calculate the
      checksum of an on-disk data structure. When csum_set() is used to
      calculate the checksum of the on-disk bcache super block, the data type of
      parameter 'i' is struct cache_sb_disk. Inside struct cache_sb_disk (also in
      struct cache_sb) the member keys is of type __u16. But bkey_idx() expects
      unsigned int (32 bits wide), so there is a problem when passing
      parameters via the stack to call bkey_idx().
      
      The sparse tool from the Intel 0day kbuild system reports this type
      incompatibility. bkey_idx() is part of the user space API, so the simplest
      fix is to cast (i)->keys to unsigned int in macro bset_bkey_last().
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
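The shape of the cast described in the bset_bkey_last() commit might look roughly like the sketch below. All types and names are simplified stand-ins (`toy_bkey`, `toy_bkey_idx()`, `toy_sb_like`, `toy_bset_bkey_last()`), not the real bcache definitions; the sketch only shows a 16-bit count being widened explicitly before it reaches a helper that takes unsigned int.

```c
#include <stdint.h>

/* Simplified stand-in for a key; the real layout is in the bcache headers. */
struct toy_bkey {
	uint64_t word;
};

/* Helper that indexes in units of 64-bit words and takes an unsigned int. */
struct toy_bkey *toy_bkey_idx(const struct toy_bkey *k, unsigned int nr_keys)
{
	uint64_t *d = (uint64_t *)k;

	return (struct toy_bkey *)(d + nr_keys);
}

/* A structure whose count member is only 16 bits wide. */
struct toy_sb_like {
	uint16_t keys;
	uint64_t d[8];
};

/* The macro works for any struct with 'd' and 'keys'; the cast is explicit. */
#define toy_bset_bkey_last(i) \
	toy_bkey_idx((struct toy_bkey *)(i)->d, (unsigned int)(i)->keys)
```

With `keys` set to 3, the macro yields a pointer to the fourth 64-bit word of `d`, regardless of the width of the `keys` member in the structure passed in.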
    • bcache: fix memory corruption in bch_cache_accounting_clear() · 5bebf748
      Coly Li committed
      Commit 83ff9318 ("bcache: not use hard coded memset size in
      bch_cache_accounting_clear()") tries to make the code easier to
      understand by replacing the hard coded number with the following change,
      	void bch_cache_accounting_clear(...)
      	{
      		memset(&acc->total.cache_hits,
      			0,
      	-		sizeof(unsigned long) * 7);
      	+		sizeof(struct cache_stats));
      	}
      
      Unfortunately the change was wrong (which also tells us the original code
      was not easy to understand correctly). The hard coded number 7 is used
      because in struct cache_stats,
       15 struct cache_stats {
       16         struct kobject          kobj;
       17
       18         unsigned long cache_hits;
       19         unsigned long cache_misses;
       20         unsigned long cache_bypass_hits;
       21         unsigned long cache_bypass_misses;
       22         unsigned long cache_readaheads;
       23         unsigned long cache_miss_collisions;
       24         unsigned long sectors_bypassed;
       25
       26         unsigned int            rescale;
       27 };
      only the members in lines 18-24 are meant to be set to 0. It is wrong to use
      'sizeof(struct cache_stats)' to replace 'sizeof(unsigned long) * 7'; the
      memory objects following acc->total are corrupted by this change.
      
      Сорокин Артем Сергеевич reports that the following steps trigger a
      kernel panic:
      1. Create new set: make-bcache -B /dev/nvme1n1 -C /dev/sda --wipe-bcache
      2. Run in /sys/fs/bcache/<uuid>:
         echo 1 > clear_stats && cat stats_five_minute/cache_bypass_hits
      
      I can reproduce the panic and get the following dmesg with KASAN enabled,
      [22613.172742] ==================================================================
      [22613.172862] BUG: KASAN: null-ptr-deref in sysfs_kf_seq_show+0x117/0x230
      [22613.172864] Read of size 8 at addr 0000000000000000 by task cat/6753
      
      [22613.172870] CPU: 1 PID: 6753 Comm: cat Not tainted 5.5.0-rc7-lp151.28.16-default+ #11
      [22613.172872] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
      [22613.172873] Call Trace:
      [22613.172964]  dump_stack+0x8b/0xbb
      [22613.172968]  ? sysfs_kf_seq_show+0x117/0x230
      [22613.172970]  ? sysfs_kf_seq_show+0x117/0x230
      [22613.173031]  __kasan_report+0x176/0x192
      [22613.173064]  ? pr_cont_kernfs_name+0x40/0x60
      [22613.173067]  ? sysfs_kf_seq_show+0x117/0x230
      [22613.173070]  kasan_report+0xe/0x20
      [22613.173072]  sysfs_kf_seq_show+0x117/0x230
      [22613.173105]  seq_read+0x199/0x6d0
      [22613.173110]  vfs_read+0xa5/0x1a0
      [22613.173113]  ksys_read+0x110/0x160
      [22613.173115]  ? kernel_write+0xb0/0xb0
      [22613.173177]  do_syscall_64+0x77/0x290
      [22613.173238]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [22613.173241] RIP: 0033:0x7fc2c886ac61
      [22613.173244] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89
      [22613.173245] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [22613.173248] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61
      [22613.173249] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003
      [22613.173250] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000
      [22613.173251] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000
      [22613.173253] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000
      [22613.173255] ==================================================================
      [22613.173256] Disabling lock debugging due to kernel taint
      [22613.173350] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [22613.178380] #PF: supervisor read access in kernel mode
      [22613.180959] #PF: error_code(0x0000) - not-present page
      [22613.183444] PGD 0 P4D 0
      [22613.184867] Oops: 0000 [#1] SMP KASAN PTI
      [22613.186797] CPU: 1 PID: 6753 Comm: cat Tainted: G    B             5.5.0-rc7-lp151.28.16-default+ #11
      [22613.191253] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
      [22613.196706] RIP: 0010:sysfs_kf_seq_show+0x117/0x230
      [22613.199097] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48
      [22613.208016] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246
      [22613.210448] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6
      [22613.213691] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
      [22613.216893] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd
      [22613.220075] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200
      [22613.223256] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000
      [22613.226290] FS:  00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000
      [22613.229637] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [22613.231993] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0
      [22613.234909] Call Trace:
      [22613.235931]  seq_read+0x199/0x6d0
      [22613.237259]  vfs_read+0xa5/0x1a0
      [22613.239229]  ksys_read+0x110/0x160
      [22613.240590]  ? kernel_write+0xb0/0xb0
      [22613.242040]  do_syscall_64+0x77/0x290
      [22613.243625]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [22613.245450] RIP: 0033:0x7fc2c886ac61
      [22613.246706] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89
      [22613.253296] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [22613.255835] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61
      [22613.258472] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003
      [22613.260807] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000
      [22613.263188] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000
      [22613.265598] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000
      [22613.268729] Modules linked in: scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock fuse bnep kvm_intel kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel snd_ens1371 snd_ac97_codec ac97_bus bcache snd_pcm btusb btrtl btbcm btintel crc64 aesni_intel glue_helper crypto_simd vmw_balloon cryptd bluetooth snd_timer snd_rawmidi snd joydev pcspkr e1000 rfkill vmw_vmci soundcore ecdh_generic ecc gameport i2c_piix4 mptctl ac button hid_generic usbhid sr_mod cdrom ata_generic ehci_pci vmwgfx uhci_hcd drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm ehci_hcd mptspi scsi_transport_spi mptscsih ata_piix mptbase ahci usbcore libahci drm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
      [22613.292429] CR2: 0000000000000000
      [22613.293563] ---[ end trace a074b26a8508f378 ]---
      [22613.295138] RIP: 0010:sysfs_kf_seq_show+0x117/0x230
      [22613.296769] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48
      [22613.303553] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246
      [22613.305280] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6
      [22613.307924] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
      [22613.310272] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd
      [22613.312685] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200
      [22613.315076] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000
      [22613.318116] FS:  00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000
      [22613.320743] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [22613.322628] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0
      
      This patch fixes the above problem by explicitly setting all 7
      members to 0 in bch_cache_accounting_clear().
      Reported-by: Сорокин Артем Сергеевич <a.sorokin@bank-hlynov.ru>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
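The size mismatch behind this corruption can be worked out on a simplified copy of the structure. In this sketch `toy_kobject` is only a placeholder for struct kobject (its real size differs), and `toy_cache_stats`, `toy_overrun_bytes()` and `toy_stats_clear()` are hypothetical names. The arithmetic is the point: a memset that starts at cache_hits but is sized for the whole structure runs past the end by exactly the offset of cache_hits.

```c
#include <stddef.h>

/* Placeholder for struct kobject; the real type is larger and different. */
struct toy_kobject {
	void *placeholder[8];
};

struct toy_cache_stats {
	struct toy_kobject kobj;

	unsigned long cache_hits;
	unsigned long cache_misses;
	unsigned long cache_bypass_hits;
	unsigned long cache_bypass_misses;
	unsigned long cache_readaheads;
	unsigned long cache_miss_collisions;
	unsigned long sectors_bypassed;

	unsigned int rescale;
};

/*
 * Starting at cache_hits and writing sizeof(struct toy_cache_stats) bytes
 * ends offsetof(cache_hits) bytes beyond the end of the structure.
 */
size_t toy_overrun_bytes(void)
{
	return offsetof(struct toy_cache_stats, cache_hits);
}

/* Clear only the seven counters, leaving kobj and rescale untouched. */
void toy_stats_clear(struct toy_cache_stats *s)
{
	s->cache_hits = 0;
	s->cache_misses = 0;
	s->cache_bypass_hits = 0;
	s->cache_bypass_misses = 0;
	s->cache_readaheads = 0;
	s->cache_miss_collisions = 0;
	s->sectors_bypassed = 0;
}
```

Assigning each member by name keeps the clear correct even if members are later added or reordered, which a byte count derived from the layout does not.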
  11. 28 Jan, 2020 1 commit
  12. 24 Jan, 2020 1 commit
    • bcache: reap from tail of c->btree_cache in bch_mca_scan() · e3de0446
      Coly Li committed
      When shrinking the btree node cache from c->btree_cache in bch_mca_scan(),
      the selected node is rotated from the head to the tail of the
      c->btree_cache list, no matter whether it is reaped or not. But in the
      bcache journal code, when flushing the btree nodes with the oldest journal
      entry, btree nodes are iterated and selected from the tail of the
      c->btree_cache list in btree_flush_write(). The list_rotate_left() in
      bch_mca_scan() makes btree_flush_write() iterate over more nodes in
      c->btree_list in reverse order.
      
      This patch just reaps the selected btree node cache, and does not move it
      from the head to the tail of the c->btree_cache list. Then bch_mca_scan()
      will not disturb the c->btree_cache list for btree_flush_write().
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>