1. 28 6月, 2019 20 次提交
    • C
      bcache: fix mistaken sysfs entry for io_error counter · 54619998
      Coly Li 提交于
      In bch_cached_dev_files[] from driver/md/bcache/sysfs.c, sysfs_errors is
      incorrectly inserted in. The correct entry should be sysfs_io_errors.
      
      This patch fixes the problem and now I/O errors of cached device can be
      read from /sys/block/bcache<N>/bcache/io_errors.
      
      Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
      Signed-off-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      54619998
    • C
      bcache: add pendings_cleanup to stop pending bcache device · 0c277e21
      Coly Li 提交于
      If a bcache device is in dirty state and its cache set is not
      registered, this bcache device will not appear in /dev/bcache<N>,
      and there is no way to stop it or remove the bcache kernel module.
      
      This is an as-designed behavior, but sometimes people has to reboot
      whole system to release or stop the pending backing device.
      
      This sysfs interface may remove such pending bcache devices when
      write anything into the sysfs file manually.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0c277e21
    • C
      bcache: make bset_search_tree() be more understandable · 944a4f34
      Coly Li 提交于
      The purpose of following code in bset_search_tree() is to avoid a branch
      instruction,
       994         if (likely(f->exponent != 127))
       995                 n = j * 2 + (((unsigned int)
       996                               (f->mantissa -
       997                                bfloat_mantissa(search, f))) >> 31);
       998         else
       999                 n = (bkey_cmp(tree_to_bkey(t, j), search) > 0)
      1000                         ? j * 2
      1001                         : j * 2 + 1;
      
      This piece of code is not very clear to understand, even when I tried to
      add code comment for it, I made mistake. This patch removes the implict
      bit operation and uses explicit branch to calculate next location in
      binary tree search.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      944a4f34
    • C
      bcache: remove "XXX:" comment line from run_cache_set() · 68a53c95
      Coly Li 提交于
      In previous bcache patches for Linux v5.2, the failure code path of
      run_cache_set() is tested and fixed. So now the following comment
      line can be removed from run_cache_set(),
      	/* XXX: test this, it's broken */
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      68a53c95
    • C
      bcache: improve error message in bch_cached_dev_run() · e0faa3d7
      Coly Li 提交于
      This patch adds more error message in bch_cached_dev_run() to indicate
      the exact reason why an error value is returned. Please notice when
      printing out the "is running already" message, pr_info() is used here,
      because in this case also -EBUSY is returned, the bcache device can
      continue to attach to the cache devince and run, so it won't be an
      error level message in kernel message.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e0faa3d7
    • C
      bcache: add more error message in bch_cached_dev_attach() · 633bb2ce
      Coly Li 提交于
      This patch adds more error message for attaching cached device, this is
      helpful to debug code failure during bache device start up.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      633bb2ce
    • C
      bcache: more detailed error message to bcache_device_link() · 4b6efb4b
      Coly Li 提交于
      This patch adds more accurate error message for specific
      ssyfs_create_link() call, to help debugging failure during
      bcache device start tup.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4b6efb4b
    • C
      bcache: check CACHE_SET_IO_DISABLE bit in bch_journal() · 383ff218
      Coly Li 提交于
      When too many I/O errors happen on cache set and CACHE_SET_IO_DISABLE
      bit is set, bch_journal() may continue to work because the journaling
      bkey might be still in write set yet. The caller of bch_journal() may
      believe the journal still work but the truth is in-memory journal write
      set won't be written into cache device any more. This behavior may
      introduce potential inconsistent metadata status.
      
      This patch checks CACHE_SET_IO_DISABLE bit at the head of bch_journal(),
      if the bit is set, bch_journal() returns NULL immediately to notice
      caller to know journal does not work.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      383ff218
    • C
      bcache: check CACHE_SET_IO_DISABLE in allocator code · e775339e
      Coly Li 提交于
      If CACHE_SET_IO_DISABLE of a cache set flag is set by too many I/O
      errors, currently allocator routines can still continue allocate
      space which may introduce inconsistent metadata state.
      
      This patch checkes CACHE_SET_IO_DISABLE bit in following allocator
      routines,
      - bch_bucket_alloc()
      - __bch_bucket_alloc_set()
      Once CACHE_SET_IO_DISABLE is set on cache set, the allocator routines
      may reject allocation request earlier to avoid potential inconsistent
      metadata.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e775339e
    • C
      bcache: remove unncessary code in bch_btree_keys_init() · bd9026c8
      Coly Li 提交于
      Function bch_btree_keys_init() initializes b->set[].size and
      b->set[].data to zero. As the code comments indicates, these code indeed
      is unncessary, because both struct btree_keys and struct bset_tree are
      nested embedded into struct btree, when struct btree is filled with 0
      bits by kzalloc() in mca_bucket_alloc(), b->set[].size and
      b->set[].data are initialized to 0 (a.k.a NULL) already.
      
      This patch removes the redundant code, and add comments in
      bch_btree_keys_init() and mca_bucket_alloc() to explain why it's safe.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bd9026c8
    • C
      bcache: add return value check to bch_cached_dev_run() · 0b13efec
      Coly Li 提交于
      This patch adds return value check to bch_cached_dev_run(), now if there
      is error happens inside bch_cached_dev_run(), it can be catched.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0b13efec
    • A
      bcache: use sysfs_match_string() instead of __sysfs_match_string() · 89e0341a
      Alexandru Ardelean 提交于
      The arrays (of strings) that are passed to __sysfs_match_string() are
      static, so use sysfs_match_string() which does an implicit ARRAY_SIZE()
      over these arrays.
      
      Functionally, this doesn't change anything.
      The change is more cosmetic.
      
      It only shrinks the static arrays by 1 byte each.
      Signed-off-by: NAlexandru Ardelean <alexandru.ardelean@analog.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      89e0341a
    • C
      bcache: remove unnecessary prefetch() in bset_search_tree() · f960facb
      Coly Li 提交于
      In function bset_search_tree(), when p >= t->size, t->tree[0] will be
      prefetched by the following code piece,
       974                 unsigned int p = n << 4;
       975
       976                 p &= ((int) (p - t->size)) >> 31;
       977
       978                 prefetch(&t->tree[p]);
      
      The purpose of the above code is to avoid a branch instruction, but
      when p >= t->size, prefetch(&t->tree[0]) has no positive performance
      contribution at all. This patch avoids the unncessary prefetch by only
      calling prefetch() when p < t->size.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f960facb
    • C
      bcache: add io error counting in write_bdev_super_endio() · 08ec1e62
      Coly Li 提交于
      When backing device super block is written by bch_write_bdev_super(),
      the bio complete callback write_bdev_super_endio() simply ignores I/O
      status. Indeed such write request also contribute to backing device
      health status if the request failed.
      
      This patch checkes bio->bi_status in write_bdev_super_endio(), if there
      is error, bch_count_backing_io_errors() will be called to count an I/O
      error to dc->io_errors.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      08ec1e62
    • C
      bcache: ignore read-ahead request failure on backing device · 578df99b
      Coly Li 提交于
      When md raid device (e.g. raid456) is used as backing device, read-ahead
      requests on a degrading and recovering md raid device might be failured
      immediately by md raid code, but indeed this md raid array can still be
      read or write for normal I/O requests. Therefore such failed read-ahead
      request are not real hardware failure. Further more, after degrading and
      recovering accomplished, read-ahead requests will be handled by md raid
      array again.
      
      For such condition, I/O failures of read-ahead requests don't indicate
      real health status (because normal I/O still be served), they should not
      be counted into I/O error counter dc->io_errors.
      
      Since there is no simple way to detect whether the backing divice is a
      md raid device, this patch simply ignores I/O failures for read-ahead
      bios on backing device, to avoid bogus backing device failure on a
      degrading md raid array.
      Suggested-and-tested-by: NThorsten Knabe <linux@thorsten-knabe.de>
      Signed-off-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      578df99b
    • C
      bcache: avoid flushing btree node in cache_set_flush() if io disabled · e6dcbd3e
      Coly Li 提交于
      When cache_set_flush() is called for too many I/O errors detected on
      cache device and the cache set is retiring, inside the function it
      doesn't make sense to flushing cached btree nodes from c->btree_cache
      because CACHE_SET_IO_DISABLE is set on c->flags already and all I/Os
      onto cache device will be rejected.
      
      This patch checks in cache_set_flush() that whether CACHE_SET_IO_DISABLE
      is set. If yes, then avoids to flush the cached btree nodes to reduce
      more time and make cache set retiring more faster.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e6dcbd3e
    • C
      Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()" · 695277f1
      Coly Li 提交于
      This reverts commit 6147305c.
      
      Although this patch helps the failed bcache device to stop faster when
      too many I/O errors detected on corresponding cached device, setting
      CACHE_SET_IO_DISABLE bit to cache set c->flags was not a good idea. This
      operation will disable all I/Os on cache set, which means other attached
      bcache devices won't work neither.
      
      Without this patch, the failed bcache device can also be stopped
      eventually if internal I/O accomplished (e.g. writeback). Therefore here
      I revert it.
      
      Fixes: 6147305c ("bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()")
      Reported-by: NYong Li <mr.liyong@qq.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      695277f1
    • C
      bcache: fix return value error in bch_journal_read() · 0ae49cb7
      Coly Li 提交于
      When everything is OK in bch_journal_read(), finally the return value
      is returned by,
      	return ret;
      which assumes ret will be 0 here. This assumption is wrong when all
      journal buckets as are full and filled with valid journal entries. In
      such cache the last location referencess read_bucket() sets 'ret' to
      1, which means new jset added into jset list. The jset list is list
      'journal' in caller run_cache_set().
      
      Return 1 to run_cache_set() means something wrong and the cache set
      won't start, but indeed everything is OK.
      
      This patch changes the line at end of bch_journal_read() to directly
      return 0 since everything if verything is good. Then a bogus error
      is fixed.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0ae49cb7
    • C
      bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() · b387e9b5
      Coly Li 提交于
      When system memory is in heavy pressure, bch_gc_thread_start() from
      run_cache_set() may fail due to out of memory. In such condition,
      c->gc_thread is assigned to -ENOMEM, not NULL pointer. Then in following
      failure code path bch_cache_set_error(), when cache_set_flush() gets
      called, the code piece to stop c->gc_thread is broken,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      
      And KASAN catches such NULL pointer deference problem, with the warning
      information:
      
      [  561.207881] ==================================================================
      [  561.207900] BUG: KASAN: null-ptr-deref in kthread_stop+0x3b/0x440
      [  561.207904] Write of size 4 at addr 000000000000001c by task kworker/15:1/313
      
      [  561.207913] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G        W         5.0.0-vanilla+ #3
      [  561.207916] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.207935] Workqueue: events cache_set_flush [bcache]
      [  561.207940] Call Trace:
      [  561.207948]  dump_stack+0x9a/0xeb
      [  561.207955]  ? kthread_stop+0x3b/0x440
      [  561.207960]  ? kthread_stop+0x3b/0x440
      [  561.207965]  kasan_report+0x176/0x192
      [  561.207973]  ? kthread_stop+0x3b/0x440
      [  561.207981]  kthread_stop+0x3b/0x440
      [  561.207995]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  561.208008]  process_one_work+0x856/0x1620
      [  561.208015]  ? find_held_lock+0x39/0x1d0
      [  561.208028]  ? drain_workqueue+0x380/0x380
      [  561.208048]  worker_thread+0x87/0xb80
      [  561.208058]  ? __kthread_parkme+0xb6/0x180
      [  561.208067]  ? process_one_work+0x1620/0x1620
      [  561.208072]  kthread+0x326/0x3e0
      [  561.208079]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  561.208090]  ret_from_fork+0x3a/0x50
      [  561.208110] ==================================================================
      [  561.208113] Disabling lock debugging due to kernel taint
      [  561.208115] irq event stamp: 11800231
      [  561.208126] hardirqs last  enabled at (11800231): [<ffffffff83008538>] do_syscall_64+0x18/0x410
      [  561.208127] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      [  561.208129] #PF error: [WRITE]
      [  561.312253] hardirqs last disabled at (11800230): [<ffffffff830052ff>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  561.312259] softirqs last  enabled at (11799832): [<ffffffff850005c7>] __do_softirq+0x5c7/0x8c3
      [  561.405975] PGD 0 P4D 0
      [  561.442494] softirqs last disabled at (11799821): [<ffffffff831add2c>] irq_exit+0x1ac/0x1e0
      [  561.791359] Oops: 0002 [#1] SMP KASAN NOPTI
      [  561.791362] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G    B   W         5.0.0-vanilla+ #3
      [  561.791363] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.791371] Workqueue: events cache_set_flush [bcache]
      [  561.791374] RIP: 0010:kthread_stop+0x3b/0x440
      [  561.791376] Code: 00 00 65 8b 05 26 d5 e0 7c 89 c0 48 0f a3 05 ec aa df 02 0f 82 dc 02 00 00 4c 8d 63 20 be 04 00 00 00 4c 89 e7 e8 65 c5 53 00 <f0> ff 43 20 48 8d 7b 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
      [  561.791377] RSP: 0018:ffff88872fc8fd10 EFLAGS: 00010286
      [  561.838895] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838916] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838934] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838948] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838966] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838979] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838996] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.067028] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffffffff832dd314
      [  563.067030] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000297
      [  563.067032] RBP: ffff88872fc8fe88 R08: fffffbfff0b8213d R09: fffffbfff0b8213d
      [  563.067034] R10: 0000000000000001 R11: fffffbfff0b8213c R12: 000000000000001c
      [  563.408618] R13: ffff88dc61cc0f68 R14: ffff888102b94900 R15: ffff88dc61cc0f68
      [  563.408620] FS:  0000000000000000(0000) GS:ffff888f7dc00000(0000) knlGS:0000000000000000
      [  563.408622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  563.408623] CR2: 000000000000001c CR3: 0000000f48a1a004 CR4: 00000000007606e0
      [  563.408625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  563.408627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  563.904795] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.915796] PKRU: 55555554
      [  563.915797] Call Trace:
      [  563.915807]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  563.915812]  process_one_work+0x856/0x1620
      [  564.001226] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.033563]  ? find_held_lock+0x39/0x1d0
      [  564.033567]  ? drain_workqueue+0x380/0x380
      [  564.033574]  worker_thread+0x87/0xb80
      [  564.062823] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.118042]  ? __kthread_parkme+0xb6/0x180
      [  564.118046]  ? process_one_work+0x1620/0x1620
      [  564.118048]  kthread+0x326/0x3e0
      [  564.118050]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  564.167066] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.252441]  ret_from_fork+0x3a/0x50
      [  564.252447] Modules linked in: msr rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib i40iw configfs iw_cm ib_cm libiscsi scsi_transport_iscsi mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat intel_rapl skx_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ses raid0 aesni_intel cdc_ether enclosure usbnet ipmi_ssif joydev aes_x86_64 i40e scsi_transport_sas mii bcache md_mod crypto_simd mei_me ioatdma crc64 ptp cryptd pcspkr i2c_i801 mlx4_core glue_helper pps_core mei lpc_ich dca wmi ipmi_si ipmi_devintf nd_pmem dax_pmem nd_btt ipmi_msghandler device_dax pcc_cpufreq button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd ttm megaraid_sas drm usbcore nfit libnvdimm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
      [  564.299390] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.348360] CR2: 000000000000001c
      [  564.348362] ---[ end trace b7f0e5cc7b2103b0 ]---
      
      Therefore, it is not enough to only check whether c->gc_thread is NULL,
      we should use IS_ERR_OR_NULL() to check both NULL pointer and error
      value.
      
      This patch changes the above buggy code piece in this way,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b387e9b5
    • C
      bcache: don't set max writeback rate if gc is running · 141df8bb
      Coly Li 提交于
      When gc is running, user space I/O processes may wait inside
      bcache code, so no new I/O coming. Indeed this is not a real idle
      time, maximum writeback rate should not be set in such situation.
      Otherwise a faster writeback thread may compete locks with gc thread
      and makes garbage collection slower, which results a longer I/O
      freeze period.
      
      This patch checks c->gc_mark_valid in set_at_max_writeback_rate(). If
      c->gc_mark_valid is 0 (gc running), set_at_max_writeback_rate() returns
      false, then update_writeback_rate() will not set writeback rate to
      maximum value even c->idle_counter reaches an idle threshold.
      
      Now writeback thread won't interfere gc thread performance.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      141df8bb
  2. 27 6月, 2019 1 次提交
  3. 21 6月, 2019 19 次提交