1. 28 6月, 2019 10 次提交
    • C
      bcache: add return value check to bch_cached_dev_run() · 0b13efec
      Coly Li 提交于
      This patch adds return value check to bch_cached_dev_run(), now if there
      is error happens inside bch_cached_dev_run(), it can be catched.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0b13efec
    • A
      bcache: use sysfs_match_string() instead of __sysfs_match_string() · 89e0341a
      Alexandru Ardelean 提交于
      The arrays (of strings) that are passed to __sysfs_match_string() are
      static, so use sysfs_match_string() which does an implicit ARRAY_SIZE()
      over these arrays.
      
      Functionally, this doesn't change anything.
      The change is more cosmetic.
      
      It only shrinks the static arrays by 1 byte each.
      Signed-off-by: NAlexandru Ardelean <alexandru.ardelean@analog.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      89e0341a
    • C
      bcache: remove unnecessary prefetch() in bset_search_tree() · f960facb
      Coly Li 提交于
      In function bset_search_tree(), when p >= t->size, t->tree[0] will be
      prefetched by the following code piece,
       974                 unsigned int p = n << 4;
       975
       976                 p &= ((int) (p - t->size)) >> 31;
       977
       978                 prefetch(&t->tree[p]);
      
      The purpose of the above code is to avoid a branch instruction, but
      when p >= t->size, prefetch(&t->tree[0]) has no positive performance
      contribution at all. This patch avoids the unncessary prefetch by only
      calling prefetch() when p < t->size.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f960facb
    • C
      bcache: add io error counting in write_bdev_super_endio() · 08ec1e62
      Coly Li 提交于
      When backing device super block is written by bch_write_bdev_super(),
      the bio complete callback write_bdev_super_endio() simply ignores I/O
      status. Indeed such write request also contribute to backing device
      health status if the request failed.
      
      This patch checkes bio->bi_status in write_bdev_super_endio(), if there
      is error, bch_count_backing_io_errors() will be called to count an I/O
      error to dc->io_errors.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      08ec1e62
    • C
      bcache: ignore read-ahead request failure on backing device · 578df99b
      Coly Li 提交于
      When md raid device (e.g. raid456) is used as backing device, read-ahead
      requests on a degrading and recovering md raid device might be failured
      immediately by md raid code, but indeed this md raid array can still be
      read or write for normal I/O requests. Therefore such failed read-ahead
      request are not real hardware failure. Further more, after degrading and
      recovering accomplished, read-ahead requests will be handled by md raid
      array again.
      
      For such condition, I/O failures of read-ahead requests don't indicate
      real health status (because normal I/O still be served), they should not
      be counted into I/O error counter dc->io_errors.
      
      Since there is no simple way to detect whether the backing divice is a
      md raid device, this patch simply ignores I/O failures for read-ahead
      bios on backing device, to avoid bogus backing device failure on a
      degrading md raid array.
      Suggested-and-tested-by: NThorsten Knabe <linux@thorsten-knabe.de>
      Signed-off-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      578df99b
    • C
      bcache: avoid flushing btree node in cache_set_flush() if io disabled · e6dcbd3e
      Coly Li 提交于
      When cache_set_flush() is called for too many I/O errors detected on
      cache device and the cache set is retiring, inside the function it
      doesn't make sense to flushing cached btree nodes from c->btree_cache
      because CACHE_SET_IO_DISABLE is set on c->flags already and all I/Os
      onto cache device will be rejected.
      
      This patch checks in cache_set_flush() that whether CACHE_SET_IO_DISABLE
      is set. If yes, then avoids to flush the cached btree nodes to reduce
      more time and make cache set retiring more faster.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e6dcbd3e
    • C
      Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()" · 695277f1
      Coly Li 提交于
      This reverts commit 6147305c.
      
      Although this patch helps the failed bcache device to stop faster when
      too many I/O errors detected on corresponding cached device, setting
      CACHE_SET_IO_DISABLE bit to cache set c->flags was not a good idea. This
      operation will disable all I/Os on cache set, which means other attached
      bcache devices won't work neither.
      
      Without this patch, the failed bcache device can also be stopped
      eventually if internal I/O accomplished (e.g. writeback). Therefore here
      I revert it.
      
      Fixes: 6147305c ("bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()")
      Reported-by: NYong Li <mr.liyong@qq.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      695277f1
    • C
      bcache: fix return value error in bch_journal_read() · 0ae49cb7
      Coly Li 提交于
      When everything is OK in bch_journal_read(), finally the return value
      is returned by,
      	return ret;
      which assumes ret will be 0 here. This assumption is wrong when all
      journal buckets as are full and filled with valid journal entries. In
      such cache the last location referencess read_bucket() sets 'ret' to
      1, which means new jset added into jset list. The jset list is list
      'journal' in caller run_cache_set().
      
      Return 1 to run_cache_set() means something wrong and the cache set
      won't start, but indeed everything is OK.
      
      This patch changes the line at end of bch_journal_read() to directly
      return 0 since everything if verything is good. Then a bogus error
      is fixed.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0ae49cb7
    • C
      bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() · b387e9b5
      Coly Li 提交于
      When system memory is in heavy pressure, bch_gc_thread_start() from
      run_cache_set() may fail due to out of memory. In such condition,
      c->gc_thread is assigned to -ENOMEM, not NULL pointer. Then in following
      failure code path bch_cache_set_error(), when cache_set_flush() gets
      called, the code piece to stop c->gc_thread is broken,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      
      And KASAN catches such NULL pointer deference problem, with the warning
      information:
      
      [  561.207881] ==================================================================
      [  561.207900] BUG: KASAN: null-ptr-deref in kthread_stop+0x3b/0x440
      [  561.207904] Write of size 4 at addr 000000000000001c by task kworker/15:1/313
      
      [  561.207913] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G        W         5.0.0-vanilla+ #3
      [  561.207916] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.207935] Workqueue: events cache_set_flush [bcache]
      [  561.207940] Call Trace:
      [  561.207948]  dump_stack+0x9a/0xeb
      [  561.207955]  ? kthread_stop+0x3b/0x440
      [  561.207960]  ? kthread_stop+0x3b/0x440
      [  561.207965]  kasan_report+0x176/0x192
      [  561.207973]  ? kthread_stop+0x3b/0x440
      [  561.207981]  kthread_stop+0x3b/0x440
      [  561.207995]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  561.208008]  process_one_work+0x856/0x1620
      [  561.208015]  ? find_held_lock+0x39/0x1d0
      [  561.208028]  ? drain_workqueue+0x380/0x380
      [  561.208048]  worker_thread+0x87/0xb80
      [  561.208058]  ? __kthread_parkme+0xb6/0x180
      [  561.208067]  ? process_one_work+0x1620/0x1620
      [  561.208072]  kthread+0x326/0x3e0
      [  561.208079]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  561.208090]  ret_from_fork+0x3a/0x50
      [  561.208110] ==================================================================
      [  561.208113] Disabling lock debugging due to kernel taint
      [  561.208115] irq event stamp: 11800231
      [  561.208126] hardirqs last  enabled at (11800231): [<ffffffff83008538>] do_syscall_64+0x18/0x410
      [  561.208127] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      [  561.208129] #PF error: [WRITE]
      [  561.312253] hardirqs last disabled at (11800230): [<ffffffff830052ff>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  561.312259] softirqs last  enabled at (11799832): [<ffffffff850005c7>] __do_softirq+0x5c7/0x8c3
      [  561.405975] PGD 0 P4D 0
      [  561.442494] softirqs last disabled at (11799821): [<ffffffff831add2c>] irq_exit+0x1ac/0x1e0
      [  561.791359] Oops: 0002 [#1] SMP KASAN NOPTI
      [  561.791362] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G    B   W         5.0.0-vanilla+ #3
      [  561.791363] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.791371] Workqueue: events cache_set_flush [bcache]
      [  561.791374] RIP: 0010:kthread_stop+0x3b/0x440
      [  561.791376] Code: 00 00 65 8b 05 26 d5 e0 7c 89 c0 48 0f a3 05 ec aa df 02 0f 82 dc 02 00 00 4c 8d 63 20 be 04 00 00 00 4c 89 e7 e8 65 c5 53 00 <f0> ff 43 20 48 8d 7b 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
      [  561.791377] RSP: 0018:ffff88872fc8fd10 EFLAGS: 00010286
      [  561.838895] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838916] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838934] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838948] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838966] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838979] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838996] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.067028] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffffffff832dd314
      [  563.067030] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000297
      [  563.067032] RBP: ffff88872fc8fe88 R08: fffffbfff0b8213d R09: fffffbfff0b8213d
      [  563.067034] R10: 0000000000000001 R11: fffffbfff0b8213c R12: 000000000000001c
      [  563.408618] R13: ffff88dc61cc0f68 R14: ffff888102b94900 R15: ffff88dc61cc0f68
      [  563.408620] FS:  0000000000000000(0000) GS:ffff888f7dc00000(0000) knlGS:0000000000000000
      [  563.408622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  563.408623] CR2: 000000000000001c CR3: 0000000f48a1a004 CR4: 00000000007606e0
      [  563.408625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  563.408627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  563.904795] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.915796] PKRU: 55555554
      [  563.915797] Call Trace:
      [  563.915807]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  563.915812]  process_one_work+0x856/0x1620
      [  564.001226] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.033563]  ? find_held_lock+0x39/0x1d0
      [  564.033567]  ? drain_workqueue+0x380/0x380
      [  564.033574]  worker_thread+0x87/0xb80
      [  564.062823] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.118042]  ? __kthread_parkme+0xb6/0x180
      [  564.118046]  ? process_one_work+0x1620/0x1620
      [  564.118048]  kthread+0x326/0x3e0
      [  564.118050]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  564.167066] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.252441]  ret_from_fork+0x3a/0x50
      [  564.252447] Modules linked in: msr rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib i40iw configfs iw_cm ib_cm libiscsi scsi_transport_iscsi mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat intel_rapl skx_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ses raid0 aesni_intel cdc_ether enclosure usbnet ipmi_ssif joydev aes_x86_64 i40e scsi_transport_sas mii bcache md_mod crypto_simd mei_me ioatdma crc64 ptp cryptd pcspkr i2c_i801 mlx4_core glue_helper pps_core mei lpc_ich dca wmi ipmi_si ipmi_devintf nd_pmem dax_pmem nd_btt ipmi_msghandler device_dax pcc_cpufreq button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd ttm megaraid_sas drm usbcore nfit libnvdimm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
      [  564.299390] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.348360] CR2: 000000000000001c
      [  564.348362] ---[ end trace b7f0e5cc7b2103b0 ]---
      
      Therefore, it is not enough to only check whether c->gc_thread is NULL,
      we should use IS_ERR_OR_NULL() to check both NULL pointer and error
      value.
      
      This patch changes the above buggy code piece in this way,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b387e9b5
    • C
      bcache: don't set max writeback rate if gc is running · 141df8bb
      Coly Li 提交于
      When gc is running, user space I/O processes may wait inside
      bcache code, so no new I/O coming. Indeed this is not a real idle
      time, maximum writeback rate should not be set in such situation.
      Otherwise a faster writeback thread may compete locks with gc thread
      and makes garbage collection slower, which results a longer I/O
      freeze period.
      
      This patch checks c->gc_mark_valid in set_at_max_writeback_rate(). If
      c->gc_mark_valid is 0 (gc running), set_at_max_writeback_rate() returns
      false, then update_writeback_rate() will not set writeback rate to
      maximum value even c->idle_counter reaches an idle threshold.
      
      Now writeback thread won't interfere gc thread performance.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      141df8bb
  2. 27 6月, 2019 1 次提交
  3. 21 6月, 2019 6 次提交
  4. 15 6月, 2019 8 次提交
  5. 13 6月, 2019 2 次提交
    • C
      bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached · 1f0ffa67
      Coly Li 提交于
      When people set a writeback percent via sysfs file,
        /sys/block/bcache<N>/bcache/writeback_percent
      current code directly sets BCACHE_DEV_WB_RUNNING to dc->disk.flags
      and schedules kworker dc->writeback_rate_update.
      
      If there is no cache set attached to, the writeback kernel thread is
      not running indeed, running dc->writeback_rate_update does not make
      sense and may cause NULL pointer deference when reference cache set
      pointer inside update_writeback_rate().
      
      This patch checks whether the cache set point (dc->disk.c) is NULL in
      sysfs interface handler, and only set BCACHE_DEV_WB_RUNNING and
      schedule dc->writeback_rate_update when dc->disk.c is not NULL (it
      means the cache device is attached to a cache set).
      
      This problem might be introduced from initial bcache commit, but
      commit 3fd47bfe ("bcache: stop dc->writeback_rate_update properly")
      changes part of the original code piece, so I add 'Fixes: 3fd47bfe'
      to indicate from which commit this patch can be applied.
      
      Fixes: 3fd47bfe ("bcache: stop dc->writeback_rate_update properly")
      Reported-by: NBjørn Forsman <bjorn.forsman@gmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NBjørn Forsman <bjorn.forsman@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1f0ffa67
    • C
      bcache: fix stack corruption by PRECEDING_KEY() · 31b90956
      Coly Li 提交于
      Recently people report bcache code compiled with gcc9 is broken, one of
      the buggy behavior I observe is that two adjacent 4KB I/Os should merge
      into one but they don't. Finally it turns out to be a stack corruption
      caused by macro PRECEDING_KEY().
      
      See how PRECEDING_KEY() is defined in bset.h,
      437 #define PRECEDING_KEY(_k)                                       \
      438 ({                                                              \
      439         struct bkey *_ret = NULL;                               \
      440                                                                 \
      441         if (KEY_INODE(_k) || KEY_OFFSET(_k)) {                  \
      442                 _ret = &KEY(KEY_INODE(_k), KEY_OFFSET(_k), 0);  \
      443                                                                 \
      444                 if (!_ret->low)                                 \
      445                         _ret->high--;                           \
      446                 _ret->low--;                                    \
      447         }                                                       \
      448                                                                 \
      449         _ret;                                                   \
      450 })
      
      At line 442, _ret points to address of a on-stack variable combined by
      KEY(), the life range of this on-stack variable is in line 442-446,
      once _ret is returned to bch_btree_insert_key(), the returned address
      points to an invalid stack address and this address is overwritten in
      the following called bch_btree_iter_init(). Then argument 'search' of
      bch_btree_iter_init() points to some address inside stackframe of
      bch_btree_iter_init(), exact address depends on how the compiler
      allocates stack space. Now the stack is corrupted.
      
      Fixes: 0eacac22 ("bcache: PRECEDING_KEY()")
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NRolf Fokkens <rolf@rolffokkens.nl>
      Reviewed-by: NPierre JUHEN <pierre.juhen@orange.fr>
      Tested-by: NShenghui Wang <shhuiw@foxmail.com>
      Tested-by: NPierre JUHEN <pierre.juhen@orange.fr>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Nix <nix@esperi.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      31b90956
  6. 05 6月, 2019 2 次提交
  7. 31 5月, 2019 2 次提交
  8. 24 5月, 2019 2 次提交
  9. 22 5月, 2019 1 次提交
    • M
      dm: make sure to obey max_io_len_target_boundary · 51b86f9a
      Michael Lass 提交于
      Commit 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM
      target interface") incorrectly removed code from
      __send_changing_extent_only() that is required to impose a per-target IO
      boundary on IO that exceeds max_io_len_target_boundary().  Otherwise
      "special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond
      where allowed.
      
      Fix this by restoring the max_io_len_target_boundary() limit in
      __send_changing_extent_only()
      
      Fixes: 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM target interface")
      Cc: stable@vger.kernel.org # 5.1+
      Signed-off-by: NMichael Lass <bevan@bi-co.net>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      51b86f9a
  10. 21 5月, 2019 4 次提交
  11. 16 5月, 2019 2 次提交