  1. 06 Jun, 2020: 5 commits
  2. 21 May, 2020: 4 commits
  3. 20 May, 2020: 2 commits
  4. 15 May, 2020: 6 commits
  5. 04 Mar, 2020: 1 commit
    • dm: bump version of core and various targets · 636be424
      Committed by Mike Snitzer
      Changes made during the 5.6 cycle warrant bumping the version number
      for DM core and the targets modified by this commit.
      
      It should be noted that dm-thin, dm-crypt and dm-raid already had
      their target version bumped during the 5.6 merge window.
      
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  6. 28 Feb, 2020: 1 commit
    • dm zoned: Fix reference counter initial value of chunk works · ee63634b
      Committed by Shin'ichiro Kawasaki
      dm-zoned initializes the reference counter of each new chunk work to
      zero and then calls refcount_inc() to increment it. However,
      refcount_inc() treats an increment from zero as an error and triggers
      the following warning:
      
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0
      ...
      CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134
      ...
      Call Trace:
       dmz_map+0x2d2/0x350 [dm_zoned]
       __map_bio+0x42/0x1a0
       __split_and_process_non_flush+0x14a/0x1b0
       __split_and_process_bio+0x83/0x240
       ? kmem_cache_alloc+0x165/0x220
       dm_process_bio+0x90/0x230
       ? generic_make_request_checks+0x2e7/0x680
       dm_make_request+0x3e/0xb0
       generic_make_request+0xcf/0x320
       ? memcg_drain_all_list_lrus+0x1c0/0x1c0
       submit_bio+0x3c/0x160
       ? guard_bio_eod+0x2c/0x130
       mpage_readpages+0x182/0x1d0
       ? bdev_evict_inode+0xf0/0xf0
       read_pages+0x6b/0x1b0
       __do_page_cache_readahead+0x1ba/0x1d0
       force_page_cache_readahead+0x93/0x100
       generic_file_read_iter+0x83a/0xe40
       ? __seccomp_filter+0x7b/0x670
       new_sync_read+0x12a/0x1c0
       vfs_read+0x9d/0x150
       ksys_read+0x5f/0xe0
       do_syscall_64+0x5b/0x180
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ...
      
      After this warning, all subsequent refcount API calls on the counter
      fail to change its value.
      
      Fix this by initializing the reference counter of new chunk works to
      one instead of zero, and by not calling refcount_inc() via
      dmz_get_chunk_work() for new chunk works.
      
      The failure was observed with Linux 5.4 with CONFIG_REFCOUNT_FULL
      enabled. The refcount rework was merged into Linux 5.5 by commit
      168829ad ("Merge branch 'locking-core-for-linus' of
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this
      commit, CONFIG_REFCOUNT_FULL was removed and the failure is observed
      regardless of kernel configuration.
      
      Linux 4.20 merged commit 092b5648 ("dm zoned: target: use refcount_t
      for dm zoned reference counters"). Before this commit, dm zoned used
      the atomic_t API, which does not check for addition to zero, so this
      fix is not necessary for older kernels.
      
      Fixes: 092b5648 ("dm zoned: target: use refcount_t for dm zoned reference counters")
      Cc: stable@vger.kernel.org # 5.4+
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
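      The failure mode can be illustrated with a small userspace model. This
      is a sketch under stated assumptions, not kernel code: model_refcount
      and its helpers are hypothetical stand-ins that only mimic the
      refcount_t behavior described above (an increment from zero warns and
      saturates the counter, after which it no longer changes), and
      chunk_work is a hypothetical placeholder for dm-zoned's chunk work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of refcount_t semantics (simplified, not
 * the kernel implementation): an increment from zero is reported as a
 * use-after-free and the counter saturates, so later calls are no-ops.
 */
#define MODEL_REFCOUNT_SATURATED (-1)

struct model_refcount {
	int refs;
};

static void model_refcount_set(struct model_refcount *r, int n)
{
	r->refs = n;
}

/* Returns false (and saturates) on "addition on 0; use-after-free". */
static bool model_refcount_inc(struct model_refcount *r)
{
	if (r->refs == MODEL_REFCOUNT_SATURATED)
		return false;
	if (r->refs == 0) {
		fprintf(stderr, "refcount_t: addition on 0; use-after-free.\n");
		r->refs = MODEL_REFCOUNT_SATURATED;
		return false;
	}
	r->refs++;
	return true;
}

/* Returns true when the last reference is dropped. */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	if (r->refs == MODEL_REFCOUNT_SATURATED)
		return false;
	assert(r->refs > 0);
	return --r->refs == 0;
}

/* Hypothetical chunk work object standing in for dm-zoned's. */
struct chunk_work {
	struct model_refcount refcount;
};

/* Before the fix: start at zero, then take a reference (warns, saturates). */
static bool new_chunk_work_buggy(struct chunk_work *cw)
{
	model_refcount_set(&cw->refcount, 0);
	return model_refcount_inc(&cw->refcount);
}

/* After the fix: the creator's reference is the initial value of one. */
static void new_chunk_work_fixed(struct chunk_work *cw)
{
	model_refcount_set(&cw->refcount, 1);
}
```

      In this model, new_chunk_work_buggy() leaves the counter saturated so
      later puts never reach zero, while new_chunk_work_fixed() starts at one
      and subsequent gets and puts behave normally.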
  7. 03 Dec, 2019: 1 commit
  8. 07 Nov, 2019: 1 commit
    • dm zoned: reduce overhead of backing device checks · e7fad909
      Committed by Dmitry Fomichev
      Commit 75d66ffb added backing device health checks, and as part of
      these checks the check_events() block device operation is invoked in
      the dm-zoned mapping path as well as in the reclaim and flush paths.
      Calling check_events() with ATA or SCSI backing devices introduces a
      blocking scsi_test_unit_ready() call made in sd_check_events(). Even
      though the overhead of calling scsi_test_unit_ready() is small for ATA
      zoned devices, it is much larger for SCSI and significantly degrades
      performance.
      
      Fix this performance regression by executing check_events() only when
      an I/O error occurs. The function dmz_bdev_is_dying() is modified to
      call only blk_queue_dying(), while calls to check_events() are made in
      a new helper function, dmz_check_bdev().

      Reported-by: zhangxiaoxu <zhangxiaoxu5@huawei.com>
      Fixes: 75d66ffb ("dm zoned: properly handle backing device failure")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
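      The split between a cheap per-I/O check and an expensive check made only
      on error can be sketched as follows. This is a hypothetical userspace
      model, not the dm-zoned code: model_bdev, model_bdev_is_dying(),
      model_check_bdev() and model_io_done() are illustrative names, the
      queue_dying flag stands in for blk_queue_dying(), and the probes
      counter stands in for the potentially blocking check_events() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical backing device state for the model. */
struct model_bdev {
	bool queue_dying;	/* stands in for blk_queue_dying() */
	bool media_changed;	/* what an expensive probe would discover */
	unsigned int probes;	/* number of expensive probes issued */
	bool failed;		/* sticky failure flag */
};

/* Cheap test, safe to call in the mapping, reclaim and flush paths. */
static bool model_bdev_is_dying(struct model_bdev *dev)
{
	if (dev->queue_dying)
		dev->failed = true;
	return dev->failed;
}

/* Expensive test, standing in for a check_events() call that may block. */
static bool model_check_bdev(struct model_bdev *dev)
{
	if (model_bdev_is_dying(dev))
		return false;
	dev->probes++;
	if (dev->media_changed)
		dev->failed = true;
	return !dev->failed;	/* true means the device looks healthy */
}

/* I/O completion: only an error (negative errno) triggers the probe. */
static void model_io_done(struct model_bdev *dev, int error)
{
	assert(error <= 0);
	if (error)
		(void)model_check_bdev(dev);
}
```

      With this split, any number of successful I/Os cost only the cheap
      check; the expensive probe runs once per failed I/O.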
  9. 26 Aug, 2019: 1 commit
    • dm zoned: fix invalid memory access · 0c8e9c2d
      Committed by Mikulas Patocka
      Commit 75d66ffb ("dm zoned: properly handle backing device failure")
      triggers a Coverity warning:
      
      *** CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
      /drivers/md/dm-zoned-target.c: 137 in dmz_submit_bio()
      131             clone->bi_private = bioctx;
      132
      133             bio_advance(bio, clone->bi_iter.bi_size);
      134
      135             refcount_inc(&bioctx->ref);
      136             generic_make_request(clone);
      >>>     CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
      >>>     Dereferencing freed pointer "clone".
      137             if (clone->bi_status == BLK_STS_IOERR)
      138                     return -EIO;
      139
      140             if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
      141                     zone->wp_block += nr_blocks;
      142
      
      The "clone" bio may be processed and freed before the check
      "clone->bi_status == BLK_STS_IOERR", so this check can access invalid
      memory.
      
      Fixes: 75d66ffb ("dm zoned: properly handle backing device failure")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
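      The general rule behind this fix is that once a bio has been
      submitted, the submitter must not dereference it again, because it
      may already have completed and been freed. A minimal userspace sketch,
      assuming hypothetical types model_ctx and model_clone and a
      model_submit() that completes synchronously; errors are instead
      propagated into a shared context by the completion callback. This
      illustrates the pattern, not the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical shared context for all clones of one target bio. */
struct model_ctx {
	int refs;
	int status;	/* first error seen by any clone */
};

struct model_clone {
	struct model_ctx *ctx;
	int status;
};

/* Completion: record the error in the context, drop the ref, free the clone. */
static void model_clone_endio(struct model_clone *clone)
{
	struct model_ctx *ctx = clone->ctx;

	if (clone->status && !ctx->status)
		ctx->status = clone->status;
	assert(ctx->refs > 0);
	ctx->refs--;
	free(clone);
}

/* Completes synchronously here, so the clone is freed before returning. */
static void model_submit(struct model_clone *clone, int device_error)
{
	clone->status = device_error;
	model_clone_endio(clone);
}

static int model_submit_fragment(struct model_ctx *ctx, int device_error)
{
	struct model_clone *clone = malloc(sizeof(*clone));

	if (!clone)
		return -ENOMEM;
	clone->ctx = ctx;
	clone->status = 0;
	ctx->refs++;
	model_submit(clone, device_error);
	/* Do NOT read clone->status here: the clone may already be freed. */
	return 0;
}
```

      The submitter learns about errors only through ctx->status, which
      outlives every clone.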
  10. 16 Aug, 2019: 3 commits
  11. 19 Apr, 2019: 1 commit
  12. 21 Feb, 2019: 1 commit
  13. 08 Dec, 2018: 1 commit
    • dm zoned: Fix target BIO completion handling · d57f9da8
      Committed by Damien Le Moal
      struct bioctx includes the ref refcount_t to track the number of I/O
      fragments used to process a target BIO, as well as to ensure that the
      zone of the BIO is kept in the active state throughout the lifetime of
      the BIO. However, since this reference count is decremented in the
      target .end_io method, bio_endio() must be called multiple times for
      read and write target BIOs. This causes problems with the value of the
      __bi_remaining field of struct bio for chained BIOs (e.g. when the
      clone BIO passed by dm core is large and is split into fragments by the
      block layer), resulting in incorrect values and inconsistencies with
      the BIO_CHAIN flag setting. This in turn triggers the BUG_ON() call:
      
      BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);
      
      in bio_remaining_done() called from bio_endio().
      
      Fix this by ensuring that bio_endio() is called only once for any
      target BIO, by always using internal clone BIOs for processing any
      read or write target BIO. This allows reference counting using the
      target BIO context counter to trigger the target BIO completion
      bio_endio() call once all data, metadata and other zone work triggered
      by the BIO completes.
      
      Overall, this simplifies the code too as the target .end_io becomes
      unnecessary and differences between read and write BIO issuing and
      completion processing disappear.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
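      The completion scheme described above, where dropping the last context
      reference triggers a single bio_endio(), can be modeled as follows. All
      names (model_target_bio, model_bioctx, model_process_bio() and the
      helpers) are hypothetical; the sketch assumes only what the message
      states: the context counter covers every internal clone, and the
      target BIO completes once, when the counter reaches zero.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical target BIO: counts how often completion is signalled. */
struct model_target_bio {
	unsigned int endio_calls;	/* must end up as exactly 1 */
	int status;
};

struct model_bioctx {
	struct model_target_bio *bio;
	unsigned int ref;
};

static void model_bio_endio(struct model_target_bio *bio)
{
	bio->endio_calls++;
}

static void model_bioctx_get(struct model_bioctx *ctx)
{
	assert(ctx->ref > 0);
	ctx->ref++;
}

/* Whoever drops the last reference completes the target BIO. */
static void model_bioctx_put(struct model_bioctx *ctx)
{
	assert(ctx->ref > 0);
	if (--ctx->ref == 0)
		model_bio_endio(ctx->bio);
}

/* Clone completion: remember the first error, then drop one reference. */
static void model_clone_endio(struct model_bioctx *ctx, int error)
{
	if (error && !ctx->bio->status)
		ctx->bio->status = error;
	model_bioctx_put(ctx);
}

/* Process a target BIO with nr_clones internal clones (-1: no failure). */
static void model_process_bio(struct model_bioctx *ctx,
			      unsigned int nr_clones, int failing_clone)
{
	ctx->ref = 1;			/* submitter's reference */
	for (unsigned int i = 0; i < nr_clones; i++)
		model_bioctx_get(ctx);	/* one reference per clone */
	for (unsigned int i = 0; i < nr_clones; i++)
		model_clone_endio(ctx, (int)i == failing_clone ? -EIO : 0);
	model_bioctx_put(ctx);		/* drop the submitter's reference */
}
```

      Because the submitter holds its own reference, the target BIO cannot
      complete early even if every clone finishes before submission ends.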
  14. 26 Oct, 2018: 1 commit
  15. 17 Oct, 2018: 1 commit
  16. 23 Jun, 2018: 1 commit
    • dm zoned: avoid triggering reclaim from inside dmz_map() · 2d0b2d64
      Committed by Bart Van Assche
      This patch prevents lockdep from reporting the following:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-rc1 #62 Not tainted
      ------------------------------------------------------
      kswapd0/84 is trying to acquire lock:
      00000000c313516d (&xfs_nondir_ilock_class){++++}, at: xfs_free_eofblocks+0xa2/0x1e0
      
      but task is already holding lock:
      00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (fs_reclaim){+.+.}:
        kmem_cache_alloc+0x2c/0x2b0
        radix_tree_node_alloc.constprop.19+0x3d/0xc0
        __radix_tree_create+0x161/0x1c0
        __radix_tree_insert+0x45/0x210
        dmz_map+0x245/0x2d0 [dm_zoned]
        __map_bio+0x40/0x260
        __split_and_process_non_flush+0x116/0x220
        __split_and_process_bio+0x81/0x180
        __dm_make_request.isra.32+0x5a/0x100
        generic_make_request+0x36e/0x690
        submit_bio+0x6c/0x140
        mpage_readpages+0x19e/0x1f0
        read_pages+0x6d/0x1b0
        __do_page_cache_readahead+0x21b/0x2d0
        force_page_cache_readahead+0xc4/0x100
        generic_file_read_iter+0x7c6/0xd20
        __vfs_read+0x102/0x180
        vfs_read+0x9b/0x140
        ksys_read+0x55/0xc0
        do_syscall_64+0x5a/0x1f0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      -> #1 (&dmz->chunk_lock){+.+.}:
        dmz_map+0x133/0x2d0 [dm_zoned]
        __map_bio+0x40/0x260
        __split_and_process_non_flush+0x116/0x220
        __split_and_process_bio+0x81/0x180
        __dm_make_request.isra.32+0x5a/0x100
        generic_make_request+0x36e/0x690
        submit_bio+0x6c/0x140
        _xfs_buf_ioapply+0x31c/0x590
        xfs_buf_submit_wait+0x73/0x520
        xfs_buf_read_map+0x134/0x2f0
        xfs_trans_read_buf_map+0xc3/0x580
        xfs_read_agf+0xa5/0x1e0
        xfs_alloc_read_agf+0x59/0x2b0
        xfs_alloc_pagf_init+0x27/0x60
        xfs_bmap_longest_free_extent+0x43/0xb0
        xfs_bmap_btalloc_nullfb+0x7f/0xf0
        xfs_bmap_btalloc+0x428/0x7c0
        xfs_bmapi_write+0x598/0xcc0
        xfs_iomap_write_allocate+0x15a/0x330
        xfs_map_blocks+0x1cf/0x3f0
        xfs_do_writepage+0x15f/0x7b0
        write_cache_pages+0x1ca/0x540
        xfs_vm_writepages+0x65/0xa0
        do_writepages+0x48/0xf0
        __writeback_single_inode+0x58/0x730
        writeback_sb_inodes+0x249/0x5c0
        wb_writeback+0x11e/0x550
        wb_workfn+0xa3/0x670
        process_one_work+0x228/0x670
        worker_thread+0x3c/0x390
        kthread+0x11c/0x140
        ret_from_fork+0x3a/0x50
      
      -> #0 (&xfs_nondir_ilock_class){++++}:
        down_read_nested+0x43/0x70
        xfs_free_eofblocks+0xa2/0x1e0
        xfs_fs_destroy_inode+0xac/0x270
        dispose_list+0x51/0x80
        prune_icache_sb+0x52/0x70
        super_cache_scan+0x127/0x1a0
        shrink_slab.part.47+0x1bd/0x590
        shrink_node+0x3b5/0x470
        balance_pgdat+0x158/0x3b0
        kswapd+0x1ba/0x600
        kthread+0x11c/0x140
        ret_from_fork+0x3a/0x50
      
      other info that might help us debug this:
      
      Chain exists of:
        &xfs_nondir_ilock_class --> &dmz->chunk_lock --> fs_reclaim
      
      Possible unsafe locking scenario:
      
           CPU0                    CPU1
           ----                    ----
      lock(fs_reclaim);
                                   lock(&dmz->chunk_lock);
                                   lock(fs_reclaim);
      lock(&xfs_nondir_ilock_class);
      
      *** DEADLOCK ***
      
      3 locks held by kswapd0/84:
       #0: 00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
       #1: 000000000f8208f5 (shrinker_rwsem){++++}, at: shrink_slab.part.47+0x3f/0x590
       #2: 00000000cacefa54 (&type->s_umount_key#43){.+.+}, at: trylock_super+0x16/0x50
      
      stack backtrace:
      CPU: 7 PID: 84 Comm: kswapd0 Not tainted 4.18.0-rc1 #62
      Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
      Call Trace:
       dump_stack+0x85/0xcb
       print_circular_bug.isra.36+0x1ce/0x1db
       __lock_acquire+0x124e/0x1310
       lock_acquire+0x9f/0x1f0
       down_read_nested+0x43/0x70
       xfs_free_eofblocks+0xa2/0x1e0
       xfs_fs_destroy_inode+0xac/0x270
       dispose_list+0x51/0x80
       prune_icache_sb+0x52/0x70
       super_cache_scan+0x127/0x1a0
       shrink_slab.part.47+0x1bd/0x590
       shrink_node+0x3b5/0x470
       balance_pgdat+0x158/0x3b0
       kswapd+0x1ba/0x600
       kthread+0x11c/0x140
       ret_from_fork+0x3a/0x50

      Reported-by: Masato Suzuki <masato.suzuki@wdc.com>
      Fixes: 4218a955 ("dm zoned: use GFP_NOIO in I/O path")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
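      The report boils down to a cycle in the lock dependency graph: the xfs
      inode lock is held while dmz->chunk_lock is taken (writeback path),
      chunk_lock is held while an allocation may enter fs_reclaim (dmz_map()
      path), and kswapd holds fs_reclaim while taking the xfs inode lock. The
      sketch below is a simplified model of that reasoning, not lockdep's
      actual algorithm; the enum and function names are hypothetical.
      Removing the chunk_lock to fs_reclaim edge, the effect the patch title
      describes, breaks the cycle.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Record "A is held while B is acquired" as an edge A -> B, and flag a
 * possible deadlock when the dependency graph contains a cycle.
 */
enum model_lock { XFS_ILOCK, CHUNK_LOCK, FS_RECLAIM, NR_MODEL_LOCKS };

struct model_lock_graph {
	bool edge[NR_MODEL_LOCKS][NR_MODEL_LOCKS];
};

static void model_add_dep(struct model_lock_graph *g,
			  enum model_lock held, enum model_lock acquired)
{
	assert(held < NR_MODEL_LOCKS && acquired < NR_MODEL_LOCKS);
	g->edge[held][acquired] = true;
}

/* DFS colours: 0 = unvisited, 1 = on the current path, 2 = finished. */
static bool model_dfs(const struct model_lock_graph *g, int node, int *colour)
{
	colour[node] = 1;
	for (int next = 0; next < NR_MODEL_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (colour[next] == 1)
			return true;	/* back edge: dependency cycle */
		if (colour[next] == 0 && model_dfs(g, next, colour))
			return true;
	}
	colour[node] = 2;
	return false;
}

static bool model_has_cycle(const struct model_lock_graph *g)
{
	int colour[NR_MODEL_LOCKS] = {0};

	for (int n = 0; n < NR_MODEL_LOCKS; n++)
		if (colour[n] == 0 && model_dfs(g, n, colour))
			return true;
	return false;
}
```

      Adding the XFS_ILOCK to CHUNK_LOCK and FS_RECLAIM to XFS_ILOCK edges
      alone yields no cycle; adding CHUNK_LOCK to FS_RECLAIM closes it.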
  17. 08 Jun, 2018: 1 commit
  18. 31 May, 2018: 1 commit
  19. 05 Apr, 2018: 1 commit
  20. 17 Jan, 2018: 1 commit
  21. 11 Nov, 2017: 1 commit
    • dm zoned: ignore last smaller runt zone · 114e0259
      Committed by Damien Le Moal
      The SCSI layer allows ZBC drives to have a smaller last runt zone. For
      such a device, specifying the entire capacity for a dm-zoned target
      table entry fails because the specified capacity is not aligned to the
      device zone size indicated in the request queue structure of the
      device.

      Fix this problem by ignoring the last runt zone in the entry length
      when setting up the dm-zoned target (ctr method) and when iterating
      table entries of the target (iterate_devices method). This allows
      dm-zoned users to still easily set up a target using either the entire
      device capacity (as mandated by dm-zoned) or the aligned capacity
      excluding the last runt zone.

      While at it, replace direct references to the device queue
      chunk_sectors limit with calls to the accessor blk_queue_zone_sectors().

      Reported-by: Peter Desnoyers <pjd@ccs.neu.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
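      The alignment rule can be expressed as a small calculation. This sketch
      assumes the zone size in sectors is a power of two (so rounding down
      is a bit mask) and uses hypothetical helper names; it mirrors the
      described behavior of accepting either the full capacity or the
      capacity rounded down to a whole number of zones, not the exact
      dm-zoned code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t model_sector_t;	/* 512-byte sectors */

/* Capacity rounded down to whole zones (zone size must be a power of two). */
static model_sector_t model_aligned_capacity(model_sector_t capacity,
					     model_sector_t zone_sectors)
{
	assert(zone_sectors && (zone_sectors & (zone_sectors - 1)) == 0);
	return capacity & ~(zone_sectors - 1);
}

/* Accept the full capacity or the capacity excluding the runt zone. */
static bool model_table_len_ok(model_sector_t len, model_sector_t capacity,
			       model_sector_t zone_sectors)
{
	return len == capacity ||
	       len == model_aligned_capacity(capacity, zone_sectors);
}
```

      For example, with 256 MiB zones (524288 sectors of 512 bytes) and a
      capacity of three full zones plus a 1000-sector runt zone, the aligned
      capacity is exactly three zones, and a table length of two zones would
      be rejected.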
  22. 24 Aug, 2017: 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Committed by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
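      The idea of addressing I/O by a gendisk pointer plus a partition index,
      with partition remapping done in the submission path, can be sketched
      with simplified stand-in types. model_gendisk, model_bio and
      model_partition_remap() are hypothetical; the real struct bio, struct
      gendisk and partition remapping code are more involved and are not
      reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_PARTS 4

/* Hypothetical stand-in for a gendisk's partition table. */
struct model_gendisk {
	uint64_t part_start[MODEL_MAX_PARTS];	/* index 0 = whole disk */
	uint64_t part_nr_sects[MODEL_MAX_PARTS];
};

/* Hypothetical bio: a disk pointer and partition index, no block_device. */
struct model_bio {
	struct model_gendisk *disk;
	unsigned int partno;	/* 0 for the whole disk */
	uint64_t sector;	/* relative to the partition */
};

/* Remap a partition-relative bio to a whole-disk sector. */
static int model_partition_remap(struct model_bio *bio)
{
	const struct model_gendisk *disk = bio->disk;

	assert(bio->partno < MODEL_MAX_PARTS);
	if (bio->partno == 0)
		return 0;
	if (bio->sector >= disk->part_nr_sects[bio->partno])
		return -1;	/* beyond the end of the partition */
	bio->sector += disk->part_start[bio->partno];
	bio->partno = 0;
	return 0;
}
```

      After remapping, the bio refers to the whole disk, so lower layers only
      ever need the gendisk (and its request_queue).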
  23. 27 Jul, 2017: 1 commit
  24. 26 Jul, 2017: 1 commit
  25. 19 Jun, 2017: 1 commit
    • dm zoned: drive-managed zoned block device target · 3b1a94c8
      Committed by Damien Le Moal
      The dm-zoned device mapper target provides transparent write access
      to zoned block devices (ZBC and ZAC compliant block devices).
      dm-zoned hides from the device user (a file system or an application
      doing raw block device accesses) any constraints imposed by the device
      on write requests, equivalent to a drive-managed zoned block device
      model.
      
      Write requests are processed using a combination of on-disk buffering
      using the device conventional zones and direct in-place processing for
      requests aligned to a zone sequential write pointer position.
      A background reclaim process implemented using dm_kcopyd_copy ensures
      that conventional zones are always available for executing unaligned
      write requests. The reclaim process overhead is minimized by managing
      buffer zones in a least-recently-written order and first targeting the
      oldest buffer zones. As a result, blocks under regular write access
      (such as metadata blocks of a file system) remain stored in
      conventional zones, resulting in no apparent overhead.
      
      The dm-zoned implementation focuses on simplicity and on minimizing
      overhead (CPU, memory and storage overhead). For a 14TB host-managed
      disk with 256 MB zones, dm-zoned memory usage per disk instance is at
      most about 3 MB, and as few as 5 zones are used internally for storing
      metadata and performing buffer zone reclaim operations. This is
      achieved using zone-level indirection rather than a full block
      indirection system for managing block movement between zones.

      dm-zoned primarily targets host-managed zoned block devices, but it
      can also be used with host-aware device models to mitigate potential
      device-side performance degradation due to excessive random writing.
      
      Zoned block devices can be formatted and checked for use with the dm-zoned
      target using the dmzadm utility available at:
      
      https://github.com/hgst/dm-zoned-tools

      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      [Mike Snitzer partly refactored Damien's original work to clean up the code]
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
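      Two policies described in this commit, in-place writes for requests
      aligned to a sequential zone's write pointer and least-recently-written
      ordering for buffer zone reclaim, can be sketched as follows. The types
      and functions are hypothetical and ignore metadata, locking, and the
      actual data copy performed with dm_kcopyd_copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_write_target { WRITE_IN_PLACE, WRITE_TO_BUFFER };

/* Hypothetical sequential zone state: start sector and write pointer. */
struct model_seq_zone {
	uint64_t start;
	uint64_t wp;		/* next writable sector */
};

/* Hypothetical conventional (buffer) zone bookkeeping. */
struct model_buffer_zone {
	uint64_t last_write;	/* logical timestamp of the last write */
	int in_use;
};

/* Aligned to the write pointer: write in place; otherwise buffer it. */
static enum model_write_target
model_choose_target(const struct model_seq_zone *z, uint64_t sector)
{
	assert(sector >= z->start);
	return sector == z->wp ? WRITE_IN_PLACE : WRITE_TO_BUFFER;
}

/* Reclaim the least-recently-written buffer zone first; -1 if none. */
static int model_pick_reclaim_victim(const struct model_buffer_zone *bz,
				     size_t n)
{
	int victim = -1;

	for (size_t i = 0; i < n; i++) {
		if (!bz[i].in_use)
			continue;
		if (victim < 0 || bz[i].last_write < bz[victim].last_write)
			victim = (int)i;
	}
	return victim;
}
```

      Picking the oldest buffer zone first leaves frequently rewritten blocks
      in conventional zones, which is why such blocks incur no apparent
      reclaim overhead.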