1. 11 Aug 2021, 1 commit
    • dm: update target status functions to support IMA measurement · 8ec45662
      Tushar Sugandhi committed
      For device mapper targets to take advantage of IMA's measurement
      capabilities, the status functions for the individual targets need to be
      updated to handle the status_type_t case for value STATUSTYPE_IMA.
      
      Update status functions for the following target types, to log their
      respective attributes to be measured using IMA.
       01. cache
       02. crypt
       03. integrity
       04. linear
       05. mirror
       06. multipath
       07. raid
       08. snapshot
       09. striped
       10. verity
      
      For the rest of the targets, handle the STATUSTYPE_IMA case by setting
      the measurement buffer to NULL.
      
      For IMA to measure the data on a given system, the IMA policy on the
      system needs to be updated to have the following line, and the system
      needs to be restarted for the measurements to take effect.
      
      /etc/ima/ima-policy
       measure func=CRITICAL_DATA label=device-mapper template=ima-buf
      
      The measurements will be reflected in the IMA logs, which are located at:
      
      /sys/kernel/security/integrity/ima/ascii_runtime_measurements
      /sys/kernel/security/integrity/ima/binary_runtime_measurements
      
      These IMA logs can later be consumed by various attestation clients
      running on the system, which can send them to external services to
      attest the system.
      
      The DM target data measured by IMA subsystem can alternatively
      be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
      DM_TABLE_STATUS_CMD.
      Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      8ec45662
  2. 23 Mar 2021, 1 commit
    • dm table: Fix zoned model check and zone sectors check · 2d669ceb
      Shin'ichiro Kawasaki committed
      Commit 24f6b603 ("dm table: fix zoned iterate_devices based device
      capability checks") triggered a dm table load failure when a dm-zoned
      device is set up with zoned block devices and a regular device for
      cache.
      
      The commit inverted the logic of two callback functions for
      iterate_devices: device_is_zoned_model() and
      device_matches_zone_sectors(). With the logic of
      device_is_zoned_model() inverted, all destination devices of all
      targets in the dm table are required to have the expected zoned model.
      This is fine for dm-linear, dm-flakey and dm-crypt on zoned block
      devices since each of those targets has only one destination device.
      However, it results in failure for dm-zoned with a regular cache
      device, since that target has both a regular block device and zoned
      block devices.
      
      As for device_matches_zone_sectors(), the commit inverted the logic to
      require that all zoned block devices in each target have the specified
      zone_sectors. This check also fails for a regular block device, which
      does not have zones.
      
      To avoid these check failures, fix the zone model check and the zone
      sectors check. For the zone model check, introduce the new feature
      flag DM_TARGET_MIXED_ZONED_MODEL and set it for the dm-zoned target.
      When a target has this flag, allow it to have destination devices with
      any zoned model. For the zone sectors check, skip the check if the
      destination device is not a zoned block device. Also add comments and
      improve an error message to clarify the expectations of the two
      checks.
      
      Fixes: 24f6b603 ("dm table: fix zoned iterate_devices based device capability checks")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      2d669ceb
  3. 09 Jul 2020, 1 commit
    • dm zoned: Fix zone reclaim trigger · 174364f6
      Damien Le Moal committed
      Only triggering reclaim based on the percentage of unmapped cache
      zones can fail to detect cases where reclaim is needed, e.g. if the
      target has only 2 or 3 cache zones and only one unmapped cache zone,
      the percentage of free cache zones is higher than
      DMZ_RECLAIM_LOW_UNMAP_ZONES (30%) and reclaim does not trigger.
      
      This problem, combined with the fact that dmz_schedule_reclaim() is
      called from dmz_handle_bio() without the map lock held, leads to a
      race between zone allocation and the result of dmz_should_reclaim().
      Depending on the workload, this race can leave the write path waiting
      forever for a free zone without reclaim ever being triggered.
      
      Fix this by moving dmz_schedule_reclaim() inside dmz_alloc_zone()
      under the map lock. This results in checking the need for zone reclaim
      whenever a new data or buffer zone needs to be allocated.
      
      Also fix dmz_reclaim_percentage() to always return 0 if the number of
      unmapped cache (or random) zones is less than or equal to 1.
      Suggested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      174364f6
  4. 01 Jul 2020, 1 commit
  5. 18 Jun 2020, 1 commit
  6. 06 Jun 2020, 5 commits
  7. 21 May 2020, 4 commits
  8. 20 May 2020, 2 commits
  9. 15 May 2020, 6 commits
  10. 04 Mar 2020, 1 commit
    • dm: bump version of core and various targets · 636be424
      Mike Snitzer committed
      Changes made during the 5.6 cycle warrant bumping the version number
      for DM core and the targets modified by this commit.
      
      It should be noted that dm-thin, dm-crypt and dm-raid already had
      their target version bumped during the 5.6 merge window.
      
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      636be424
  11. 28 Feb 2020, 1 commit
    • dm zoned: Fix reference counter initial value of chunk works · ee63634b
      Shin'ichiro Kawasaki committed
      Dm-zoned initializes the reference counters of new chunk works to
      zero and calls refcount_inc() to increment the counter. However, the
      refcount_inc() function treats addition to a zero value as an error
      and triggers the following warning:
      
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0
      ...
      CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134
      ...
      Call Trace:
       dmz_map+0x2d2/0x350 [dm_zoned]
       __map_bio+0x42/0x1a0
       __split_and_process_non_flush+0x14a/0x1b0
       __split_and_process_bio+0x83/0x240
       ? kmem_cache_alloc+0x165/0x220
       dm_process_bio+0x90/0x230
       ? generic_make_request_checks+0x2e7/0x680
       dm_make_request+0x3e/0xb0
       generic_make_request+0xcf/0x320
       ? memcg_drain_all_list_lrus+0x1c0/0x1c0
       submit_bio+0x3c/0x160
       ? guard_bio_eod+0x2c/0x130
       mpage_readpages+0x182/0x1d0
       ? bdev_evict_inode+0xf0/0xf0
       read_pages+0x6b/0x1b0
       __do_page_cache_readahead+0x1ba/0x1d0
       force_page_cache_readahead+0x93/0x100
       generic_file_read_iter+0x83a/0xe40
       ? __seccomp_filter+0x7b/0x670
       new_sync_read+0x12a/0x1c0
       vfs_read+0x9d/0x150
       ksys_read+0x5f/0xe0
       do_syscall_64+0x5b/0x180
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ...
      
      After this warning, all subsequent refcount API calls for the counter
      fail to change the counter value.
      
      Fix this by initializing the reference counter of new chunk works to
      one instead of zero, and by not calling refcount_inc() via
      dmz_get_chunk_work() for newly created chunk works.
      
      The failure was observed with Linux version 5.4 with
      CONFIG_REFCOUNT_FULL enabled. The refcount rework was merged into
      Linux version 5.5 by commit 168829ad ("Merge branch
      'locking-core-for-linus' of
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this
      commit, CONFIG_REFCOUNT_FULL was removed and the failure is observed
      regardless of the kernel configuration.
      
      Linux version 4.20 merged commit 092b5648 ("dm zoned: target: use
      refcount_t for dm zoned reference counters"). Before this commit, dm
      zoned used atomic_t APIs, which do not check for addition to zero, so
      this fix is not needed there.
      
      Fixes: 092b5648 ("dm zoned: target: use refcount_t for dm zoned reference counters")
      Cc: stable@vger.kernel.org # 5.4+
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      ee63634b
  12. 03 Dec 2019, 1 commit
  13. 07 Nov 2019, 1 commit
    • dm zoned: reduce overhead of backing device checks · e7fad909
      Dmitry Fomichev committed
      Commit 75d66ffb added backing device health checks; as part of these
      checks, the check_events() block ops template call is invoked in the
      dm-zoned mapping path as well as in the reclaim and flush paths.
      Calling check_events() with ATA or SCSI backing devices introduces a
      blocking scsi_test_unit_ready() call made from sd_check_events().
      Even though the overhead of calling scsi_test_unit_ready() is small
      for ATA zoned devices, it is much larger for SCSI and severely
      degrades performance.
      
      Fix this performance regression by executing check_events() only
      after I/O errors. The function dmz_bdev_is_dying() is modified to
      call only blk_queue_dying(), while calls to check_events() are made
      in a new helper function, dmz_check_bdev().
      Reported-by: zhangxiaoxu <zhangxiaoxu5@huawei.com>
      Fixes: 75d66ffb ("dm zoned: properly handle backing device failure")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      e7fad909
  14. 26 Aug 2019, 1 commit
    • dm zoned: fix invalid memory access · 0c8e9c2d
      Mikulas Patocka committed
      Commit 75d66ffb ("dm zoned: properly
      handle backing device failure") triggers a coverity warning:
      
      *** CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
      /drivers/md/dm-zoned-target.c: 137 in dmz_submit_bio()
      131             clone->bi_private = bioctx;
      132
      133             bio_advance(bio, clone->bi_iter.bi_size);
      134
      135             refcount_inc(&bioctx->ref);
      136             generic_make_request(clone);
      >>>     CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
      >>>     Dereferencing freed pointer "clone".
      137             if (clone->bi_status == BLK_STS_IOERR)
      138                     return -EIO;
      139
      140             if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
      141                     zone->wp_block += nr_blocks;
      142
      
      The "clone" bio may be processed and freed before the check
      "clone->bi_status == BLK_STS_IOERR" runs, so this check can access
      freed memory.
      
      Fixes: 75d66ffb ("dm zoned: properly handle backing device failure")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      0c8e9c2d
  15. 16 Aug 2019, 3 commits
  16. 19 Apr 2019, 1 commit
  17. 21 Feb 2019, 1 commit
  18. 08 Dec 2018, 1 commit
    • dm zoned: Fix target BIO completion handling · d57f9da8
      Damien Le Moal committed
      struct bioctx includes the refcount_t "ref" to track the number of
      I/O fragments used to process a target BIO, and to ensure that the
      zone of the BIO is kept in the active state throughout the lifetime
      of the BIO. However, since this reference count is decremented in the
      target .end_io method, bio_endio() can be called multiple times for
      read and write target BIOs, which causes problems with the value of
      the __bi_remaining struct bio field for chained BIOs (e.g. when the
      clone BIO passed by dm core is large and is split into fragments by
      the block layer), resulting in incorrect values and inconsistencies
      with the BIO_CHAIN flag setting. This in turn triggers the BUG_ON()
      call:
      
      BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);
      
      in bio_remaining_done() called from bio_endio().
      
      Fix this by ensuring that bio_endio() is called only once for any
      target BIO, by always using internal clone BIOs for processing any
      read or write target BIO. This allows reference counting via the
      target BIO context counter to trigger the target BIO completion
      (bio_endio()) call once all data, metadata and other zone work
      triggered by the BIO completes.
      
      Overall, this simplifies the code too as the target .end_io becomes
      unnecessary and differences between read and write BIO issuing and
      completion processing disappear.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      d57f9da8
  19. 26 Oct 2018, 1 commit
  20. 17 Oct 2018, 1 commit
  21. 23 Jun 2018, 1 commit
    • dm zoned: avoid triggering reclaim from inside dmz_map() · 2d0b2d64
      Bart Van Assche committed
      This patch avoids the following lockdep report:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-rc1 #62 Not tainted
      ------------------------------------------------------
      kswapd0/84 is trying to acquire lock:
      00000000c313516d (&xfs_nondir_ilock_class){++++}, at: xfs_free_eofblocks+0xa2/0x1e0
      
      but task is already holding lock:
      00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (fs_reclaim){+.+.}:
        kmem_cache_alloc+0x2c/0x2b0
        radix_tree_node_alloc.constprop.19+0x3d/0xc0
        __radix_tree_create+0x161/0x1c0
        __radix_tree_insert+0x45/0x210
        dmz_map+0x245/0x2d0 [dm_zoned]
        __map_bio+0x40/0x260
        __split_and_process_non_flush+0x116/0x220
        __split_and_process_bio+0x81/0x180
        __dm_make_request.isra.32+0x5a/0x100
        generic_make_request+0x36e/0x690
        submit_bio+0x6c/0x140
        mpage_readpages+0x19e/0x1f0
        read_pages+0x6d/0x1b0
        __do_page_cache_readahead+0x21b/0x2d0
        force_page_cache_readahead+0xc4/0x100
        generic_file_read_iter+0x7c6/0xd20
        __vfs_read+0x102/0x180
        vfs_read+0x9b/0x140
        ksys_read+0x55/0xc0
        do_syscall_64+0x5a/0x1f0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      -> #1 (&dmz->chunk_lock){+.+.}:
        dmz_map+0x133/0x2d0 [dm_zoned]
        __map_bio+0x40/0x260
        __split_and_process_non_flush+0x116/0x220
        __split_and_process_bio+0x81/0x180
        __dm_make_request.isra.32+0x5a/0x100
        generic_make_request+0x36e/0x690
        submit_bio+0x6c/0x140
        _xfs_buf_ioapply+0x31c/0x590
        xfs_buf_submit_wait+0x73/0x520
        xfs_buf_read_map+0x134/0x2f0
        xfs_trans_read_buf_map+0xc3/0x580
        xfs_read_agf+0xa5/0x1e0
        xfs_alloc_read_agf+0x59/0x2b0
        xfs_alloc_pagf_init+0x27/0x60
        xfs_bmap_longest_free_extent+0x43/0xb0
        xfs_bmap_btalloc_nullfb+0x7f/0xf0
        xfs_bmap_btalloc+0x428/0x7c0
        xfs_bmapi_write+0x598/0xcc0
        xfs_iomap_write_allocate+0x15a/0x330
        xfs_map_blocks+0x1cf/0x3f0
        xfs_do_writepage+0x15f/0x7b0
        write_cache_pages+0x1ca/0x540
        xfs_vm_writepages+0x65/0xa0
        do_writepages+0x48/0xf0
        __writeback_single_inode+0x58/0x730
        writeback_sb_inodes+0x249/0x5c0
        wb_writeback+0x11e/0x550
        wb_workfn+0xa3/0x670
        process_one_work+0x228/0x670
        worker_thread+0x3c/0x390
        kthread+0x11c/0x140
        ret_from_fork+0x3a/0x50
      
      -> #0 (&xfs_nondir_ilock_class){++++}:
        down_read_nested+0x43/0x70
        xfs_free_eofblocks+0xa2/0x1e0
        xfs_fs_destroy_inode+0xac/0x270
        dispose_list+0x51/0x80
        prune_icache_sb+0x52/0x70
        super_cache_scan+0x127/0x1a0
        shrink_slab.part.47+0x1bd/0x590
        shrink_node+0x3b5/0x470
        balance_pgdat+0x158/0x3b0
        kswapd+0x1ba/0x600
        kthread+0x11c/0x140
        ret_from_fork+0x3a/0x50
      
      other info that might help us debug this:
      
      Chain exists of:
        &xfs_nondir_ilock_class --> &dmz->chunk_lock --> fs_reclaim
      
      Possible unsafe locking scenario:
      
           CPU0                    CPU1
           ----                    ----
      lock(fs_reclaim);
                                   lock(&dmz->chunk_lock);
                                   lock(fs_reclaim);
      lock(&xfs_nondir_ilock_class);
      
      *** DEADLOCK ***
      
      3 locks held by kswapd0/84:
       #0: 00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
       #1: 000000000f8208f5 (shrinker_rwsem){++++}, at: shrink_slab.part.47+0x3f/0x590
       #2: 00000000cacefa54 (&type->s_umount_key#43){.+.+}, at: trylock_super+0x16/0x50
      
      stack backtrace:
      CPU: 7 PID: 84 Comm: kswapd0 Not tainted 4.18.0-rc1 #62
      Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
      Call Trace:
       dump_stack+0x85/0xcb
       print_circular_bug.isra.36+0x1ce/0x1db
       __lock_acquire+0x124e/0x1310
       lock_acquire+0x9f/0x1f0
       down_read_nested+0x43/0x70
       xfs_free_eofblocks+0xa2/0x1e0
       xfs_fs_destroy_inode+0xac/0x270
       dispose_list+0x51/0x80
       prune_icache_sb+0x52/0x70
       super_cache_scan+0x127/0x1a0
       shrink_slab.part.47+0x1bd/0x590
       shrink_node+0x3b5/0x470
       balance_pgdat+0x158/0x3b0
       kswapd+0x1ba/0x600
       kthread+0x11c/0x140
       ret_from_fork+0x3a/0x50
      Reported-by: Masato Suzuki <masato.suzuki@wdc.com>
      Fixes: 4218a955 ("dm zoned: use GFP_NOIO in I/O path")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      2d0b2d64
  22. 08 Jun 2018, 1 commit
  23. 31 May 2018, 1 commit
  24. 05 Apr 2018, 1 commit
  25. 17 Jan 2018, 1 commit