- 11 8月, 2021 11 次提交
-
-
由 Tushar Sugandhi 提交于
For device mapper targets to take advantage of IMA's measurement capabilities, the status functions for the individual targets need to be updated to handle the status_type_t case for value STATUSTYPE_IMA. Update status functions for the following target types, to log their respective attributes to be measured using IMA. 01. cache 02. crypt 03. integrity 04. linear 05. mirror 06. multipath 07. raid 08. snapshot 09. striped 10. verity For rest of the targets, handle the STATUSTYPE_IMA case by setting the measurement buffer to NULL. For IMA to measure the data on a given system, the IMA policy on the system needs to be updated to have the following line, and the system needs to be restarted for the measurements to take effect. /etc/ima/ima-policy measure func=CRITICAL_DATA label=device-mapper template=ima-buf The measurements will be reflected in the IMA logs, which are located at: /sys/kernel/security/integrity/ima/ascii_runtime_measurements /sys/kernel/security/integrity/ima/binary_runtime_measurements These IMA logs can later be consumed by various attestation clients running on the system, and send them to external services for attesting the system. The DM target data measured by IMA subsystem can alternatively be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with DM_TABLE_STATUS_CMD. Signed-off-by: NTushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Tushar Sugandhi 提交于
A given block device is identified by it's name and UUID. However, both these parameters can be renamed. For an external attestation service to correctly attest a given device, it needs to keep track of these rename events. Update the device data with the new values for IMA measurements. Measure both old and new device name/UUID parameters in the same IMA measurement event, so that the old and the new values can be connected later. Signed-off-by: NTushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Tushar Sugandhi 提交于
For a given block device, an inactive table slot contains the parameters to configure the device with. The inactive table can be cleared multiple times, accidentally or maliciously, which may impact the functionality of the device, and compromise the system. Therefore it is important to measure and log the event when a table is cleared. Measure device parameters, and table hashes when the inactive table slot is cleared. Signed-off-by: NTushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Tushar Sugandhi 提交于
Presence of an active block-device, configured with expected parameters, is important for an external attestation service to determine if a system meets the attestation requirements. Therefore it is important for DM to measure the device remove events. Measure device parameters and table hashes when the device is removed, using either remove or remove_all. Signed-off-by: NTushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Tushar Sugandhi 提交于
A given block device can load a table multiple times, with different input parameters, before eventually resuming it. Further, a device may be suspended and then resumed. The device may never resume after a table-load. Because of the above valid scenarios for a given device, it is important to measure and log the device resume event using IMA. Also, if the table is large, measuring it in clear-text each time the device changes state, will unnecessarily increase the size of IMA log. Since the table clear-text is already measured during table-load event, measuring the hash during resume should be sufficient to validate the table contents. Measure the device parameters, and hash of the active table, when the device is resumed. Signed-off-by: NTushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Tushar Sugandhi 提交于
DM configures a block device with various target specific attributes passed to it as a table. DM loads the table, and calls each target’s respective constructors with the attributes as input parameters. Some of these attributes are critical to ensure the device meets certain security bar. Thus, IMA should measure these attributes, to ensure they are not tampered with, during the lifetime of the device. So that the external services can have high confidence in the configuration of the block-devices on a given system. Some devices may have large tables. And a given device may change its state (table-load, suspend, resume, rename, remove, table-clear etc.) many times. Measuring these attributes each time when the device changes its state will significantly increase the size of the IMA logs. Further, once configured, these attributes are not expected to change unless a new table is loaded, or a device is removed and recreated. Therefore the clear-text of the attributes should only be measured during table load, and the hash of the active/inactive table should be measured for the remaining device state changes. Export IMA function ima_measure_critical_data() to allow measurement of DM device parameters, as well as target specific attributes, during table load. Compute the hash of the inactive table and store it for measurements during future state change. If a load is called multiple times, update the inactive table hash with the hash of the latest populated table. So that the correct inactive table hash is measured when the device transitions to different states like resume, remove, rename, etc. Signed-off-by: NTushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mikulas Patocka 提交于
Add 10 counters for various events (hit, miss, etc) and export them in the status line (accessed from userspace with "dmsetup status"). Also add a message "clear_stats" that resets these counters. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mikulas Patocka 提交于
If some "writecache_map_*" function returns invalid state, it is a bug. So, we should report it and not fail silently. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Factor out writecache_map_flush() and writecache_map_discard() from writecache_map(). Also eliminate the various goto labels in writecache_map(). Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
writecache_map() has grown too large and can be confusing to read given all the goto statements. Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 10 8月, 2021 5 次提交
-
-
由 Christoph Hellwig 提交于
.. and rename the function to disk_update_readahead. This is in preparation for moving the BDI from the request_queue to the gendisk. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
device mapper is currently the only outlier that tries to call register_disk after add_disk, leading to fairly inconsistent state of these block layer data structures. Instead change device-mapper to just register the gendisk later now that the holder mechanism can cope with that. Note that this introduces a user visible change: the dm kobject is now only visible after the initial table has been loaded. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-8-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Move setting md->type from both callers into dm_setup_md_queue. This ensures that md->type is only set to a valid value after the queue has been fully setup, something we'll rely on future changes. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
md->queue is now always set when md->disk is set, so simplify the conditionals a bit. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Move the block holder code into a separate file as it is not in any way related to the other block_dev.c code, and add a new selectable config option for it so that we don't have to build it without any remapped drivers selected. The Kconfig symbol contains a _DEPRECATED suffix to match the comments added in commit 49731baa ("block: restore multiple bd_link_disk_holder() support"). Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 8月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
There is no need to disable interrupts in bio_copy_block, and the local only mappings helps to avoid any sort of problems with stray writes into the bio data. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20210727055646.118787-8-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 6月, 2021 1 次提交
-
-
由 Mikulas Patocka 提交于
Commit 95b88f4d ("dm writecache: pause writeback if cache full and origin being written directly") introduced a code that pauses cache flushing if we are issuing writes directly to the origin. Improve that initial commit by making the timeout code configurable (via the option "pause_writeback"). Also change the default from 1s to 3s because it performed better. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 26 6月, 2021 6 次提交
-
-
由 Mikulas Patocka 提交于
Implementation reuses dm_io_tracker, that until now was only used by dm-cache, to track if any writes were issued directly to the origin (due to cache being full) within the last second. If so writeback is paused for a second. This change improves performance for when the cache is full and IO is issued directly to the origin device (rather than through the cache). Depends-on: d53f1faf ("dm writecache: do direct write if the cache is full") Suggested-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Allow other code to use dm_io_tracker. Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Hou Tao 提交于
remove_raw() in dm_btree_remove() may fail due to IO read error (e.g. read the content of origin block fails during shadowing), and the value of shadow_spine::root is uninitialized, but the uninitialized value is still assign to new_root in the end of dm_btree_remove(). For dm-thin, the value of pmd->details_root or pmd->root will become an uninitialized value, so if trying to read details_info tree again out-of-bound memory may occur as showed below: general protection fault, probably for non-canonical address 0x3fdcb14c8d7520 CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6 Hardware name: QEMU Standard PC RIP: 0010:metadata_ll_load_ie+0x14/0x30 Call Trace: sm_metadata_count_is_more_than_one+0xb9/0xe0 dm_tm_shadow_block+0x52/0x1c0 shadow_step+0x59/0xf0 remove_raw+0xb2/0x170 dm_btree_remove+0xf4/0x1c0 dm_pool_delete_thin_device+0xc3/0x140 pool_message+0x218/0x2b0 target_message+0x251/0x290 ctl_ioctl+0x1c4/0x4d0 dm_ctl_ioctl+0xe/0x20 __x64_sys_ioctl+0x7b/0xb0 do_syscall_64+0x40/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixing it by only assign new_root when removal succeeds Signed-off-by: NHou Tao <houtao1@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Damien Le Moal 提交于
Make sure that the zone write pointer offset array is allocated with a vmalloc in dm_zone_revalidate_cb() by passing GFP_KERNEL gfp flag to kvcalloc(). However, since we do not want to trigger IOs while revalidating zones, change dm_revalidate_zones() to have the zone scan done in GFP_NOIO context using memalloc_noio_save/restore calls. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Fixes: bb37d772 ("dm: introduce zone append emulation") Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Colin Ian King 提交于
The continue statement at the end of a for-loop has no effect, remove it. Addresses-Coverity: ("Continue has no effect") Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mikulas Patocka 提交于
Add a "metadata_only" parameter that when present: only metadata is promoted to the cache. This option improves performance for heavier REQ_META workloads (e.g. device-mapper-test-suite's "git clone and checkout" benchmark improves from 341s to 312s). Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 22 6月, 2021 1 次提交
-
-
由 Mikulas Patocka 提交于
SSDs perform badly with sub-4k writes (because they perfrorm read-modify-write internally), so make sure writecache writes at least 4k when committing. Fixes: 991bd8d7 ("dm writecache: commit just one block, not a full page") Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 18 6月, 2021 1 次提交
-
-
由 Peter Zijlstra 提交于
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Acked-by: NWill Deacon <will@kernel.org> Acked-by: NDaniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
-
- 17 6月, 2021 1 次提交
-
-
由 Mikulas Patocka 提交于
Commit d53f1faf ("dm writecache: do direct write if the cache is full") changed dm-writecache, so that it writes directly to the origin device if the cache is full. Unfortunately, it doesn't forward flush requests to the origin device, so that there is a bug where flushes are being ignored. Fix this by adding missing flush forwarding. For PMEM mode, we fix this bug by disabling direct writes to the origin device, because it performs better. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Fixes: d53f1faf ("dm writecache: do direct write if the cache is full") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 16 6月, 2021 1 次提交
-
-
由 Mikulas Patocka 提交于
Make dm-writecache wait if the kcopyd workqueue is busy (as will happen if waiting for page allocation or inside submit_bio). This change improves performance of "mkfs.ext2" by approximately 20% on one testbed. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 15 6月, 2021 12 次提交
-
-
由 Gal Ofri 提交于
There is a lock contention on device_lock in read_one_chunk(). device_lock is taken to sync conf->active_aligned_reads and conf->quiesce. read_one_chunk() takes the lock, then waits for quiesce=0 (resumed) before incrementing active_aligned_reads. raid5_quiesce() takes the lock, sets quiesce=2 (in-progress), then waits for active_aligned_reads to be zero before setting quiesce=1 (suspended). Introduce a fast (lockless) path in read_one_chunk(): activate aligned read without taking device_lock. In case quiesce starts while activating the aligned-read in fast path, deactivate it and revert to old behavior (take device_lock and wait for quiesce to finish). Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to gaurantee that read_one_chunk() does not miss an ongoing quiesce. My setups: 1. 8 local nvme drives (each up to 250k iops). 2. 8 ram disks (brd). Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per socket) system. Record both iops and cpu spent on this contention with rand-read-4k. Record bw with sequential-read-128k. Note: in most cases cpu is still busy but due to "new" bottlenecks. nvme: | iops | cpu | bw ----------------------------------------------- without patch | 1.6M | ~50% | 5.5GB/s with patch | 2M (throttled) | 0% | 16GB/s (throttled) ram (brd): | iops | cpu | bw ----------------------------------------------- without patch | 2M | ~80% | 24GB/s with patch | 4M | 0% | 55GB/s CC: Song Liu <song@kernel.org> CC: Neil Brown <neilb@suse.de> Reviewed-by: NNeilBrown <neilb@suse.de> Signed-off-by: NGal Ofri <gal.ofri@storing.io> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
Given it is not obvious for the error handling, let's try to add some comments here to make it clear. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
The bio_set (io_acct_set) is used by personalities to clone bio and trace the timestamp of bio. Some personalities such as raid1/10 don't need the bio_set, so add check to not create it unconditionally. Also update the comment for md_account_bio to make it more clear. Suggested-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Rikard Falkeborn 提交于
The attribute_group structs are never modified, they're only passed to sysfs_create_group() and sysfs_remove_group(). Make them const to allow the compiler to put them in read-only memory. Signed-off-by: NRikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
Mark the three personalities (linear, fault and multipath) as deprecated because: 1. people can use dm multipath or nvme multipath. 2. linear is already deprecated in MODULE_ALIAS. 3. no one actively using fault. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
For raid10, we record the start time between split bio and clone bio, and finish the accounting in the final endio. Also introduce start_time in r10bio accordingly. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
For raid1, we record the start time between split bio and clone bio, and finish the accounting in the final endio. Also introduce start_time in r1bio accordingly. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
The caller of raid1_read_request could pass NULL or a valid pointer for "struct r1bio *r1_bio", so it actually means whether r1_bio is existed or not. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
After enable io accounting, chunk read bio could be cloned twice which is not good. To avoid such inefficiency, let's clone align_bio from io_acct_set too, then we need only call md_account_bio in make_request unconditionally. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
We don't need to clone bio if the relevant region has badblock. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
We introduce a new bioset (io_acct_set) for raid0 and raid5 since they don't own clone infrastructure to accounting io. And the bioset is added to mddev instead of to raid0 and raid5 layer, because with this way, we can put common functions to md.h and reuse them in raid0 and raid5. Also struct md_io_acct is added accordingly which includes io start_time, the origin bio and cloned bio. Then we can call bio_{start,end}_io_acct to get related io status. Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Guoqing Jiang 提交于
The commit 41d2d848 ("md: improve io stats accounting") could cause double fault problem per the report [1], and also it is not correct to change ->bi_end_io if md don't own it, so let's revert it. And io stats accounting will be replemented in later commits. [1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t Fixes: 41d2d848 ("md: improve io stats accounting") Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: NSong Liu <song@kernel.org>
-