Commits · 8ec456629d0bf051e41ef2c87a60755f941dd11c · openeuler / Kernel

11 Aug, 2021 11 commits

dm: update target status functions to support IMA measurement · 8ec45662

Committed by Tushar Sugandhi on Jul 12, 2021

For device mapper targets to take advantage of IMA's measurement
capabilities, the status functions for the individual targets need to be
updated to handle the status_type_t case for value STATUSTYPE_IMA.

Update status functions for the following target types, to log their
respective attributes to be measured using IMA.
 01. cache
 02. crypt
 03. integrity
 04. linear
 05. mirror
 06. multipath
 07. raid
 08. snapshot
 09. striped
 10. verity

For the rest of the targets, handle the STATUSTYPE_IMA case by setting the
measurement buffer to NULL.

For IMA to measure the data on a given system, the IMA policy on the
system needs to be updated to have the following line, and the system
needs to be restarted for the measurements to take effect.

/etc/ima/ima-policy
 measure func=CRITICAL_DATA label=device-mapper template=ima-buf

The measurements will be reflected in the IMA logs, which are located at:

/sys/kernel/security/integrity/ima/ascii_runtime_measurements
/sys/kernel/security/integrity/ima/binary_runtime_measurements

These IMA logs can later be consumed by various attestation clients
running on the system, which can send them to external services for
attesting the system.
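As an illustration of how an attestation client might consume the ASCII log, the following is a minimal sketch, not part of the commit. It assumes that each `ima-buf` record has six whitespace-separated fields (PCR, template hash, template name, data hash, event name, hex-encoded buffer) and that device-mapper event names start with `dm_`; both are assumptions to check against the running kernel. `ImaBufEntry`, `parse_ima_buf_line` and `read_dm_entries` are hypothetical names.

```python
from typing import List, NamedTuple, Optional

# Log path taken from the commit message above.
ASCII_LOG = "/sys/kernel/security/integrity/ima/ascii_runtime_measurements"


class ImaBufEntry(NamedTuple):
    pcr: int
    template_hash: str
    data_hash: str
    event_name: str
    payload: bytes


def parse_ima_buf_line(line: str) -> Optional[ImaBufEntry]:
    """Parse one log line; return None unless it is a well-formed ima-buf record."""
    fields = line.split()
    # Assumed layout: PCR, template hash, template name, data hash,
    # event name, hex-encoded buffer.
    if len(fields) != 6 or fields[2] != "ima-buf":
        return None
    pcr, template_hash, _template, data_hash, event_name, hex_buf = fields
    try:
        return ImaBufEntry(int(pcr), template_hash, data_hash,
                           event_name, bytes.fromhex(hex_buf))
    except ValueError:
        # Non-numeric PCR or malformed hex buffer.
        return None


def read_dm_entries(path: str = ASCII_LOG) -> List[ImaBufEntry]:
    """Collect device-mapper records from the ASCII measurement log."""
    entries = []
    with open(path, "r", encoding="ascii", errors="replace") as log:
        for line in log:
            entry = parse_ima_buf_line(line)
            # Assumption: device-mapper events use names starting with "dm_".
            if entry is not None and entry.event_name.startswith("dm_"):
                entries.append(entry)
    return entries
```

Reading the log usually requires root privileges, so `read_dm_entries()` is best called from a privileged helper, while the parsing function can be tested on captured lines.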

The DM target data measured by IMA subsystem can alternatively
be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
DM_TABLE_STATUS_CMD.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

8ec45662

dm ima: measure data on device rename · 7d1d1df8

Committed by Tushar Sugandhi on Jul 12, 2021

A given block device is identified by its name and UUID.  However, both
of these parameters can be changed.  For an external attestation service to
correctly attest a given device, it needs to keep track of these rename
events.

Update the device data with the new values for IMA measurements.  Measure
both old and new device name/UUID parameters in the same IMA measurement
event, so that the old and the new values can be connected later.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

7d1d1df8

dm ima: measure data on table clear · 99169b93

Committed by Tushar Sugandhi on Jul 12, 2021

For a given block device, an inactive table slot contains the parameters
to configure the device with.  The inactive table can be cleared
multiple times, accidentally or maliciously, which may impact the
functionality of the device, and compromise the system.  Therefore it is
important to measure and log the event when a table is cleared.

Measure device parameters, and table hashes when the inactive table slot
is cleared.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

99169b93

dm ima: measure data on device remove · 84010e51

Committed by Tushar Sugandhi on Jul 12, 2021

Presence of an active block-device, configured with expected parameters,
is important for an external attestation service to determine if a system
meets the attestation requirements.  Therefore it is important for DM to
measure the device remove events.

Measure device parameters and table hashes when the device is removed,
using either remove or remove_all.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

84010e51

dm ima: measure data on device resume · 8eb6fab4

Committed by Tushar Sugandhi on Jul 12, 2021

A given block device can load a table multiple times, with different
input parameters, before eventually resuming it.  Further, a device may
be suspended and then resumed.  The device may never resume after a
table-load.  Because of the above valid scenarios for a given device,
it is important to measure and log the device resume event using IMA.

Also, if the table is large, measuring it in clear-text each time the
device changes state will unnecessarily increase the size of the IMA log.
Since the table clear-text is already measured during table-load event,
measuring the hash during resume should be sufficient to validate the
table contents.

Measure the device parameters, and hash of the active table, when the
device is resumed.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

8eb6fab4

dm ima: measure data on table load · 91ccbbac

Committed by Tushar Sugandhi on Jul 12, 2021

DM configures a block device with various target specific attributes
passed to it as a table. DM loads the table, and calls each target’s
respective constructors with the attributes as input parameters.
Some of these attributes are critical to ensure the device meets a
certain security bar. Thus, IMA should measure these attributes to
ensure they are not tampered with during the lifetime of the device,
so that external services can have high confidence in the
configuration of the block devices on a given system.

Some devices may have large tables. And a given device may change its
state (table-load, suspend, resume, rename, remove, table-clear etc.)
many times. Measuring these attributes each time when the device
changes its state will significantly increase the size of the IMA logs.
Further, once configured, these attributes are not expected to change
unless a new table is loaded, or a device is removed and recreated.
Therefore the clear-text of the attributes should only be measured
during table load, and the hash of the active/inactive table should be
measured for the remaining device state changes.

Export IMA function ima_measure_critical_data() to allow measurement
of DM device parameters, as well as target specific attributes, during
table load. Compute the hash of the inactive table and store it for
measurements during future state changes. If a load is called multiple
times, update the inactive table hash with the hash of the latest
populated table, so that the correct inactive table hash is measured
when the device transitions to different states like resume, remove,
rename, etc.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

91ccbbac
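The split described in the table-load commit above (clear-text measured once at load, only a hash measured on later state changes) can be modelled in user space. This is a sketch of the idea under stated assumptions, not the kernel implementation: `DeviceModel`, its method names, the event labels and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple


class DeviceModel:
    """Toy model of one device with an inactive and an active table slot."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inactive_hash: Optional[str] = None
        self.active_hash: Optional[str] = None
        # Each log entry is (event label, measured data).
        self.log: List[Tuple[str, str]] = []

    def table_load(self, table_text: str) -> None:
        # Clear-text is measured only here; a repeated load replaces the
        # stored inactive hash with the hash of the latest table.
        self.inactive_hash = hashlib.sha256(table_text.encode()).hexdigest()
        self.log.append(("table_load", table_text))

    def resume(self) -> None:
        # A pending inactive table becomes the active one on resume.
        if self.inactive_hash is not None:
            self.active_hash = self.inactive_hash
            self.inactive_hash = None
        # Only the hash is measured, which keeps the log small.
        self.log.append(("resume", self.active_hash or ""))


dev = DeviceModel("demo")
dev.table_load("0 1024 linear /dev/sda 0")
dev.table_load("0 2048 linear /dev/sdb 0")  # second load wins
dev.resume()
```

A verifier that saw the clear-text at load time can recompute the hash and compare it with the value recorded at resume.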

dm writecache: add event counters · e3a35d03

Committed by Mikulas Patocka on Jul 27, 2021

Add 10 counters for various events (hit, miss, etc) and export them in
the status line (accessed from userspace with "dmsetup status"). Also
add a message "clear_stats" that resets these counters.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e3a35d03

dm writecache: report invalid return from writecache_map helpers · df699cc1

Committed by Mikulas Patocka on Jul 27, 2021

If some "writecache_map_*" function returns an invalid state, it is a bug.
So, we should report it and not fail silently.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

df699cc1

dm writecache: further writecache_map() cleanup · 15cb6f39

Committed by Mike Snitzer on Jul 12, 2021

Factor out writecache_map_flush() and writecache_map_discard() from
writecache_map(). Also eliminate the various goto labels in
writecache_map().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

15cb6f39

dm writecache: factor out writecache_map_remap_origin() · 4d020b3a

Committed by Mike Snitzer on Jul 12, 2021

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

4d020b3a

dm writecache: split up writecache_map() to improve code readability · cdd4d783

Committed by Mike Snitzer on Jul 12, 2021

writecache_map() has grown too large and can be confusing to read given
all the goto statements.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

cdd4d783

10 Aug, 2021 5 commits

block: pass a gendisk to blk_queue_update_readahead · 471aa704

Committed by Christoph Hellwig on Aug 09, 2021

.. and rename the function to disk_update_readahead.  This is in
preparation for moving the BDI from the request_queue to the gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

471aa704

dm: delay registering the gendisk · 89f871af

Committed by Christoph Hellwig on Aug 04, 2021

device mapper is currently the only outlier that tries to call
register_disk after add_disk, leading to fairly inconsistent state
of these block layer data structures.  Instead change device-mapper
to just register the gendisk later now that the holder mechanism
can cope with that.

Note that this introduces a user visible change: the dm kobject is
now only visible after the initial table has been loaded.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

89f871af

dm: move setting md->type into dm_setup_md_queue · ba305859

Committed by Christoph Hellwig on Aug 04, 2021

Move setting md->type from both callers into dm_setup_md_queue.
This ensures that md->type is only set to a valid value after the queue
has been fully set up, something we'll rely on in future changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ba305859

dm: cleanup cleanup_mapped_device · 74a2b6ec

Committed by Christoph Hellwig on Aug 04, 2021

md->queue is now always set when md->disk is set, so simplify the
conditionals a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

74a2b6ec

block: make the block holder code optional · c66fd019

Committed by Christoph Hellwig on Aug 04, 2021

Move the block holder code into a separate file as it is not in any way
related to the other block_dev.c code, and add a new selectable config
option for it so that we don't have to build it without any remapped
drivers selected.

The Kconfig symbol contains a _DEPRECATED suffix to match the comments
added in commit 49731baa
("block: restore multiple bd_link_disk_holder() support").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c66fd019

03 Aug, 2021 1 commit

dm-writecache: use bvec_kmap_local instead of bvec_kmap_irq · 18a6234c

Committed by Christoph Hellwig on Jul 27, 2021

There is no need to disable interrupts in bio_copy_block, and the
local-only mapping helps to avoid any sort of problems with stray writes
into the bio data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20210727055646.118787-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18a6234c

29 Jun, 2021 1 commit

dm writecache: make writeback pause configurable · 5c0de3d7

Committed by Mikulas Patocka on Jun 28, 2021

Commit 95b88f4d ("dm writecache: pause
writeback if cache full and origin being written directly") introduced
code that pauses cache flushing if we are issuing writes directly to the
origin.

Improve that initial commit by making the timeout code configurable
(via the option "pause_writeback"). Also change the default from 1s to
3s because it performed better.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

5c0de3d7

26 Jun, 2021 6 commits

dm writecache: pause writeback if cache full and origin being written directly · 95b88f4d

Committed by Mikulas Patocka on Jun 25, 2021

Implementation reuses dm_io_tracker, that until now was only used by
dm-cache, to track if any writes were issued directly to the origin
(due to the cache being full) within the last second. If so, writeback is
paused for a second.

This change improves performance for when the cache is full and IO is
issued directly to the origin device (rather than through the cache).

Depends-on: d53f1faf ("dm writecache: do direct write if the cache is full")
Suggested-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

95b88f4d
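The pause decision described in the commit above reduces to a time-window check. The sketch below models only that check; it does not reproduce dm_io_tracker, and `DirectWriteTracker` with its methods is a hypothetical name. The one-second default mirrors the commit message.

```python
from typing import Optional


class DirectWriteTracker:
    """Remembers when a write last bypassed the cache and went to the origin."""

    def __init__(self, pause_seconds: float = 1.0) -> None:
        self.pause_seconds = pause_seconds
        self.last_direct_write: Optional[float] = None

    def note_direct_write(self, now: float) -> None:
        # Called when the cache is full and a write goes straight to the origin.
        self.last_direct_write = now

    def writeback_paused(self, now: float) -> bool:
        # Writeback is held back while a direct write falls inside the window.
        if self.last_direct_write is None:
            return False
        return (now - self.last_direct_write) < self.pause_seconds


tracker = DirectWriteTracker()
tracker.note_direct_write(now=100.0)
print(tracker.writeback_paused(now=100.5))  # inside the window
print(tracker.writeback_paused(now=101.5))  # window has elapsed
```

Passing the timestamp in explicitly keeps the model deterministic; a real caller would use a monotonic clock.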

dm io tracker: factor out IO tracker · dc4fa29f

Committed by Mike Snitzer on Jun 25, 2021

Allow other code to use dm_io_tracker.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dc4fa29f

dm btree remove: assign new_root only when removal succeeds · b6e58b54

Committed by Hou Tao on Jun 17, 2021

remove_raw() in dm_btree_remove() may fail due to an IO read error
(e.g. reading the content of the origin block fails during shadowing),
leaving the value of shadow_spine::root uninitialized, but
the uninitialized value is still assigned to new_root at the
end of dm_btree_remove().

For dm-thin, the value of pmd->details_root or pmd->root will become
an uninitialized value, so when trying to read the details_info tree again,
an out-of-bounds memory access may occur, as shown below:

  general protection fault, probably for non-canonical address 0x3fdcb14c8d7520
  CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6
  Hardware name: QEMU Standard PC
  RIP: 0010:metadata_ll_load_ie+0x14/0x30
  Call Trace:
   sm_metadata_count_is_more_than_one+0xb9/0xe0
   dm_tm_shadow_block+0x52/0x1c0
   shadow_step+0x59/0xf0
   remove_raw+0xb2/0x170
   dm_btree_remove+0xf4/0x1c0
   dm_pool_delete_thin_device+0xc3/0x140
   pool_message+0x218/0x2b0
   target_message+0x251/0x290
   ctl_ioctl+0x1c4/0x4d0
   dm_ctl_ioctl+0xe/0x20
   __x64_sys_ioctl+0x7b/0xb0
   do_syscall_64+0x40/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix it by only assigning new_root when removal succeeds.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b6e58b54
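The fix pattern in the commit above (publish an output value only when the operation succeeded) can be shown with a small model. This is a sketch, not the dm-btree code: `remove_raw` and `btree_remove` here are simplified stand-ins, and the root numbering and the `UNINITIALIZED` sentinel are hypothetical.

```python
from typing import Tuple

EIO = 5
UNINITIALIZED = -1  # stands in for whatever garbage the spine root holds


def remove_raw(root: int, key: int, io_error: bool) -> Tuple[int, int]:
    """Return (status, shadow_root); shadow_root is garbage on failure."""
    if io_error:
        return -EIO, UNINITIALIZED
    # Shadowing produces a new root block (hypothetical numbering).
    return 0, root + 1


def btree_remove(root: int, key: int, io_error: bool = False) -> Tuple[int, int]:
    """Return (status, new_root); new_root changes only on success."""
    status, shadow_root = remove_raw(root, key, io_error)
    new_root = root
    if status == 0:
        new_root = shadow_root
    return status, new_root


print(btree_remove(7, key=1))                 # success: root advances
print(btree_remove(7, key=1, io_error=True))  # failure: old root is kept
```

Keeping the old root on failure means a later lookup still walks a valid tree instead of following a garbage block number.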

dm zone: fix dm_revalidate_zones() memory allocation · 28436ba3

Committed by Damien Le Moal on Jun 19, 2021

Make sure that the zone write pointer offset array is allocated with a
vmalloc in dm_zone_revalidate_cb() by passing GFP_KERNEL gfp flag to
kvcalloc(). However, since we do not want to trigger IOs while
revalidating zones, change dm_revalidate_zones() to have the zone scan
done in GFP_NOIO context using memalloc_noio_save/restore calls.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: bb37d772 ("dm: introduce zone append emulation")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

28436ba3

dm ps io affinity: remove redundant continue statement · 326dbde2

Committed by Colin Ian King on Jun 16, 2021

The continue statement at the end of a for-loop has no effect,
remove it.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

326dbde2

dm writecache: add optional "metadata_only" parameter · 611c3e16

Committed by Mikulas Patocka on Jun 21, 2021

Add a "metadata_only" parameter; when it is present, only metadata is
promoted to the cache. This option improves performance for heavier
REQ_META workloads (e.g. device-mapper-test-suite's "git clone and
checkout" benchmark improves from 341s to 312s).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

611c3e16

22 Jun, 2021 1 commit

dm writecache: write at least 4k when committing · 867de40c

Committed by Mikulas Patocka on Jun 21, 2021

SSDs perform badly with sub-4k writes (because they perform
read-modify-write internally), so make sure writecache writes at least
4k when committing.

Fixes: 991bd8d7 ("dm writecache: commit just one block, not a full page")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

867de40c
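One way to guarantee that a commit never issues a sub-4k write is to widen the byte range to 4k boundaries. The sketch below shows that arithmetic only; whether the kernel aligns both ends in this way is an assumption, and `commit_range` is a hypothetical helper.

```python
from typing import Tuple

MIN_COMMIT_BYTES = 4096


def commit_range(offset: int, length: int,
                 granularity: int = MIN_COMMIT_BYTES) -> Tuple[int, int]:
    """Widen [offset, offset + length) to granularity-aligned boundaries."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Round the start down and the end up to the nearest boundary.
    start = offset - (offset % granularity)
    end = -(-(offset + length) // granularity) * granularity
    return start, end - start


print(commit_range(512, 512))    # a single sector inside the first 4k block
print(commit_range(4000, 200))   # a range that straddles a 4k boundary
```

A range that already starts and ends on 4k boundaries is returned unchanged.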

18 Jun, 2021 1 commit

sched: Change task_struct::state · 2f064a59

Committed by Peter Zijlstra on Jun 11, 2021

Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org

2f064a59

17 Jun, 2021 1 commit

dm writecache: flush origin device when writing and cache is full · ee55b92a

Committed by Mikulas Patocka on Jun 15, 2021

Commit d53f1faf ("dm writecache: do
direct write if the cache is full") changed dm-writecache, so that it
writes directly to the origin device if the cache is full.
Unfortunately, it doesn't forward flush requests to the origin device,
so that there is a bug where flushes are being ignored.

Fix this by adding missing flush forwarding.

For PMEM mode, we fix this bug by disabling direct writes to the origin
device, because it performs better.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: d53f1faf ("dm writecache: do direct write if the cache is full")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

ee55b92a

16 Jun, 2021 1 commit

dm writecache: have ssd writeback wait if the kcopyd workqueue is busy · 293128b1

Committed by Mikulas Patocka on Jun 15, 2021

Make dm-writecache wait if the kcopyd workqueue is busy (as will
happen if waiting for page allocation or inside submit_bio).

This change improves performance of "mkfs.ext2" by approximately 20%
on one testbed.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

293128b1

15 Jun, 2021 12 commits

md/raid5: avoid device_lock in read_one_chunk() · 97ae2725

Committed by Gal Ofri on Jun 07, 2021

There is lock contention on device_lock in read_one_chunk().
device_lock is taken to sync conf->active_aligned_reads and
conf->quiesce.
read_one_chunk() takes the lock, then waits for quiesce=0 (resumed)
before incrementing active_aligned_reads.
raid5_quiesce() takes the lock, sets quiesce=2 (in-progress), then waits
for active_aligned_reads to be zero before setting quiesce=1
(suspended).

Introduce a fast (lockless) path in read_one_chunk(): activate aligned
read without taking device_lock.  In case quiesce starts while
activating the aligned-read in fast path, deactivate it and revert to
old behavior (take device_lock and wait for quiesce to finish).

Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to
guarantee that read_one_chunk() does not miss an ongoing quiesce.

My setups:
1. 8 local nvme drives (each up to 250k iops).
2. 8 ram disks (brd).

Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per
socket) system. Record both iops and cpu spent on this contention with
rand-read-4k. Record bw with sequential-read-128k.  Note: in most cases
cpu is still busy but due to "new" bottlenecks.

nvme:
              | iops           | cpu  | bw
-----------------------------------------------
without patch | 1.6M           | ~50% | 5.5GB/s
with patch    | 2M (throttled) | 0%   | 16GB/s (throttled)

ram (brd):
              | iops           | cpu  | bw
-----------------------------------------------
without patch | 2M             | ~80% | 24GB/s
with patch    | 4M             | 0%   | 55GB/s

CC: Song Liu <song@kernel.org>
CC: Neil Brown <neilb@suse.de>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Gal Ofri <gal.ofri@storing.io>
Signed-off-by: Song Liu <song@kernel.org>

97ae2725
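The optimistic fast path described in the raid5 commit above follows an "increment, then check, then undo" pattern. The single-threaded sketch below models only that control flow; it does not model the atomic counter or the memory barriers the commit adds, and `Raid5Conf` and `try_fast_aligned_read` are hypothetical names.

```python
class Raid5Conf:
    """Toy stand-in for the fields the commit message mentions."""

    def __init__(self) -> None:
        self.active_aligned_reads = 0
        self.quiesce = 0  # 0 = resumed, 1 = suspended, 2 = in progress


def try_fast_aligned_read(conf: Raid5Conf) -> bool:
    """Activate an aligned read without the lock; undo if quiesce is active."""
    # Optimistically count the read first (an atomic increment in real code).
    conf.active_aligned_reads += 1
    if conf.quiesce == 0:
        return True
    # A quiesce is in progress or done: back out, and let the caller fall
    # back to the locked slow path that waits for quiesce to finish.
    conf.active_aligned_reads -= 1
    return False


conf = Raid5Conf()
print(try_fast_aligned_read(conf))  # no quiesce pending
conf.quiesce = 2
print(try_fast_aligned_read(conf))  # quiesce in progress, caller must wait
```

Counting the read before checking quiesce is what lets the quiesce side, which waits for the counter to reach zero, observe any read that slipped in.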

md: add comments in md_integrity_register · de3ea66e

Committed by Guoqing Jiang on Jun 03, 2021

Given that the error handling is not obvious, add some comments here to
make it clear.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

de3ea66e

md: check level before create and exit io_acct_set · daee2024

Committed by Guoqing Jiang on Jun 03, 2021

The bio_set (io_acct_set) is used by personalities to clone a bio and
trace the timestamp of the bio. Some personalities such as raid1/10 don't
need the bio_set, so add a check to avoid creating it unconditionally.

Also update the comment for md_account_bio to make it clearer.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

daee2024

md: Constify attribute_group structs · c32dc040

Committed by Rikard Falkeborn on May 29, 2021

The attribute_group structs are never modified, they're only passed to
sysfs_create_group() and sysfs_remove_group(). Make them const to allow
the compiler to put them in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>

c32dc040

md: mark some personalities as deprecated · 608f52e3

Committed by Guoqing Jiang on May 25, 2021

Mark the three personalities (linear, fault and multipath) as deprecated
because:

1. people can use dm multipath or nvme multipath.
2. linear is already deprecated in MODULE_ALIAS.
3. no one is actively using fault.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

608f52e3

md/raid10: enable io accounting · 528bc2cf

Committed by Guoqing Jiang on May 25, 2021

For raid10, we record the start time between split bio and clone bio,
and finish the accounting in the final endio.

Also introduce start_time in r10bio accordingly.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

528bc2cf

md/raid1: enable io accounting · a0159832

Committed by Guoqing Jiang on May 25, 2021

For raid1, we record the start time between split bio and clone bio,
and finish the accounting in the final endio.

Also introduce start_time in r1bio accordingly.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

a0159832

md/raid1: rename print_msg with r1bio_existed · 9b8ae7b9

Committed by Guoqing Jiang on May 25, 2021

The caller of raid1_read_request could pass NULL or a valid pointer for
"struct r1bio *r1_bio", so the flag actually indicates whether r1_bio
already exists.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

9b8ae7b9

md/raid5: avoid redundant bio clone in raid5_read_one_chunk · 1147f58e

Committed by Guoqing Jiang on May 25, 2021

After enabling io accounting, a chunk read bio could be cloned twice, which
is inefficient. To avoid this, clone align_bio from io_acct_set too; then
we only need to call md_account_bio in make_request
unconditionally.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

1147f58e

md/raid5: move checking badblock before clone bio in raid5_read_one_chunk · c82aa1b7

Committed by Guoqing Jiang on May 25, 2021

We don't need to clone the bio if the relevant region has a bad block.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

c82aa1b7

md: add io accounting for raid0 and raid5 · 10764815

Committed by Guoqing Jiang on May 25, 2021

We introduce a new bioset (io_acct_set) for raid0 and raid5 since they
don't have their own clone infrastructure for io accounting. The bioset is
added to mddev instead of to the raid0 and raid5 layers, because this way we
can put common functions in md.h and reuse them in raid0 and raid5.

Also, struct md_io_acct is added accordingly, which includes the io start_time,
the original bio and the cloned bio. Then we can call bio_{start,end}_io_acct
to get the related io status.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

10764815

md: revert io stats accounting · ad3fc798

Committed by Guoqing Jiang on May 25, 2021

Commit 41d2d848 ("md: improve io stats accounting") could cause a
double fault problem per the report [1], and it is also not correct to
change ->bi_end_io if md doesn't own it, so let's revert it.

IO stats accounting will be reimplemented in later commits.

[1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t

Fixes: 41d2d848 ("md: improve io stats accounting")
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

ad3fc798

openeuler / Kernel synced successfully almost 2 years ago