Commits · aaf2e048af2704da5869f27b508b288f36d5c7b7 · openeuler / Kernel

17 Jun, 2021 · 26 commits

nvmet: add ZBD over ZNS backend support · aaf2e048

Committed by Chaitanya Kulkarni on Jun 09, 2021

NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
communicate with a non-volatile memory subsystem using zones for NVMe
protocol-based controllers. NVMeOF already supports ZNS NVMe
protocol-compliant devices on the target in passthru mode. There are
generic zoned block devices, such as Shingled Magnetic Recording (SMR)
HDDs, that are not based on the NVMe protocol.

This patch adds ZNS backend support for non-ZNS zoned block devices as
NVMeOF targets.

This support includes implementing the new command set NVME_CSI_ZNS and
adding command handlers for the ZNS command set, such as NVMe Identify
Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone
Management Send and NVMe Zone Management Receive.

With the new command set identifier, we also update the target command
effects log to reflect the ZNS-compliant commands.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: add Command Set Identifier support · ab5d0b38

Committed by Chaitanya Kulkarni on Jun 09, 2021

NVMe TP 4056 allows controllers to support different command sets.
The NVMeOF target currently supports only namespaces that contain
traditional logical blocks that may be randomly read and written. In
some applications there is value in exposing namespaces that contain
logical blocks with special access rules (e.g. a sequential-write-required
namespace such as a Zoned Namespace (ZNS)).

In order to support the Zoned Block Device (ZBD) backend, controllers
need to support the ZNS Command Set Identifier (CSI).

In this preparation patch, we adjust the code so that it supports the
default command set identifier. We update the namespace data structure
to store the CSI value, which defaults to NVME_CSI_NVM, the value that
represents the traditional logical block namespace type.

The CSI support is required to implement the ZBD backend for NVMeOF
with the host-side NVMe ZNS interface, since ZNS commands belong to
a different command set than the default one.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: add nvmet_req_bio put helper for backends · 9a01b58c

Committed by Chaitanya Kulkarni on Jun 09, 2021

In the current code there are two backends that use the inline bio
optimization, each carrying duplicate code for freeing the bio.

The Zoned Block Device backend also uses the same optimization, which
would lead to duplicate code in three backends: generic bdev, passthru,
and generic zns.

Add a helper function to avoid the duplicate code and update the
respective backends.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
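
A userspace sketch of what such a put helper does, for illustration only: a request embeds an inline bio that must never be freed, while larger I/Os use a separately allocated bio. The struct names, `req_bio_put_sketch()` and the `free()`-based release are hypothetical stand-ins, not the kernel's `nvmet_req_bio_put()` or `bio_put()`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct bio. */
struct bio_sketch {
    int unused;
};

/* Hypothetical stand-in for struct nvmet_req with an embedded bio. */
struct req_sketch {
    struct bio_sketch inline_bio; /* lives inside the request, never freed */
};

static int bios_freed; /* counts released heap bios, for illustration */

/* Release a bio unless it is the request's embedded inline bio.
 * Centralizing this check is what lets every backend share one helper. */
static void req_bio_put_sketch(struct req_sketch *req, struct bio_sketch *bio)
{
    if (bio != &req->inline_bio) {
        free(bio);
        bios_freed++;
    }
}
```

Each backend's completion path can then call the helper unconditionally instead of repeating the inline-bio comparison.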

nvmet: add req cns error complete helper · 6e597263

Committed by Chaitanya Kulkarni on Jun 09, 2021

We report an error and complete the request when the identify CNS value
is not handled in nvmet_execute_identify(). This error reporting is also
needed for the Zoned Block Device backend for the NVMeOF target.

Add a helper, nvmet_req_cns_error_complete(), to report an error and
complete the request when the identify command's CNS value is not
handled.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

block: export blk_next_bio() · c28a6147

Committed by Chaitanya Kulkarni on Jun 09, 2021

The block layer provides emulation of zone management operations
targeting all zones of a zoned block device only for the zone reset
operation (REQ_OP_ZONE_RESET). In order to correctly implement
exporting of zoned block devices with NVMeOF, emulating zone management
operations targeting all zones of a device is also necessary for the
open, close and finish zone operations (REQ_OP_ZONE_OPEN,
REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH).

Instead of duplicating the code, export the existing helper from the
block layer so that the NVMeOF zoned block device backend can use the
same bio chaining pattern that the block layer uses for its all-zones
REQ_OP_ZONE_RESET emulation.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: remove local variable · 7860569a

Committed by Chaitanya Kulkarni on Jun 13, 2021

In errno_to_nvme_status() we store the NVMe status in a local variable
and do nothing useful with it other than return it.

Remove the local variable and return the value directly from the switch.
This also removes the extra break statements.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
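
A simplified sketch of the resulting style, where every case returns its status directly and no break statements are needed. The function name and the small set of mappings are illustrative assumptions; the numeric status values are illustrative too, and include/linux/nvme.h holds the real definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative NVMe status values (see include/linux/nvme.h for the real ones). */
enum {
    SC_SUCCESS      = 0x0,
    SC_INTERNAL     = 0x6,
    SC_LBA_RANGE    = 0x80,
    SC_CAP_EXCEEDED = 0x81,
    SC_DNR          = 0x4000, /* Do Not Retry bit */
};

/* Hypothetical errno -> NVMe status mapping: each case returns at once,
 * so there is no local status variable and no break statements. */
static uint16_t errno_to_status_sketch(int err)
{
    switch (err) {
    case 0:
        return SC_SUCCESS;
    case -ENOSPC:
        return SC_CAP_EXCEEDED | SC_DNR;
    case -EREMOTEIO:
        return SC_LBA_RANGE | SC_DNR;
    default:
        return SC_INTERNAL;
    }
}
```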

nvmet: use nvme status value directly · 8bb6cb9b

Committed by Chaitanya Kulkarni on Jun 13, 2021

There is no point in keeping the status variable that is used only once
in the function nvmet_async_events_failall().

Remove the variable and use the value directly.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: use u32 type for the local variable nsid · 245067e3

Committed by Chaitanya Kulkarni on Jun 13, 2021

In nvmet_max_nsid() we calculate the max nsid by iterating over the
XArray and store it in the local variable nsid, which has type
unsigned long.

Since the return value of this function is stored in subsys->max_nsid,
which is of type u32, change the type of the local variable nsid and the
return type of the function to u32.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: use u32 for nvmet_subsys max_nsid · 86693c43

Committed by Chaitanya Kulkarni on Jun 13, 2021

Use the u32 type for the max_nsid member of the nvmet_subsys structure.
This avoids type confusion when updating subsys->max_nsid from
ns->nsid, and matches the type of the nvmet_ns->nsid member.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: use req->cmd directly in file-ns fast path · f3dce2ad

Committed by Chaitanya Kulkarni on Jun 13, 2021

The function nvmet_file_parse_io_cmd() is called from the fast path. Its
local variable cmd is used only once.

Remove the local variable and use req->cmd directly.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: use req->cmd directly in bdev-ns fast path · 46eca470

Committed by Chaitanya Kulkarni on Jun 13, 2021

The function nvmet_bdev_parse_io_cmd() is called from the fast path.
Its local variable cmd is used only once.

Remove the local variable and use req->cmd directly.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: make ver stable once connection established · 87fd4cc1

Committed by Noam Gottlieb on Jun 07, 2021

Once a host has connected to the NVMeOF target, make sure that the
version number is stable and cannot be changed.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: allow mn change if subsys not discovered · 0d148efd

Committed by Noam Gottlieb on Jun 07, 2021

Currently, once the subsystem's model_number is set for the first time,
there is no way to change it. However, as long as no connection has been
established to the NVMeOF target, there is no reason for such a
restriction, and we should allow the subsystem's model_number to be
changed as many times as needed.

In addition, to simplify the changes and make the model number flow more
similar to the rest of the attributes in the Identify Controller data
structure, we set a default value for the model number when the
subsystem is initialized.

Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
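
The behaviour described above (a default model number that can be replaced freely until the first host connects) can be sketched in userspace C as follows. The struct, the `discovered` flag, the `"Linux"` default and the `-EINVAL` return code are assumptions chosen for illustration, not the nvmet configfs code itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subsystem state for this sketch. */
struct subsys_sketch {
    bool discovered;  /* set once a host has connected */
    char model[41];   /* MN field is 40 bytes; +1 for a NUL terminator here */
};

/* Give the model number a default at init time (default string assumed). */
static void subsys_init_sketch(struct subsys_sketch *s)
{
    s->discovered = false;
    strcpy(s->model, "Linux");
}

/* Allow any number of changes until a host has seen the value. */
static int subsys_set_model_sketch(struct subsys_sketch *s, const char *mn)
{
    if (s->discovered)
        return -EINVAL;           /* frozen after the first connection */
    if (strlen(mn) >= sizeof(s->model))
        return -EINVAL;           /* does not fit the 40-byte field */
    strcpy(s->model, mn);
    return 0;
}
```

Having a default from the start means the store path only ever replaces a value, which is the simplification the message refers to.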

nvmet: make sn stable once connection was established · 7ae023c5

Committed by Noam Gottlieb on Jun 07, 2021

Once a host has connected to the target, make sure that the serial
number is stable and cannot be changed.

Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: change sn size and check validity · e13b0615

Committed by Noam Gottlieb on Jun 07, 2021

According to the NVM Express specification, the serial_number should be
20 bytes (bytes 23:04 of the Identify Controller data structure) and
should contain only ASCII characters.

Accordingly, the serial_number size is changed to 20 bytes, and before
any attempt to store a new value in serial_number we check that the
input is valid, i.e. it contains only ASCII characters, is not empty and
does not exceed 20 bytes.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
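
A sketch of the validity check described above, written as a standalone function. The name `serial_is_valid()` is hypothetical, and treating "ASCII" as the printable range 0x20 to 0x7e (which excludes control characters) is an assumption of this sketch rather than something the message states.

```c
#include <stdbool.h>
#include <stddef.h>

#define SERIAL_MAX_LEN 20 /* bytes 23:04 of Identify Controller */

/* Hypothetical check: non-empty, at most 20 bytes, printable ASCII only. */
static bool serial_is_valid(const char *s, size_t len)
{
    if (len == 0 || len > SERIAL_MAX_LEN)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 0x20 || c > 0x7e) /* assumed "ASCII" range */
            return false;
    }
    return true;
}
```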

nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst() · 2a4a910a

Committed by Hannes Reinecke on May 25, 2021

When parsing a request in nvmet_fc_handle_fcp_rqst() we should not
check for invalid target ports; if we do, the command is aborted by the
FCP layer, causing the host to assume a transport error. Instead, we
should still forward the request to the nvmet layer, which will then
correctly fail the command with an appropriate error status.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fabrics: remove memset in connect io q · eff4423e

Committed by Chaitanya Kulkarni on Jun 14, 2021

Declare the structure variable with a zero initializer so that we can
get rid of the memset() call.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
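
The pattern applied in this commit and the three that follow can be illustrated with a hypothetical command struct: an initializer that names only some members zeroes all the others, so a separate memset() becomes redundant. `struct cmd_sketch` and the opcode value are illustrative assumptions, not the real `struct nvme_command` layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout standing in for struct nvme_command. */
struct cmd_sketch {
    uint8_t  opcode;
    uint16_t command_id;
    uint32_t nsid;
    uint64_t offset;
};

/* Before: declare, then clear with memset(). */
static struct cmd_sketch build_with_memset(void)
{
    struct cmd_sketch cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode = 0x7f;
    return cmd;
}

/* After: members not named in the initializer are zero-initialized. */
static struct cmd_sketch build_with_initializer(void)
{
    struct cmd_sketch cmd = { .opcode = 0x7f };

    return cmd;
}
```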

nvme-fabrics: remove memset in connect admin q · bfa9d122

Committed by Chaitanya Kulkarni on Jun 14, 2021

Declare the structure variable with a zero initializer so that we can
get rid of the memset() call.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fabrics: remove memset in nvmf_reg_write32() · c22c2720

Committed by Chaitanya Kulkarni on Jun 14, 2021

Declare the structure variable with a zero initializer so that we can
get rid of the memset() call.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fabrics: remove memset in nvmf_reg_read64() · 2796a8e4

Committed by Chaitanya Kulkarni on Jun 14, 2021

Declare the structure variable with a zero initializer so that we can
get rid of the memset() call.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-tcp: use ctrl sgl check helper · 3b54064f

Committed by Chaitanya Kulkarni on Jun 09, 2021

Use the helper to check the NVMe controller's SGL support.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: use ctrl sgl check helper · 253a0b76

Committed by Chaitanya Kulkarni on Jun 09, 2021

Use the helper to check the NVMe controller's SGL support.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fc: use ctrl sgl check helper · b61678bc

Committed by Chaitanya Kulkarni on Jun 09, 2021

Use the helper to check the NVMe controller's SGL support.

Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: add a helper to check ctrl sgl support · 73eefc27

Committed by Chaitanya Kulkarni on Jun 09, 2021

For various transports such as fc/tcp/pci, it is common to check whether
the controller supports NVMe SGLs.

In this preparation patch we add a helper to avoid open-coding such
checks in the various transports.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
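
A sketch of such a helper, assuming it tests bits 1:0 of the Identify Controller SGLS field, where 01b means SGLs are supported without alignment requirements and 10b means they are supported with Dword alignment. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller state; sgls mirrors the Identify Controller SGLS field. */
struct ctrl_sketch {
    uint32_t sgls;
};

/* Non-zero bits 1:0 indicate SGL support for NVM command set I/O. */
static bool ctrl_sgl_supported(const struct ctrl_sketch *ctrl)
{
    return ctrl->sgls & ((1u << 0) | (1u << 1));
}
```

Each transport then replaces its open-coded bit test with a single call, so a later change to the support rule only touches one place.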

nvme-pci: remove trailing lines for helpers · cb1b10e7

Committed by Chaitanya Kulkarni on Jun 07, 2021

Remove the extra blank line at the end of the functions.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: fix var. type for increasing cq_head · a0aac973

Committed by JK Kim on Jun 17, 2021

nvmeq->cq_head is compared with nvmeq->q_depth, and cq_head and
cq_phase are updated accordingly to handle the next CQ doorbell.

However, nvmeq->q_depth is a u32 whose maximum value is 0x10000 when
CAP.MQES is 0xffff and io_queue_depth is 0x10000.

The current temporary variable used for the comparison with
nvmeq->q_depth overflows when the previous nvmeq->cq_head is 0xffff.

In this case, nvmeq->cq_phase is not updated. So, change the data type
of the temporary variable to u32.

Signed-off-by: JK Kim <jongkang.kim2@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
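
The overflow is easy to reproduce outside the driver. This sketch uses a hypothetical `struct cq_sketch` holding only the three fields involved and compares an update using a u16 temporary against the u32 version. The update logic follows the pattern the message describes: reset `cq_head` to 0 and flip `cq_phase` when the head reaches `q_depth`.

```c
#include <stdint.h>

/* Hypothetical subset of struct nvme_queue. */
struct cq_sketch {
    uint32_t q_depth;  /* up to 0x10000 entries */
    uint16_t cq_head;
    uint8_t  cq_phase;
};

/* Buggy: 0xffff + 1 truncates to 0 in a u16, which never equals
 * q_depth == 0x10000, so the phase bit is not flipped on wrap. */
static void update_cq_head_u16(struct cq_sketch *q)
{
    uint16_t tmp = (uint16_t)(q->cq_head + 1);

    if (tmp == q->q_depth) {
        q->cq_head = 0;
        q->cq_phase ^= 1;
    } else {
        q->cq_head = tmp;
    }
}

/* Fixed: a u32 temporary holds 0x10000, so the wrap is detected. */
static void update_cq_head_u32(struct cq_sketch *q)
{
    uint32_t tmp = q->cq_head + 1u;

    if (tmp == q->q_depth) {
        q->cq_head = 0;
        q->cq_phase ^= 1;
    } else {
        q->cq_head = (uint16_t)tmp;
    }
}
```

With `cq_head == 0xffff` and `q_depth == 0x10000`, both versions end with `cq_head == 0`, but only the u32 version flips `cq_phase`. The stale phase makes the driver misread which completion entries are new.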

16 Jun, 2021 · 9 commits

nvme-tcp: fix error codes in nvme_tcp_setup_ctrl() · 522af60c

Committed by Dan Carpenter on Jun 05, 2021

These error paths currently return success but they should return
-EOPNOTSUPP.

Fixes: 73ffcefc ("nvme-tcp: check sgl supported by target")
Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
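
The bug class fixed here is a `goto` to an error label while `ret` still holds 0 from an earlier successful step. A minimal sketch, with hypothetical function names and a single SGL check standing in for the real setup sequence in nvme_tcp_setup_ctrl():

```c
#include <errno.h>
#include <stdbool.h>

/* Buggy: the error path is taken, but ret is still 0, so the caller
 * sees success. */
static int setup_ctrl_buggy(bool sgl_supported)
{
    int ret = 0; /* left at 0 by earlier successful steps */

    if (!sgl_supported)
        goto out; /* bug: ret is never set */
    return 0;
out:
    /* ...teardown of earlier steps would go here... */
    return ret;
}

/* Fixed: set an explicit error code before jumping to the label. */
static int setup_ctrl_fixed(bool sgl_supported)
{
    int ret = 0;

    if (!sgl_supported) {
        ret = -EOPNOTSUPP;
        goto out;
    }
    return 0;
out:
    return ret;
}
```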

nvme: factor out a nvme_validate_passthru_nsid helper · e7d4b549

Committed by Chaitanya Kulkarni on Jun 07, 2021

Add a helper, nvme_validate_passthru_nsid(), to validate the nsid. This
moves the nsid validation and error message code out of nvme_user_cmd()
and nvme_user_cmd64().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: fix grammar in the CONFIG_NVME_MULTIPATH kconfig help text · d399742c

Committed by Geert Uytterhoeven on Jun 14, 2021

Fix a singular/plural mismatch in the CONFIG_NVME_MULTIPATH help text.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: remove superfluous bio_set_dev in nvme_requeue_work · 24114241

Committed by Daniel Wagner on Jun 07, 2021

Commit ce86dad2 ("nvme-multipath: reset bdev to ns head when
failover") moved the reset code to where the bio is added to the
requeue_list for the failover path, but it left the original
bio_set_dev() call in nvme_requeue_work.

There is a second path to nvme_requeue_work, via
nvme_ns_head_submit_bio. We don't have to set bio->bi_bdev for this
path either, as it already points to the correct bdev.

Let's remove the bio_set_dev() call. It updates bio->bi_bdev with the
same pointer and is therefore unnecessary.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: verify MNAN value if ANA is enabled · 120bb362

Committed by Daniel Wagner on Jun 07, 2021

The controller is required to have a non-zero MNAN value if it supports
ANA:

   If the controller supports Asymmetric Namespace Access Reporting, then
   this field shall be set to a non-zero value that is less than or equal
   to the NN value.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
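
The quoted rule reduces to a small predicate. The sketch below is hypothetical: the function name and the boolean `ana_supported` input are illustrative, since the driver derives ANA support from controller fields. It encodes only what the quoted text states: when ANA reporting is supported, MNAN must be non-zero and no greater than NN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check of the MNAN rule quoted above. */
static bool mnan_is_valid(bool ana_supported, uint32_t mnan, uint32_t nn)
{
    if (!ana_supported)
        return true; /* the rule applies only to ANA-capable controllers */
    return mnan != 0 && mnan <= nn;
}
```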

ACPI: Add quirks for AMD Renoir/Lucienne CPUs to force the D3 hint · 6485fc18

Committed by Mario Limonciello on Jun 09, 2021

AMD systems from Renoir and Lucienne require that the NVMe controller
be put into D3 across a Modern Standby / suspend-to-idle cycle. This is
"typically" accomplished using the `StorageD3Enable` property in the
_DSD, but this property was introduced after many of these systems
launched, and most OEM systems don't have it in their BIOS.

On AMD Renoir, if these drives do not go into D3 over suspend-to-idle,
resume fails with the NVMe controller being reset and a trace like this
in the kernel logs:
```
[   83.556118] nvme nvme0: I/O 161 QID 2 timeout, aborting
[   83.556178] nvme nvme0: I/O 162 QID 2 timeout, aborting
[   83.556187] nvme nvme0: I/O 163 QID 2 timeout, aborting
[   83.556196] nvme nvme0: I/O 164 QID 2 timeout, aborting
[   95.332114] nvme nvme0: I/O 25 QID 0 timeout, reset controller
[   95.332843] nvme nvme0: Abort status: 0x371
[   95.332852] nvme nvme0: Abort status: 0x371
[   95.332856] nvme nvme0: Abort status: 0x371
[   95.332859] nvme nvme0: Abort status: 0x371
[   95.332909] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16
[   95.332936] nvme 0000:03:00.0: PM: failed to resume async: error -16
```

The Microsoft documentation for StorageD3Enable mentions that Windows has
a hardcoded allowlist for D3 support, which was used for these platforms.
Introduce quirks to hardcode them for Linux as well.

As this property is now "standardized", OEM systems using AMD Cezanne and
newer APUs have adopted it, and quirks like this should not be
necessary.

CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com>
CC: Alexander Deucher <Alexander.Deucher@amd.com>
CC: Prike Liang <prike.liang@amd.com>
Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ACPI: Check StorageD3Enable _DSD property in ACPI code · 2744d7a0

Committed by Mario Limonciello on Jun 09, 2021

Although first implemented for NVMe, this check may be usable by
other drivers as well. Microsoft's specification explicitly mentions
that it may be usable by SATA and AHCI devices. Google also indicates
that they have used this with SDHCI in a downstream kernel tree that
a user can plug a storage device into.

Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Suggested-by: Keith Busch <kbusch@kernel.org>
CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com>
CC: Alexander Deucher <Alexander.Deucher@amd.com>
CC: Rafael J. Wysocki <rjw@rjwysocki.net>
CC: Prike Liang <prike.liang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Merge branch 'md-next' of... · e0d245e2

Committed by Jens Axboe on Jun 15, 2021

Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.14/drivers

Pull MD changes from Song:

"1) iostats rewrite by Guoqing Jiang;
 2) raid5 lock contention optimization by Gal Ofri."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid5: avoid device_lock in read_one_chunk()
  md: add comments in md_integrity_register
  md: check level before create and exit io_acct_set
  md: Constify attribute_group structs
  md: mark some personalities as deprecated
  md/raid10: enable io accounting
  md/raid1: enable io accounting
  md/raid1: rename print_msg with r1bio_existed
  md/raid5: avoid redundant bio clone in raid5_read_one_chunk
  md/raid5: move checking badblock before clone bio in raid5_read_one_chunk
  md: add io accounting for raid0 and raid5
  md: revert io stats accounting

Merge tag 'floppy-for-5.14' of https://github.com/evdenis/linux-floppy into for-5.14/drivers · 491e5b17

Committed by Jens Axboe on Jun 15, 2021

Pull floppy fixes from Denis:

"Floppy patches for 5.14

 Two oneliners to fix clang warnings:
 - -Wimplicit-fallthrough warning fix from Gustavo A. R. Silva.
 - Redundant assignment warning fix from Jiapeng Chong.

 No semantic and behavioural changes."

* tag 'floppy-for-5.14' of https://github.com/evdenis/linux-floppy:
  floppy: Fix fall-through warning for Clang
  floppy: cleanup: remove redundant assignment to nr_sectors

15 Jun, 2021 · 5 commits

floppy: Fix fall-through warning for Clang · 2c9bdf6e

Committed by Gustavo A. R. Silva on May 28, 2021

In preparation for enabling -Wimplicit-fallthrough for Clang, fix a
warning by explicitly adding a break statement instead of letting the
code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Link: https://lore.kernel.org/linux-hardening/47bcd36a-6524-348b-e802-0691d1b3c429@kernel.dk/
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Denis Efremov <efremov@linux.com>

floppy: cleanup: remove redundant assignment to nr_sectors · 30ab5db7

Committed by Jiapeng Chong on Apr 30, 2021

The variable nr_sectors is set to zero, but this value is never read
because it is overwritten later on; hence it is a redundant assignment
and can be removed.

Clean up the following clang-analyzer warning:

drivers/block/floppy.c:2333:2: warning: Value stored to 'nr_sectors' is
never read [clang-analyzer-deadcode.DeadStores].

Link: https://lore.kernel.org/r/1619774805-121562-1-git-send-email-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Denis Efremov <efremov@linux.com>

md/raid5: avoid device_lock in read_one_chunk() · 97ae2725

Committed by Gal Ofri on Jun 07, 2021

There is lock contention on device_lock in read_one_chunk().
device_lock is taken to synchronize conf->active_aligned_reads and
conf->quiesce.
read_one_chunk() takes the lock, then waits for quiesce=0 (resumed)
before incrementing active_aligned_reads.
raid5_quiesce() takes the lock, sets quiesce=2 (in progress), then waits
for active_aligned_reads to reach zero before setting quiesce=1
(suspended).

Introduce a fast (lockless) path in read_one_chunk(): activate aligned
read without taking device_lock.  In case quiesce starts while
activating the aligned-read in fast path, deactivate it and revert to
old behavior (take device_lock and wait for quiesce to finish).

Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to
guarantee that read_one_chunk() does not miss an ongoing quiesce.

My setups:
1. 8 local nvme drives (each up to 250k iops).
2. 8 ram disks (brd).

Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per
socket) system. Record both iops and cpu spent on this contention with
rand-read-4k. Record bw with sequential-read-128k.  Note: in most cases
cpu is still busy but due to "new" bottlenecks.

nvme:
              | iops           | cpu  | bw
-----------------------------------------------
without patch | 1.6M           | ~50% | 5.5GB/s
with patch    | 2M (throttled) | 0%   | 16GB/s (throttled)

ram (brd):
              | iops           | cpu  | bw
-----------------------------------------------
without patch | 2M             | ~80% | 24GB/s
with patch    | 4M             | 0%   | 55GB/s

CC: Song Liu <song@kernel.org>
CC: Neil Brown <neilb@suse.de>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Gal Ofri <gal.ofri@storing.io>
Signed-off-by: Song Liu <song@kernel.org>
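
The lockless fast path described above follows an increment-then-recheck pattern. The sketch below models it in userspace C11, using sequentially consistent atomics in place of the kernel's smp store/load barriers and `atomic_t`. The struct, the function names, and the omission of both the locked slow path and the wake-up of a waiting quiescer are hypothetical simplifications, not the raid5 code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct r5conf. */
struct conf_sketch {
    atomic_int active_aligned_reads;
    atomic_int quiesce; /* 0 = running, 2 = quiesce in progress, 1 = suspended */
};

/* Try to activate an aligned read without taking device_lock.
 * Returns false when the caller must fall back to the locked slow path
 * (take device_lock and wait for quiesce == 0). */
static bool aligned_read_fast_path(struct conf_sketch *conf)
{
    if (atomic_load(&conf->quiesce) != 0)
        return false;
    atomic_fetch_add(&conf->active_aligned_reads, 1);
    /* Re-check after publishing the increment: if a quiesce started
     * concurrently, undo the activation so the quiescer is not blocked.
     * (A real implementation would also wake the quiescer here.) */
    if (atomic_load(&conf->quiesce) != 0) {
        atomic_fetch_sub(&conf->active_aligned_reads, 1);
        return false;
    }
    return true;
}

/* Quiesce side: publish quiesce = 2 before waiting for readers to drain. */
static void begin_quiesce(struct conf_sketch *conf)
{
    atomic_store(&conf->quiesce, 2);
    /* ...wait until active_aligned_reads reaches 0 (omitted)... */
}
```

Because both sides store first and then load the other side's variable under sequential consistency, at least one of them must observe the other. Either the quiescer sees the reader's increment and waits for it, or the reader sees `quiesce != 0` and backs off.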

md: add comments in md_integrity_register · de3ea66e

Committed by Guoqing Jiang on Jun 03, 2021

Given that the error handling is not obvious, add some comments here to
make it clear.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

md: check level before create and exit io_acct_set · daee2024

Committed by Guoqing Jiang on Jun 03, 2021

The bio_set (io_acct_set) is used by personalities to clone bios and
track each bio's timestamp. Some personalities, such as raid1/10, don't
need the bio_set, so add a check to avoid creating it unconditionally.

Also update the comment for md_account_bio to make it clearer.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
