- 17 June 2021, 3 commits
-
-
Submitted by Chaitanya Kulkarni
NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to communicate with a non-volatile memory subsystem using zones for NVMe protocol-based controllers. NVMeOF already supports ZNS NVMe-protocol-compliant devices on the target in passthru mode. There are generic zoned block devices, such as Shingled Magnetic Recording (SMR) HDDs, that are not based on the NVMe protocol. This patch adds ZNS backend support so that such non-NVMe zoned block devices can be exported as NVMeOF targets. The support includes implementing the new command set NVME_CSI_ZNS and adding handlers for the ZNS command set: NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone Management Send and NVMe Zone Management Receive. With the new command set identifier we also update the target command effects log to reflect the ZNS-compliant commands. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Chaitanya Kulkarni
In the current code there are two backends using the inline bio optimization, which duplicates the code for freeing the bio. The Zoned Block Device backend uses the same optimization, which would leave duplicated code in three backends: generic bdev, passthru, and generic zns. Add a helper function to avoid the duplication and update the respective backends. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
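A minimal sketch of what such a helper can look like, assuming the request embeds an inline bio in req->b.inline_bio as the nvmet code does; treat the name and field as illustrative rather than a quote of the patch:

    static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
    {
    	/* The inline bio lives inside the request itself, so only bios
    	 * allocated from a bio_set need their reference dropped. */
    	if (bio != &req->b.inline_bio)
    		bio_put(bio);
    }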
-
Submitted by Chaitanya Kulkarni
The function nvmet_bdev_parse_io_cmd() is called from the fast path. Its local variable cmd is only used once. Remove the local variable and use req->cmd directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 03 June 2021, 1 commit
-
-
Submitted by Chaitanya Kulkarni
Remove the superfluous variable "bdev" that is only used once in nvmet_bdev_alloc_bip() and use req->ns->bdev instead, which is how the rest of the code accesses the nvmet request's bdev. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 12 May 2021, 1 commit
-
-
Submitted by Chaitanya Kulkarni
When handling rw commands, for the inline bio case we only consider the transfer size. This works well when req->sg_cnt fits into req->inline_bvec, but it results in a warning in __bio_add_page() when req->sg_cnt > NVMET_MAX_INLINE_BVEC. Consider an I/O of size 32768 whose first page is not aligned to the page boundary; the I/O is then split in the following manner:

    [ 2206.256140] nvmet: sg->length 3440 sg->offset 656
    [ 2206.256144] nvmet: sg->length 4096 sg->offset 0
    [ 2206.256148] nvmet: sg->length 4096 sg->offset 0
    [ 2206.256152] nvmet: sg->length 4096 sg->offset 0
    [ 2206.256155] nvmet: sg->length 4096 sg->offset 0
    [ 2206.256159] nvmet: sg->length 4096 sg->offset 0
    [ 2206.256163] nvmet: sg->length 4096 sg->offset 0
    [ 2206.256166] nvmet: sg->length 4096 sg->offset 0
    [ 2206.256170] nvmet: sg->length 656 sg->offset 0

Now req->transfer_size == NVMET_MAX_INLINE_DATA_LEN, i.e. 32768, but req->sg_cnt (9) > NVMET_MAX_INLINE_BIOVEC (8). This results in the following warning:

    nvmet_bdev_execute_rw()
      bio_add_page()
        __bio_add_page()
          WARN_ON_ONCE(bio_full(bio, len));

This scenario is very hard to reproduce: it shows up on the nvme-loop transport only with rw commands issued through the passthru IOCTL interface from a host application whose data buffer is allocated with malloc() and not posix_memalign(). Fixes: 73383adf ("nvmet: don't split large I/Os unconditionally") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
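A sketch of the corrected condition, using the constants named in the message above; a small helper of roughly this shape is how the check can be factored out (illustrative, not a verbatim quote of the patch):

    static inline bool nvmet_use_inline_bvec(struct nvmet_req *req)
    {
    	/* Both the byte count and the number of SG elements must fit
    	 * into the preallocated inline bvec array. */
    	return req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN &&
    	       req->sg_cnt <= NVMET_MAX_INLINE_BIOVEC;
    }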
-
- 27 February 2021, 1 commit
-
-
Submitted by Matthew Wilcox (Oracle)
It's often inconvenient to use BIO_MAX_PAGES because min() requires the signedness of both arguments to match. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for users. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
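The helper simply clamps a caller-supplied segment count against the (now unsigned) maximum; roughly:

    static inline unsigned int bio_max_segs(unsigned int nr_segs)
    {
    	/* BIO_MAX_PAGES is unsigned, so min() needs no cast here */
    	return min(nr_segs, BIO_MAX_PAGES);
    }

Callers such as nvmet_bdev_execute_rw() can then write bio_max_segs(req->sg_cnt) instead of an open-coded min_t().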
-
- 10 February 2021, 1 commit
-
-
Submitted by Chaitanya Kulkarni
The NVMeOF block device backend, file backend, and passthru backend all reject and report commands whose opcode is not handled. Add a helper and use it in the block device backend to keep the code and error message uniform. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
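A sketch of what such a helper can look like; the exact log message and error-location handling are illustrative:

    static u16 nvmet_report_invalid_opcode(struct nvmet_req *req)
    {
    	pr_debug("unhandled cmd %d on qid %d\n",
    		 req->cmd->common.opcode, req->sq->qid);
    	req->error_loc = offsetof(struct nvme_common_command, opcode);
    	return NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
    }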
-
- 02 February 2021, 1 commit
-
-
Submitted by Chaitanya Kulkarni
In this preparation patch, we add helpers to convert LBAs to sectors and sectors to LBAs. This is needed to eliminate code duplication in the ZBD backend. Use these helpers in the block device backend. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
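A sketch of the two helpers, assuming the namespace stores its logical block size as a power-of-two shift (blksize_shift), which is how the nvmet target tracks it:

    static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
    {
    	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
    }

    static inline __le64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
    {
    	return cpu_to_le64(sect >> (ns->blksize_shift - SECTOR_SHIFT));
    }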
-
- 28 January 2021, 1 commit
-
-
Submitted by Christoph Hellwig
There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
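The idea is that a synchronous flush can be issued with an on-stack bio instead of an allocated one; a rough sketch, assuming the bio_init()/submit_bio_wait() interfaces of that kernel series:

    int blkdev_issue_flush(struct block_device *bdev)
    {
    	struct bio bio;

    	/* No allocation: the bio lives on the caller's stack for the
    	 * duration of the synchronous wait. */
    	bio_init(&bio, NULL, 0);
    	bio_set_dev(&bio, bdev);
    	bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
    	return submit_bio_wait(&bio);
    }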
-
- 24 August 2020, 1 commit
-
-
Submitted by Gustavo A. R. Silva
Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough [1]. Also remove unnecessary fall-through markings where that is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
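The conversion is mechanical; a generic illustration using a hypothetical function, not a quote from this driver:

    #include <linux/types.h>
    #include <linux/compiler_attributes.h>

    static int nvmet_example_classify(u8 opcode)
    {
    	switch (opcode) {
    	case 0x01:			/* write */
    		/* write shares the handling of the next case */
    		fallthrough;		/* previously a "fall through" comment */
    	case 0x02:			/* read */
    		return 1;
    	default:
    		return 0;
    	}
    }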
-
- 27 May 2020, 4 commits
-
-
Submitted by Israel Rukshin
Allocate the metadata SGL buffers and set the metadata fields for the request, then create a block I/O request for the metadata from the protection SG list. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Israel Rukshin
The function doesn't check only the data length, because in some cases the transfer length also includes the metadata length. This is preparation for adding metadata (T10-PI) support. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Israel Rukshin
The function doesn't add the metadata length (only the data length is calculated). This is preparation for adding metadata (T10-PI) support. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Israel Rukshin
Fill those namespace fields from the block device format, in preparation for adding metadata (T10-PI) over fabrics support with block devices. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
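Roughly, the namespace metadata fields are derived from the block device's registered integrity profile; a sketch whose profile checks are illustrative and may differ in detail from the kernel code:

    static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
    {
    	struct blk_integrity *bi = bdev_get_integrity(ns->bdev);

    	if (!bi)
    		return;
    	ns->metadata_size = bi->tuple_size;
    	if (bi->profile == &t10_pi_type1_crc)
    		ns->pi_type = NVME_NS_DPS_PI_TYPE1;
    	else if (bi->profile == &t10_pi_type3_crc)
    		ns->pi_type = NVME_NS_DPS_PI_TYPE3;
    	else
    		ns->metadata_size = 0;	/* unsupported metadata format */
    }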
-
- 22 May 2020, 1 commit
-
-
Submitted by Christoph Hellwig
The argument isn't used by any caller, and drivers don't fill out bi_sector for flush requests either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 May 2020, 1 commit
-
-
Submitted by Anthony Iliopoulos
Add support for detecting capacity changes on nvmet blockdev and file backed namespaces. This allows emulating and testing online resizing of nvme devices and of the filesystems on top. Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> [chaitanya: fix comments posted on V1] Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: reuse code a bit more] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
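For the bdev backend the revalidation amounts to re-reading the backing device size; a sketch:

    void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
    {
    	/* Pick up online resizes of the underlying block device so the
    	 * namespace capacity reported to the host stays current. */
    	ns->size = i_size_read(ns->bdev->bd_inode);
    }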
-
- 04 February 2020, 1 commit
-
-
Submitted by Sagi Grimberg
The host is allowed to pass the controller an SGL describing a buffer that is larger than the DSM payload itself; allow this when executing DSM. Reported-by: Dakshaja Uppalapati <dakshaja@chelsio.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 05 November 2019, 4 commits
-
-
Submitted by Christoph Hellwig
bio_set_op_attrs has long been deprecated; replace it with a direct assignment of the flags to bio->bi_opf. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
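The transformation is a one-liner per call site; the op flags shown are illustrative:

    /* before */
    bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE);

    /* after */
    bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;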
-
Submitted by Christoph Hellwig
With reference to the following issue reported on the mailing list: http://lists.infradead.org/pipermail/linux-nvme/2019-October/027604.html this patch adds plugging for the bdev-ns under nvmet_bdev_execute_rw(). We see the following performance improvement for a random write workload with the setup described in the link, when device_path is configured as /dev/md0.

Without this patch:

    write: IOPS=40.8k, BW=159MiB/s (167MB/s)(4777MiB/30002msec)
    write: IOPS=41.2k, BW=161MiB/s (169MB/s)(4831MiB/30011msec)
    slat (usec): min=8, max=10823, avg=15.64, stdev=16.85
    slat (usec): min=8, max=401, avg=15.40, stdev= 9.56
    clat (usec): min=54, max=2492, avg=759.07, stdev=172.62
    clat (usec): min=56, max=1997, avg=768.06, stdev=178.72

With this patch:

    write: IOPS=123k, BW=480MiB/s (504MB/s)(14.1GiB/30011msec)
    write: IOPS=123k, BW=481MiB/s (504MB/s)(14.1GiB/30002msec)
    slat (usec): min=8, max=9941, avg=13.31, stdev= 8.04
    slat (usec): min=8, max=289, avg=13.31, stdev= 3.37
    clat (usec): min=43, max=17635, avg=245.46, stdev=171.23
    clat (usec): min=44, max=17751, avg=245.25, stdev=183.14

Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
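Schematically, the submission in nvmet_bdev_execute_rw() is wrapped in a block plug so that the chained bios are batched before being handed to the lower driver (bio construction omitted):

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* build and submit the (possibly chained) bios for req->sg here */
    submit_bio(bio);
    blk_finish_plug(&plug);	/* flush the batched requests to the device */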
-
Submitted by Christoph Hellwig
Instead of storing the expected length and checking it when the command is executed, just check the length inside the command handlers themselves. A new helper, nvmet_check_data_len(), is created to help with this check. Signed-off-by: Christoph Hellwig <hch@lst.de> [split patch, update changelog] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
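A sketch of the helper, mirroring the check that previously lived in the execute path (details illustrative):

    bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len)
    {
    	if (unlikely(data_len != req->transfer_len)) {
    		req->error_loc = offsetof(struct nvme_common_command, dptr);
    		nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR);
    		return false;
    	}
    	return true;
    }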
-
Submitted by Israel Rukshin
This commit doesn't change any logic. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 September 2019, 1 commit
-
-
Submitted by John Pittman
In nvmet_bdev_set_limits() the number of logical blocks per physical block is calculated, but the associated comment and variable name say the opposite. Correct the comment and rename the variable to reflect the calculation actually done. Signed-off-by: John Pittman <jpittman@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 10 July 2019, 1 commit
-
-
Submitted by Bart Van Assche
Make the NVMe NAWUN, NAWUPF, NACWU, NPWG, NPWA, NPDG and NOWS attributes available to initiator systems for the block backend. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 05 June 2019, 1 commit
-
-
Submitted by Minwoo Im
The WRITE ZEROES command has no data transfer, so we need to initialize (nvmet_req *req)->data_len to 0x0. While (nvmet_req *req)->transfer_len is initialized in nvmet_req_init(), data_len is not initialized anywhere, which can randomly cause a failure with status code NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR, because nvmet_req_execute() checks:

    if (unlikely(req->data_len != req->transfer_len)) {
    	req->error_loc = offsetof(struct nvme_common_command, dptr);
    	nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR);
    } else
    	req->execute(req);

This patch fixes req->data_len so it is not left with a random value, by initializing it to 0x0 when preparing the command in nvmet_bdev_parse_io_cmd(). nvmet_file_parse_io_cmd(), which handles file-backed I/O, already initializes the data_len field to 0x0. Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 05 April 2019, 1 commit
-
-
Submitted by Christoph Hellwig
Use errno_to_nvme_status() to convert a negative errno to an NVMe status field instead of going through a blk_status_t. Also remove the pointless status variable in nvmet_bdev_execute_write_zeroes(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
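The resulting completion pattern looks roughly like this (setup of the zeroout range is omitted; errno_to_nvme_status() is the nvmet core helper for this mapping, the rest is a sketch):

    ret = __blkdev_issue_zeroout(req->ns->bdev, sector, nr_sector,
    			     GFP_KERNEL, &bio, 0);
    if (bio) {
    	bio->bi_private = req;
    	bio->bi_end_io = nvmet_bio_done;
    	submit_bio(bio);
    } else {
    	/* errno -> NVMe status directly, no blk_status_t round trip */
    	nvmet_req_complete(req, errno_to_nvme_status(req, ret));
    }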
-
- 14 March 2019, 1 commit
-
-
Submitted by Christoph Hellwig
NVMe DSM is a pure hint, so if the underlying device / file system does not support discard-like operations we should not fail the operation but rather return success. Fixes: 3b031d15 ("nvmet: add error log support for bdev backend") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 February 2019, 1 commit
-
-
Submitted by Christoph Hellwig
Update the license to use an SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
- 13 December 2018, 2 commits
-
-
Submitted by Chaitanya Kulkarni
This patch adds support for the block device backend to populate the error log entries. Here we map the blk_status_t to the NVMe status. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
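A sketch of the shape of such a mapping; the set of cases and error locations in the kernel is larger, so take this as illustrative:

    static u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
    {
    	switch (blk_sts) {
    	case BLK_STS_OK:
    		return NVME_SC_SUCCESS;
    	case BLK_STS_NOSPC:
    		req->error_loc = offsetof(struct nvme_rw_command, length);
    		return NVME_SC_CAP_EXCEEDED | NVME_SC_DNR;
    	case BLK_STS_TARGET:
    		req->error_loc = offsetof(struct nvme_rw_command, slba);
    		return NVME_SC_LBA_RANGE | NVME_SC_DNR;
    	default:
    		req->error_loc = offsetof(struct nvme_common_command, opcode);
    		return NVME_SC_INTERNAL | NVME_SC_DNR;
    	}
    }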
-
Submitted by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 26 November 2018, 1 commit
-
-
Submitted by Jens Axboe
It doesn't set HIPRI on the bio, so polling for it is pretty silly. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 October 2018, 1 commit
-
-
Submitted by Logan Gunthorpe
Create a configfs attribute in each nvme-fabrics namespace to enable P2P memory use. The attribute may be enabled (with a boolean) or a specific P2P device may be given (with the device's PCI name). When enabled, the namespace will ensure the underlying block device supports P2P and is compatible with any specified P2P device. If no device was specified, it will ensure there is compatible P2P memory somewhere in the system. Enabling a namespace with P2P memory will fail with EINVAL (and an appropriate dmesg error) if any of these conditions are not met. Once a controller is set up on a specific port, the P2P device to use for each namespace will be found and stored in a radix tree, indexed by namespace ID. When memory is allocated for a request, the tree is used to look up the P2P device to allocate memory against. If no device is in the tree (because no appropriate device was found), or if allocation of P2P memory fails, fall back to using regular memory. Signed-off-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> [hch: partial rewrite of the initial code] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 02 October 2018, 1 commit
-
-
Submitted by Sagi Grimberg
If we know that the I/O size exceeds our inline bio vec, there is no point in using it; split the I/O from the start instead. We could in theory reuse the inline bio and only allocate the bio_vec, but it is really not worth optimizing for. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 08 August 2018, 1 commit
-
-
Submitted by Chaitanya Kulkarni
This patch implements the Namespace Write Protect feature described in "NVMe TP 4005a Namespace Write Protect". In this version we implement the No Write Protect and Write Protect states for the target ns, which can be toggled by set-features commands from the host side. For the write-protect state transition we need to flush the ns specified as part of the command, so we also add helpers for carrying out synchronous flush operations. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: fixed an incorrect endianness conversion, minor cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 25 May 2018, 2 commits
-
-
Submitted by Chaitanya Kulkarni
This patch adds simple file backed namespace support for the NVMeOF target. The new file io-cmd-file.c is responsible for handling the I/O commands when the ns is file backed. We also introduce a mempool-based slow path using sync I/O for file backed namespaces to ensure forward progress under reclaim. The old block device based implementation is moved to io-cmd-bdev.c and uses an "nvmet_bdev_" symbol prefix. The enable/disable calls are also moved into the respective files. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: updated changelog, fixed double req->ns lookup in bdev case] Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Chaitanya Kulkarni
Remove the duplicate NULL initialization of req->ns. req->ns is always initialized to NULL in nvmet_req_init(), so there is no need to reset it later on failures unless we have previously assigned a value to it. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 12 April 2018, 1 commit
-
-
Submitted by Rodrigo R. Galvao
We have to increment the number of logical blocks to a 1's-based value in the native format prior to converting it to 512-byte units. Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com> [changelog] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
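A sketch of the corrected conversion in the write-zeroes path, assuming the namespace's block-size shift and the spec's 0's-based NLB field:

    struct nvme_write_zeroes_cmd *write_zeroes = &req->cmd->write_zeroes;
    sector_t sector, nr_sector;

    sector = le64_to_cpu(write_zeroes->slba) << (req->ns->blksize_shift - 9);
    /* NLB is 0's based: add 1 before converting to 512-byte sectors */
    nr_sector = (((sector_t)le16_to_cpu(write_zeroes->length) + 1) <<
    	     (req->ns->blksize_shift - 9));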
-
- 14 February 2018, 1 commit
-
-
Submitted by Israel Rukshin
Executing a discard command on a block device that doesn't support it should return success. Returning an internal error while using multi-path fails the path. Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 11 November 2017, 1 commit
-
-
Submitted by Christoph Hellwig
It is much easier to just open-code this helper. Also use ARRAY_SIZE instead of passing the inline bvec array size manually. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 November 2017, 1 commit
-
-
Submitted by Christoph Hellwig
That way we can also poll non-blk-mq queues. Mostly needed for the NVMe multipath code, but it could also be useful elsewhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 August 2017, 1 commit
-
-
Submitted by Christoph Hellwig
This way we don't need a block_device structure to submit I/O. The block_device has different lifetime rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping, we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want the request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-