- 24 7月, 2018 8 次提交
-
-
由 Sagi Grimberg 提交于
Fail out-of-bounds with a proper status code. Fixes: d5eff33e ("nvmet: add simple file backed ns support") Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
If nvmet_copy_from_sgl failed, we falsly return successful completion status. Fixes: d5eff33e ("nvmet: add simple file backed ns support") Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
We follow the same queue teardown sequence in delete, reset and error recovery. Centralize the logic. This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Centralize controller sequence to a single routine that correctly cleans up after failures instead of having multiple apperances in several flows (create, reset, reconnect). One thing that we also gain here are the sanity/boundary checks also when connecting back to a dynamic controller. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
If the controller is going away, we need to unquiesce the IO queues so that all pending request can fail gracefully before moving forward with controller deletion. Do that before we destroy the IO queues so blk_cleanup_queue won't block in freeze. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
This will print the disk name to the nvme event trace for io requests so a user can better distinguish traffic to different disks. This can be used to create disk based filters. For example, to see only nvme0n2 traffic: echo "disk == \"nvme0n2\"" > /sys/kernel/debug/tracing/events/nvme/filter Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> [hch: turned __assign_disk_name into an inline function] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
This appends the controller instance to the nvme trace buffer to distinguish which controller is dispatching and completing a command. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 23 7月, 2018 12 次提交
-
-
由 Keith Busch 提交于
We can not match a command to its completion based on the command id alone. We need the submitting queue identifier to pair with the completion, so this patch adds that to the trace buffer. This patch is also collapsing the admin and IO submission traces into a single one so we don't need to duplicate this and creating unnecessary code branches: we know if the command is an admin vs IO based on the qid. And since we're here, the patch fixes code formatting in the area. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> [hch: move the qid helper to nvme.h and made it an inline function] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
We will need to reference the controller in the setup and completion time for tracing and future traffic based keep alive support. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
Posting receive buffer operation can fail, thus we should make sure to have an error flow during initialization phase. While we're here, add a debug print in case of a failure. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
ib_post_send operation should succeed unless something unusual happened to the ib device. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Steve Wise 提交于
The patch enables inline data sizes using up to 4 recv sges, and capping the size at 16KB or at least 1 page size. So on a 4K page system, up to 16KB is supported, and for a 64K page system 1 page of 64KB is supported. We avoid > 0 order page allocations for the inline buffers by using multiple recv sges, one for each page. If the device cannot support the configured inline data size due to lack of enough recv sges, then log a warning and reduce the inline size. Add a new configfs port attribute, called param_inline_data_size, to allow configuring the size of inline data for a given nvmf port. The maximum size allowed is still enforced by nvmet-rdma with NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE). And the default size, if not specified via configfs, is still PAGE_SIZE. This preserves the existing behavior, but allows larger inline sizes for small page systems. If the configured inline data size exceeds NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is reduced. If param_inline_data_size is set to 0, then inline data is disabled for that nvmf port. Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Steve Wise 提交于
Allow up to 4 segments of inline data for NVMF WRITE operations. This reduces latency for small WRITEs by removing the need for the target to issue a READ WR for IB, or a REG_MR + READ WR chain for iWarp. Also cap the inline segments used based on the limitations of the device. Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
Add a new "buffered_io" attribute, which disabled direct I/O and thus enables page cache based caching when enabled. The attribute can only be changed when the namespace is disabled as the file has to be reopend for the change to take effect. The possibly blocking read/write are deferred to a newly introduced global workqueue. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
This patch adds support for Commands Supported and Effects log page (Log Identifier 05h) for NVMeOF. This also makes it easier to find which commands are supported, e.g. :- subnqn : testnqn1 Admin Command Set ACS2 [Get Log Page ] 00000001 ACS6 [Identify ] 00000001 ACS8 [Abort ] 00000001 ACS9 [Set Features ] 00000001 ACS10 [Get Features ] 00000001 ACS12 [Asynchronous Event Request ] 00000001 ACS24 [Keep Alive ] 00000001 NVM Command Set IOCS0 [Flush ] 00000001 IOCS1 [Write ] 00000001 IOCS2 [Read ] 00000001 IOCS8 [Write Zeroes ] 00000001 IOCS9 [Dataset Management ] 00000001 This partticular functionality can be used from the host side to examine the NVMeOF ctrl commands supported. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
Currently, the code initializes the keep alive work item whenever nvme_start_keep_alive() is called. However, this routine is called several times while reconnecting, etc. Although it's hoped that keep alive is always disabled and not scheduled when start is called, re-initing if it were scheduled or completing can have very bad side effects. There's no need for re-initialization. Move the keep_alive work item and cmd struct initialization to controller init. Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Revanth Rajashekar 提交于
Added some feature ids present in nvme-cli but not kernel. Signed-off-by: NRevanth Rajashekar <revanth.rajashekar@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Ming Lei 提交于
Inside blk_mq_try_issue_list_directly(), if the request is issued as failed, we shouldn't try to do it again, otherwise the warning in blk_mq_start_request() will be triggered. This change is aligned to behaviour of other ways of request issue & dispatch. Fixes: 6ce3dd6e ("blk-mq: issue directly if hw queue isn't busy in case of 'none'") Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: kernel test robot <rong.a.chen@intel.com> Cc: LKP <lkp@01.org> Reported-by: Nkernel test robot <rong.a.chen@intel.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
With the change to use UINT_MAX I broke the depth check as any value of inflight (ie 0) would be less than (int)UINT_MAX. Fix this by changing everything to unsigned int to match the depth. Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 7月, 2018 8 次提交
-
-
由 Tejun Heo 提交于
Add tracking of REQ_OP_DISCARD ios to the per-cgroup io.stat. Two fields, dbytes and dios, to respectively count the total bytes and number of discards are added. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Andy Newell <newella@fb.com> Cc: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Michael Callahan 提交于
Add tracking of REQ_OP_DISCARD ios to the partition statistics and append them to the various stat files in /sys as well as /proc/diskstats. These are tracked with the same four stats as reads and writes: Number of discard ios completed. Number of discard ios merged Number of discard sectors completed Milliseconds spent on discard requests This is done via adding a new STAT_DISCARD define to genhd.h and then using it to index that stat field for discard requests. tj: Refreshed on top of v4.17 and other previous updates. Signed-off-by: NMichael Callahan <michaelcallahan@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Andy Newell <newella@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Michael Callahan 提交于
Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should et updated. In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit and it uses op_stat_group() on the parameter to determine the stat group. Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. It's now indexed by op_is_write(). tj: Refreshed on top of v4.17. Updated to pass around REQ_OP. Signed-off-by: NMichael Callahan <michaelcallahan@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Michael Callahan 提交于
Add defines for STAT_READ and STAT_WRITE for indexing the partition stat entries. This clarifies some fs/ code which has hardcoded 1 for STAT_WRITE and will make it easier to extend the stats with additional fields. tj: Refreshed on top of v4.17. Signed-off-by: NMichael Callahan <michaelcallahan@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Michael Callahan 提交于
Add a part_stat_read_accum macro to genhd.h to read and sum across field entries. For example to sum up the number read and write sectors completed. In addition to being ar reasonable cleanup by itself this will make it easier to add new stat fields in the future. tj: Refreshed on top of v4.17. Signed-off-by: NMichael Callahan <michaelcallahan@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for read/write") replaced @OP with boolean @is_write, which limited the amount of information going into ->rw_page() and more importantly page_endio(), which removed the need to expose block internals to mm. Unfortunately, we want to track discards separately and @is_write isn't enough information. This patch updates bdev_ops->rw_page() to take REQ_OP instead but leaves page_endio() to take bool @is_write. This allows the block part of operations to have enough information while not leaking it to mm. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 RAGHU Halharvi 提交于
* Remove checkpatch errors caused due to assignment operation in if condition Signed-off-by: NRAGHU Halharvi <raghuhack78@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
In case of 'none' io scheduler, when hw queue isn't busy, it isn't necessary to enqueue request to sw queue and dequeue it from sw queue because request may be submitted to hw queue asap without extra cost, meantime there shouldn't be much request in sw queue, and we don't need to worry about effect on IO merge. There are still some single hw queue SCSI HBAs(HPSA, megaraid_sas, ...) which may connect high performance devices, so 'none' is often required for obtaining good performance. This patch improves IOPS and decreases CPU unilization on megaraid_sas, per Kashyap's test. Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Reported-by: NKashyap Desai <kashyap.desai@broadcom.com> Tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 7月, 2018 2 次提交
-
-
由 Josef Bacik 提交于
In our longer tests we noticed that some boxes would degrade to the point of uselessness. This is because we truncate the current time when saving it in our bio, but I was using the raw current time to subtract from. So once the box had been up a certain amount of time it would appear as if our IO's were taking several years to complete. Fix this by truncating the current time so it matches the issue time. Verified this worked by running with this patch for a week on our test tier. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
Early versions of these patches had us waiting for seconds at a time during submission, so we had to adjust the timing window we monitored for latency. Now we don't do things like that so this is unnecessary code. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 7月, 2018 10 次提交
-
-
由 Hans Holmberg 提交于
We can't know if a block is closed or not on 1.2 devices, so assume closed state to make sure that blocks are erased before writing. Fixes: 32ef9412 ("lightnvm: pblk: implement get log report chunk") Signed-off-by: NHans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Heiner Litz 提交于
In the read path, partial reads are currently performed synchronously which affects performance for workloads that generate many partial reads. This patch adds an asynchronous partial read path as well as the required partial read ctx. Signed-off-by: NHeiner Litz <hlitz@ucsc.edu> Reviewed-by: NIgor Konopko <igor.j.konopko@intel.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Matias Bjørling 提交于
The error messages in pblk does not say which pblk instance that a message occurred from. Update each error message to reflect the instance it belongs to, and also prefix it with pblk, so we know the message comes from the pblk module. Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Matias Bjørling 提交于
For devices that does not specify a limit on its transfer size, the get_chk_meta command may send down a single I/O retrieving the full chunk metadata table. Resulting in large 2-4MB I/O requests. Instead, split up the I/Os to a maximum of 256KB and issue them separately to reduce memory requirements. Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Matias Bjørling 提交于
If using pblk on a 32bit architecture, and there is a need to perform a partial read, the partial read bitmap will only have allocated 32 entries, where as 64 are needed. Make sure that the read_bitmap is initialized to 64bits on 32bit architectures as well. Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Reviewed-by: NIgor Konopko <igor.j.konopko@intel.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Since both blk_old_get_request() and blk_mq_alloc_request() initialize rq->__data_len to zero, it is not necessary to initialize that member in nvme_nvm_alloc_request(). Hence remove the rq->__data_len initialization from nvme_nvm_alloc_request(). Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Matias Bjørling 提交于
When recovering a line, an extra check was added when debugging was active, such that minor version where also checked. Unfortunately, this used the ifdef NVM_DEBUG, which is not correct. Instead use the proper DEBUG def, and now that it compiles, also fix the variable. Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Fixes: d0ab0b1a ("lightnvm: pblk: check data lines version on recovery") Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Matias Bjørling 提交于
There is no users of CONFIG_NVM_DEBUG in the LightNVM subsystem. All users are in pblk. Rename NVM_DEBUG to NVM_PBLK_DEBUG and enable only for pblk. Also fix up the CONFIG_NVM_PBLK entry to follow the code style for Kconfig files. Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Marcin Dziegielewski 提交于
Some devices can expose mw_cunits equal to 0, it can cause the creation of too small write buffer and cause performance to drop on write workloads. Additionally, write buffer size must cover write data requirements, such as WS_MIN and MW_CUNITS - it must be greater than or equal to the larger one multiplied by the number of PUs. However, for performance reasons, use the WS_OPT value to calculation instead of WS_MIN. Because the place where buffer size is calculated was changed, this patch also removes pgs_in_buffer filed in pblk structure. Signed-off-by: NMarcin Dziegielewski <marcin.dziegielewski@intel.com> Signed-off-by: NIgor Konopko <igor.j.konopko@intel.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-