- 30 7月, 2018 2 次提交
-
-
由 Max Gurtovoy 提交于
Currently these functions are implemented in the scsi layer, but their actual place should be the block layer since T10-PI is a general data integrity feature that is used in the nvme protocol as well. Also, use the tuple size from the integrity profile since it may vary between integrity types. Suggested-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Max Gurtovoy 提交于
Currently this function is implemented in the scsi layer, but it's actual place should be the block layer since T10-PI is a general data integrity feature that is used in the nvme protocol as well. Suggested-by: NChristoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 7月, 2018 1 次提交
-
-
由 Jinbum Park 提交于
User controls @dev_minor which to be used as index of pkt_devs. So, It can be exploited via Spectre-like attack. (speculative execution) This kind of attack leaks address of pkt_devs, [1] It leads an attacker to bypass security mechanism such as KASLR. So sanitize @dev_minor before using it to prevent attack. [1] https://github.com/jinb-park/linux-exploit/ tree/master/exploit-remaining-spectre-gadget/leak_pkt_devs.c Signed-off-by: NJinbum Park <jinb.park7@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 7月, 2018 10 次提交
-
-
由 Arnd Bergmann 提交于
The get_seconds function is deprecated now since it returns a 32-bit value that will eventually overflow, and we are replacing it throughout the kernel with ktime_get_seconds() or ktime_get_real_seconds() that return a time64_t. bcache uses get_seconds() to read the current system time and store it in the superblock as well as in uuid_entry structures that are user visible. Unfortunately, the two structures in are still limited to 32 bits, so this won't fix any real problems but will still overflow in year 2106. Let's at least document that properly, in case we get an updated format in the future it can be fixed. We still have a long time before the overflow and checking the tools at https://github.com/koverstreet/bcache-tools reveals no access to any of them. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Florian Schmaus 提交于
Fixes an error condition reported by checkpatch.pl which is caused by assigning a variable in an if condition. Signed-off-by: NFlorian Schmaus <flo@geekplace.eu> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Florian Schmaus 提交于
Fixes an error condition reported by checkpatch.pl which is caused by assigning a variable in an if condition. Signed-off-by: NFlorian Schmaus <flo@geekplace.eu> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shenghui Wang 提交于
Free the cache_set->flush_bree heap memory on journal free. Signed-off-by: NWang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Florian Schmaus 提交于
Fixes an error condition reported by checkpatch.pl which is caused by assigning a variable in an if condition. Signed-off-by: NFlorian Schmaus <flo@geekplace.eu> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tang Junhui 提交于
I attached several backend devices in the same cache set, and produced lots of dirty data by running small rand I/O writes in a long time, then I continue run I/O in the others cached devices, and stopped a cached device, after a mean while, I register the stopped device again, I see the running I/O in the others cached devices dropped significantly, sometimes even jumps to zero. In currently code, bcache would traverse each keys and btree node to count the dirty data under read locker, and the writes threads can not get the btree write locker, and when there is a lot of keys and btree node in the registering device, it would last several seconds, so the write I/Os in others cached device are blocked and declined significantly. In this patch, when a device registering to a ache set, which exist others cached devices with running I/Os, we get the amount of dirty data of the device in an incremental way, and do not block other cached devices all the time. Patch v2: Rename some variables and macros name as Coly suggested. Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tang Junhui 提交于
This patch base on "[PATCH] bcache: finish incremental GC". Since incremental GC would stop 100ms when front side I/O comes, so when there are many btree nodes, if GC only processes constant (100) nodes each time, GC would last a long time, and the front I/Os would run out of the buckets (since no new bucket can be allocated during GC), and I/Os be blocked again. So GC should not process constant nodes, but varied nodes according to the number of btree nodes. In this patch, GC is divided into constant (100) times, so when there are many btree nodes, GC can process more nodes each time, otherwise GC will process less nodes each time (but no less than MIN_GC_NODES). Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tang Junhui 提交于
In GC thread, we record the latest GC key in gc_done, which is expected to be used for incremental GC, but in currently code, we didn't realize it. When GC runs, front side IO would be blocked until the GC over, it would be a long time if there is a lot of btree nodes. This patch realizes incremental GC, the main ideal is that, when there are front side I/Os, after GC some nodes (100), we stop GC, release locker of the btree node, and go to process the front side I/Os for some times (100 ms), then go back to GC again. By this patch, when we doing GC, I/Os are not blocked all the time, and there is no obvious I/Os zero jump problem any more. Patch v2: Rename some variables and macros name as Coly suggested. Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tang Junhui 提交于
Currently we calculate the total amount of flash only devices dirty data by adding the dirty data of each flash only device under registering locker. It is very inefficient. In this patch, we add a member flash_dev_dirty_sectors in struct cache_set to record the total amount of flash only devices dirty data in real time, so we didn't need to calculate the total amount of dirty data any more. Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Greg Edwards 提交于
When the underlying device is a 4 KiB logical block size device with a protection interval exponent of 0, i.e. 4096 bytes data + 8 bytes PI, the driver miscalculates the pi_bytes{out,in} by a factor of 8x (64 bytes). This leads to errors on all reads and writes on 4 KiB logical block size devices when CONFIG_BLK_DEV_INTEGRITY is enabled and the VIRTIO_SCSI_F_T10_PI feature bit has been negotiated. Fixes: e6dc783a ("virtio-scsi: Enable DIF/DIX modes in SCSI host LLD") Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NGreg Edwards <gedwards@ddn.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 7月, 2018 3 次提交
-
-
由 Juergen Gross 提交于
Remove some macros not used anywhere. Acked-by: NRoger Pau Monné <roger.pau@citrix.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The function name mentioned doesn't exist, and the code next to it doesn't match the description either. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
We immediately overwrite the biovec array, so instead just allocate a new bio and copy over the disk, setor and size. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Acked-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 7月, 2018 10 次提交
-
-
由 Bart Van Assche 提交于
Commit ca4b2a01 ("null_blk: add zone support") breaks several blktests scripts because it renamed the null_blk kernel module into null_blk_mod. Hence rename null_blk_mod back into null_blk. Fixes: ca4b2a01 ("null_blk: add zone support") Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Cc: Matias Bjorling <matias.bjorling@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Andy Shevchenko 提交于
Don't use sizeof(uuid_le) where none of the parameters is type of uuid_le. Since both arguments are u8 [16], use size of destination there. Moreover, uuid_le is a deprecated type, and nvmet is using uuid_t already. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Fail out-of-bounds with a proper status code. Fixes: d5eff33e ("nvmet: add simple file backed ns support") Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
If nvmet_copy_from_sgl failed, we falsly return successful completion status. Fixes: d5eff33e ("nvmet: add simple file backed ns support") Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
We follow the same queue teardown sequence in delete, reset and error recovery. Centralize the logic. This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Centralize controller sequence to a single routine that correctly cleans up after failures instead of having multiple apperances in several flows (create, reset, reconnect). One thing that we also gain here are the sanity/boundary checks also when connecting back to a dynamic controller. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
If the controller is going away, we need to unquiesce the IO queues so that all pending request can fail gracefully before moving forward with controller deletion. Do that before we destroy the IO queues so blk_cleanup_queue won't block in freeze. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
This will print the disk name to the nvme event trace for io requests so a user can better distinguish traffic to different disks. This can be used to create disk based filters. For example, to see only nvme0n2 traffic: echo "disk == \"nvme0n2\"" > /sys/kernel/debug/tracing/events/nvme/filter Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> [hch: turned __assign_disk_name into an inline function] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
This appends the controller instance to the nvme trace buffer to distinguish which controller is dispatching and completing a command. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 23 7月, 2018 9 次提交
-
-
由 Keith Busch 提交于
We can not match a command to its completion based on the command id alone. We need the submitting queue identifier to pair with the completion, so this patch adds that to the trace buffer. This patch is also collapsing the admin and IO submission traces into a single one so we don't need to duplicate this and creating unnecessary code branches: we know if the command is an admin vs IO based on the qid. And since we're here, the patch fixes code formatting in the area. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> [hch: move the qid helper to nvme.h and made it an inline function] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
We will need to reference the controller in the setup and completion time for tracing and future traffic based keep alive support. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
Posting receive buffer operation can fail, thus we should make sure to have an error flow during initialization phase. While we're here, add a debug print in case of a failure. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
ib_post_send operation should succeed unless something unusual happened to the ib device. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Steve Wise 提交于
The patch enables inline data sizes using up to 4 recv sges, and capping the size at 16KB or at least 1 page size. So on a 4K page system, up to 16KB is supported, and for a 64K page system 1 page of 64KB is supported. We avoid > 0 order page allocations for the inline buffers by using multiple recv sges, one for each page. If the device cannot support the configured inline data size due to lack of enough recv sges, then log a warning and reduce the inline size. Add a new configfs port attribute, called param_inline_data_size, to allow configuring the size of inline data for a given nvmf port. The maximum size allowed is still enforced by nvmet-rdma with NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE). And the default size, if not specified via configfs, is still PAGE_SIZE. This preserves the existing behavior, but allows larger inline sizes for small page systems. If the configured inline data size exceeds NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is reduced. If param_inline_data_size is set to 0, then inline data is disabled for that nvmf port. Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Steve Wise 提交于
Allow up to 4 segments of inline data for NVMF WRITE operations. This reduces latency for small WRITEs by removing the need for the target to issue a READ WR for IB, or a REG_MR + READ WR chain for iWarp. Also cap the inline segments used based on the limitations of the device. Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
Add a new "buffered_io" attribute, which disabled direct I/O and thus enables page cache based caching when enabled. The attribute can only be changed when the namespace is disabled as the file has to be reopend for the change to take effect. The possibly blocking read/write are deferred to a newly introduced global workqueue. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
This patch adds support for Commands Supported and Effects log page (Log Identifier 05h) for NVMeOF. This also makes it easier to find which commands are supported, e.g. :- subnqn : testnqn1 Admin Command Set ACS2 [Get Log Page ] 00000001 ACS6 [Identify ] 00000001 ACS8 [Abort ] 00000001 ACS9 [Set Features ] 00000001 ACS10 [Get Features ] 00000001 ACS12 [Asynchronous Event Request ] 00000001 ACS24 [Keep Alive ] 00000001 NVM Command Set IOCS0 [Flush ] 00000001 IOCS1 [Write ] 00000001 IOCS2 [Read ] 00000001 IOCS8 [Write Zeroes ] 00000001 IOCS9 [Dataset Management ] 00000001 This partticular functionality can be used from the host side to examine the NVMeOF ctrl commands supported. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
Currently, the code initializes the keep alive work item whenever nvme_start_keep_alive() is called. However, this routine is called several times while reconnecting, etc. Although it's hoped that keep alive is always disabled and not scheduled when start is called, re-initing if it were scheduled or completing can have very bad side effects. There's no need for re-initialization. Move the keep_alive work item and cmd struct initialization to controller init. Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 18 7月, 2018 4 次提交
-
-
由 Michael Callahan 提交于
Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should et updated. In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit and it uses op_stat_group() on the parameter to determine the stat group. Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. It's now indexed by op_is_write(). tj: Refreshed on top of v4.17. Updated to pass around REQ_OP. Signed-off-by: NMichael Callahan <michaelcallahan@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Michael Callahan 提交于
Add a part_stat_read_accum macro to genhd.h to read and sum across field entries. For example to sum up the number read and write sectors completed. In addition to being ar reasonable cleanup by itself this will make it easier to add new stat fields in the future. tj: Refreshed on top of v4.17. Signed-off-by: NMichael Callahan <michaelcallahan@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for read/write") replaced @OP with boolean @is_write, which limited the amount of information going into ->rw_page() and more importantly page_endio(), which removed the need to expose block internals to mm. Unfortunately, we want to track discards separately and @is_write isn't enough information. This patch updates bdev_ops->rw_page() to take REQ_OP instead but leaves page_endio() to take bool @is_write. This allows the block part of operations to have enough information while not leaking it to mm. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 RAGHU Halharvi 提交于
* Remove checkpatch errors caused due to assignment operation in if condition Signed-off-by: NRAGHU Halharvi <raghuhack78@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 7月, 2018 1 次提交
-
-
由 Hans Holmberg 提交于
We can't know if a block is closed or not on 1.2 devices, so assume closed state to make sure that blocks are erased before writing. Fixes: 32ef9412 ("lightnvm: pblk: implement get log report chunk") Signed-off-by: NHans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-