- 09 Jan, 2018 6 commits
-
-
Committed by Michael Lyle
Writeback keys are presently iterated and dispatched for writeback in order of the logical block address (LBA) on the backing device. Multiple keys may be read from the cache device in parallel and then written back (especially when there is contiguous I/O). However, the existing code gave no guarantee that the writes would be issued in LBA order, as the reads from the cache device are often reordered. In turn, when writing back quickly, the backing disk often has to seek backwards, which slows writeback and increases utilization.

This patch introduces an ordering mechanism that guarantees that the original order of issue is maintained for the write portion of the I/O. Performance for writeback is significantly improved when there are multiple contiguous keys or high writeback rates.

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Tested-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
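One way to picture the ordering guarantee described in this commit is a small reorder window keyed by a dispatch sequence number. The sketch below is a user-space illustration only, not the bcache implementation: the names `wb_order`, `wb_read_complete`, and `WB_WINDOW` are hypothetical, and the real patch works with kernel closures rather than a fixed array. Reads may complete in any order, but a write is released only once every earlier sequence number has been released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WB_WINDOW 8 /* hypothetical cap on in-flight writeback I/Os */

/* Hypothetical reorder state; not the bcache data structure. */
struct wb_order {
    unsigned int next_seq;        /* sequence number allowed to write next */
    bool read_done[WB_WINDOW];    /* read finished, write still pending */
    unsigned long lba[WB_WINDOW]; /* backing-device LBA for each slot */
};

/*
 * Called when the cache-device read for `seq` completes. Appends to `out`
 * the LBAs whose writes may now be issued, in dispatch order, and returns
 * how many were appended.
 */
static size_t wb_read_complete(struct wb_order *o, unsigned int seq,
                               unsigned long lba,
                               unsigned long *out, size_t out_cap)
{
    size_t n = 0;

    /* Assumption: the caller keeps at most WB_WINDOW I/Os in flight. */
    assert(seq - o->next_seq < WB_WINDOW);

    o->read_done[seq % WB_WINDOW] = true;
    o->lba[seq % WB_WINDOW] = lba;

    /* Drain every consecutive completed read starting at next_seq. */
    while (n < out_cap && o->read_done[o->next_seq % WB_WINDOW]) {
        unsigned int slot = o->next_seq % WB_WINDOW;

        out[n++] = o->lba[slot];
        o->read_done[slot] = false;
        o->next_seq++;
    }
    return n;
}
```

If the read for sequence 2 finishes first, nothing is released; once sequences 0 and 1 finish, the writes go out as 0, 1, 2, so the backing disk sees ascending LBAs.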
-
Committed by Tang Junhui
In bch_debug_init(), ret is always 0, so the return value is useless. Change it to return 0 on success after calling debugfs_create_dir(), and a non-zero value otherwise.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Tang Junhui
In a scenario with some flash-only volumes and some cached devices, when many tasks issue requests to these devices in writeback mode, the write I/Os may fall into the same bucket, as below:

| cached data | flash data | cached data | cached data | flash data |

Then, after writeback of these cached devices, the bucket would look like this:

| free | flash data | free | free | flash data |

So there is a lot of free space in this bucket, but since data of flash-only volumes still exists, the bucket cannot be reclaimed, which wastes bucket space.

In this patch, we segregate flash-only volume write streams from cached devices, so data from flash-only volumes and cached devices can be stored in different buckets.

Compared to the v1 patch, this patch does not add an additional open bucket list, and it makes a best effort to segregate flash-only volume write streams from cached devices. Sectors of flash-only volumes may still be mixed with dirty sectors of cached devices, but the number is very small.

[mlyle: fixed commit log formatting, permissions, line endings]

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Vasyl Gomonovych
Fix ptr_ret.cocci warnings:

drivers/md/bcache/btree.c:1800:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR.

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
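To illustrate the transformation that the Coccinelle rule suggests, here is a user-space sketch of the error-pointer idiom. The `sk_` helpers are hypothetical stand-ins written for this example; they imitate the kernel's `ERR_PTR`, `IS_ERR`, `PTR_ERR`, and `PTR_ERR_OR_ZERO`, assuming the usual convention that the top 4095 address values encode negative errno codes. The two `check_*` functions show the open-coded form and its shorter equivalent.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095 /* assumed size of the errno range */

/* Encode a negative errno value as a pointer. */
static inline void *sk_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range reserved for error codes. */
static inline bool sk_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/* Decode the errno value carried by an error pointer. */
static inline long sk_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Return the errno for an error pointer, or 0 for a valid pointer. */
static inline int sk_ptr_err_or_zero(const void *p)
{
    return sk_is_err(p) ? (int)sk_ptr_err(p) : 0;
}

/* Before: the open-coded pattern that the warning flags. */
static int check_open_coded(const void *p)
{
    if (sk_is_err(p))
        return (int)sk_ptr_err(p);
    return 0;
}

/* After: the same logic through the single helper. */
static int check_helper(const void *p)
{
    return sk_ptr_err_or_zero(p);
}
```

Both functions return the same value for every input, which is why the change is a pure cleanup.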
-
Committed by Tang Junhui
Currently, when a cached device is detached from a cache, the writeback thread is not stopped and the writeback_rate_update work is not canceled. For example, after the following command:

echo 1 >/sys/block/sdb/bcache/detach

you can still see the writeback thread. If you then attach the device to the cache again, bcache creates another writeback thread. For example, after the command below:

echo ba0fb5cd-658a-4533-9806-6ce166d883b9 > /sys/block/sdb/bcache/attach

you will see 2 writeback threads.

This patch stops the writeback thread and cancels the writeback_rate_update work when a cached device is detached from a cache.

Compared with patch v1, this v2 patch moves the code down into the register lock for safety in case of any future changes, as Coly and Mike suggested.

[edit by mlyle: commit log spelling/formatting]

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Rui Hua
A read request might hit an error when searching the btree, but the error was not handled in cache_lookup(), and this kind of metadata failure does not go into cached_dev_read_error(). As a result, the upper layer receives bi_status=0.

In this patch we detect the metadata error by the return value of bch_btree_map_keys(). There are two potential paths that give rise to the error:

1. Because the btree is not totally cached in memory, we may get an error when reading a btree node from the cache device (see bch_btree_node_get()); the likely errno is -EIO or -ENOMEM.

2. When a read miss happens, bch_btree_insert_check_key() is called to insert a "replace_key" into the btree (see cached_dev_cache_miss(); this is just preparatory work before inserting the missed data into the cache device). A failure can also happen in this situation; the likely errno is -ENOMEM.

bch_btree_map_keys() returns MAP_DONE in the normal scenario, but we get either -EIO or -ENOMEM in the two cases above. If this happens, we should NOT recover data from the backing device (when the cache device is dirty), because we don't know whether the bkeys covered by the read request are all clean. When this happens, s->iop.status still has its initial value (0) before we submit s->bio.bio, so we set it to BLK_STS_IOERR. That way it can go into cached_dev_read_error(), and finally it can be passed to the upper layer, or recovered by rereading from the backing device.

[edit by mlyle: patch formatting, word-wrap, comment spelling, commit log format]

Signed-off-by: Hua Rui <huarui.dev@gmail.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
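The decision described in this commit can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the patch itself: `sk_search`, `sk_handle_lookup_result`, and the `SK_` constants are hypothetical stand-ins for the bcache search state, the MAP_* return codes, and the block-layer status values. A negative return from the btree walk marks the request as failed, and a dirty cache additionally rules out recovery from the backing device.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for bcache's MAP_DONE / MAP_CONTINUE (assumed non-negative). */
enum { SK_MAP_DONE = 0, SK_MAP_CONTINUE = 1 };

/* Stand-ins for block-layer status codes; the values are illustrative. */
enum { SK_STS_OK = 0, SK_STS_IOERR = 10 };

/* Hypothetical per-request state. */
struct sk_search {
    int status;       /* status reported to the upper layer */
    bool recoverable; /* may the read be retried from the backing device? */
};

/*
 * Apply the result of the btree walk to the request. A negative value
 * (for example -EIO or -ENOMEM) is a metadata failure.
 */
static void sk_handle_lookup_result(struct sk_search *s, int ret,
                                    bool cache_dirty)
{
    if (ret >= 0)
        return; /* normal completion: leave the request untouched */

    /* Make the failure visible instead of leaving status at 0. */
    s->status = SK_STS_IOERR;

    /* With dirty data in the cache, the backing device may be stale. */
    if (cache_dirty)
        s->recoverable = false;
}
```

With a clean cache the request stays recoverable, so the error path can still reread from the backing device.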
-
- 08 Jan, 2018 17 commits
-
-
Committed by Israel Rukshin
There is a problem when another module (e.g. nvmet) takes a reference on the nvme block device and the physical nvme drive is removed. In that case nvme_free_ctrl() will not be called, and the controller state will be "deleting" or "dead" unless the nvmet module releases the block device. Later on, the same nvme drive probes back and nvme_init_subsystem() will be called and fail due to a duplicate subnqn (if the nvme device doesn't support subsystems with multiple controllers). This will cause a probe failure.

This commit changes the check for multiple-controller support in nvme_init_subsystem() so that it does not count controllers in the "dead" or "deleting" state (this is safe because controllers in these states will never be active again).

Fixes: ab9e00cc ("nvme: track subsystems")
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
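The revised check described in this commit amounts to counting only controllers that can still become active. A minimal user-space sketch, assuming a flat array of states instead of the kernel's linked list and locking; the names `sk_ctrl_state`, `sk_active_ctrls`, and `sk_subsys_can_add_ctrl` are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of controller states. */
enum sk_ctrl_state {
    SK_CTRL_LIVE,
    SK_CTRL_RESETTING,
    SK_CTRL_DELETING,
    SK_CTRL_DEAD,
};

/* Count controllers that are neither being deleted nor dead. */
static size_t sk_active_ctrls(const enum sk_ctrl_state *states, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (states[i] != SK_CTRL_DELETING && states[i] != SK_CTRL_DEAD)
            count++;
    }
    return count;
}

/*
 * A subsystem without multi-controller support may accept a new
 * controller only if no active controller is already attached.
 */
static bool sk_subsys_can_add_ctrl(const enum sk_ctrl_state *states,
                                   size_t n, bool multi_ctrl)
{
    return multi_ctrl || sk_active_ctrls(states, n) == 0;
}
```

A stale controller stuck in the deleting or dead state no longer blocks the re-probed drive from registering.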
-
Committed by Nitzan Carmi
The block device is backed by the transport, so we must ensure that the transport driver will not be removed until all references are released. Otherwise, we might end up referencing freed memory.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Jianchao Wang
When the I/O queue setup or tagset allocation fails, ctrl.tagset is NULL. But the scan work will still be queued and executed, and a panic follows due to a NULL pointer dereference of ctrl.tagset.

To fix this, add a new ctrl state NVME_CTRL_ADMIN_ONLY to indicate that only the admin queue is live. When there are no I/O queues or tagset allocation fails, the ctrl enters this state and scan work is not started. But async event work and the nvme dev ioctl remain available. This is helpful for further investigation and recovery.

Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Max Gurtovoy
When an NVMe controller reports an RTD3 Entry Latency larger than the value of the shutdown_timeout module parameter, we update shutdown_timeout accordingly to honor the RTD3 Entry Latency. Log this at an informational level instead of a warning level.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Israel Rukshin
Make it symmetric to nvmet_alloc_ctrl().

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Israel Rukshin
Remove the allocated id on error.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Minwoo Im
The local variable __size__ will be set a bit later in a for-loop. Remove the explicit initialization at the beginning of this function.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Roy Shterman
NVMe transport driver module unload may (and usually does) trigger iteration over the active controllers and delete them all (sometimes under a mutex). However, a controller can be created concurrently with module unload, which can lead to leakage of resources (most importantly, char device node leakage) if the controller creation occurred after the unload delete and drain sequence. To protect against this, we take a module reference to guarantee that the nvme transport driver is not unloaded while creating a controller.

Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by James Smart
The current fc transport add_port routine validates that there is a matching port to the target port config. It then takes a reference on the targetport. The del_port removes the reference.

Unfortunately, if the LLDD undergoes a hw reset or driver unload and wants to unreg the targetport, the targetport effectively can't be removed because of the reference. It requires the admin to remove the port from the nvmet config first, which calls the del_port. Note: it appears nvmetcli clear skips over the del_port call (I'm not attempting to change that).

There's no real reason to take the reference. With FC, there is nothing to enable or disable, as the presence of the FC targetport implicitly means it's enabled, and removal of the targetport means it's disabled.

Change add_port to simply validate and change remove_port to a no-op. No references are taken on the targetport.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by James Smart
The split between what the host accesses on its flows vs. what the target side accesses was flawed. Abort handling didn't properly clear the initiator vs. target structure cross-reference, and locks weren't used for synchronization. Thus, there were issues of freeing structures too soon and access after free.

A couple of these existed before the IN_ISR mods, but when the target upcalls were converted to work items, thus adding delays between the 2 sides of accesses, the problems became pronounced.

Resolve by:
  - tracking io state mainly in the tgt-side io structure.
  - making the tgt-side io structure released by reference, not by code flow.
  - using locks for synchronization when changing initiator structures.
  - clearly tracking which side saw an abort and, after seeing the abort, clearing cross-references under lock.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by James Smart
The existing fcloop driver expects the target side upcalls to the transport to context switch, thus the calls into the nvmet layer are not done in the calling context of the host/initiator down calls. The xxx_IN_ISR feature flags are used to select this logic.

The xxx_IN_ISR feature flags should go away in the nvmet_fc transport, as no other lldd utilizes them. Both Broadcom and Cavium lldds have their own non-ISR deferred handlers, thus the nvmet calls can be made directly.

This patch converts the paths that make the target upcalls (command receive, abort receive) such that they schedule a work item rather than expecting the transport to schedule the work item.

The patch also cleans up the following:
  - The completion path from target to host scheduled a host work element called "work". Rename it "tio_done_work" for code clarity.
  - The abort io path called an iniwork item to call the host side io done. This is no longer needed, as the abort routine can make the same call.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by James Smart
The current fcloop driver gets its lport structure from the private area co-allocated with the fc_localport. All is fine except the teardown path, which wants to wait on the completion, which is marked complete by the delete_localport callback performed after unregister_localport. The issue is that the nvme_fc transport frees the localport structure immediately after delete_localport is called, meaning the original routine is trying to wait on a completion that was just freed.

Change this such that an lport struct is allocated coincident with the addition and registration of a localport. The private area of the localport now contains just a backpointer to the real lport struct. Now the completion can be waited for, and after completing, the new structure can be kfree'd.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by James Smart
A test case revealed a race condition of an i/o completing on a thread parallel to the delete_association generating the aborts for the outstanding ios on the controller. The i/o completion was freeing the target fcloop context, thus the abort task referenced the just-freed memory.

Correct by clearing the target/initiator cross pointers in the io completion and abort tasks before calling the callbacks. On aborts that detect already finished io's, ensure the complete context is called.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
It is a bit chatty to report on each queue, so log it only for debug purposes.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
It is a bit chatty to report on every deleted queue, so keep it for debug purposes only.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
We already do that when we are notified of device removal, which is triggered when unregistering as an ib client.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 07 Jan, 2018 9 commits
-
-
Committed by Bart Van Assche
Use the sgl_alloc_order() and sgl_free() functions instead of open coding these functions.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Bart Van Assche
Use the sgl_alloc() and sgl_free() functions instead of open coding these functions.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Bart Van Assche
Use the sgl_alloc() and sgl_free() functions instead of open coding these functions.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Arnd Bergmann
When CONFIG_KASAN is set, all the local variables in this function are allocated on the stack together, leading to a warning about possible kernel stack overflow:

drivers/block/DAC960.c: In function 'DAC960_gam_ioctl':
drivers/block/DAC960.c:7061:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

By splitting up the function into smaller chunks, we can avoid that and make the code slightly more readable at the same time.

The coding style in this file is completely nonstandard, and I chose not to touch that at all, leaving the unconventional indentation unchanged to make it easier to review the diff.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
The bio is always freed after running crypt_free_buffer_pages(), so it isn't necessary to clear bv->bv_page.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
bcache is the only user of bio_alloc_pages(), so move this function into bcache to keep it from being misused in the future.

Also rename it to bch_bio_alloc_pages(), since it is bcache only.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
All direct accesses to the bvec table are safe even after multipage bvec is supported.

Cc: linux-bcache@vger.kernel.org
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
For BIO-based DM, some targets, such as the crypt target, aren't ready to deal with an incoming bio bigger than 1 MByte.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
This patch converts to bio_first_bvec_all() and bio_first_page_all() for retrieving the first bvec/page, and prepares for supporting multipage bvec.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 06 Jan, 2018 2 commits
-
-
Committed by Bart Van Assche
Call bdev_get_queue(bdev) after bdev->bd_disk has been initialized instead of just before that pointer has been initialized. This patch prevents the following command:

pktsetup 1 /dev/sr0

from triggering the following kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000548
IP: pkt_setup_dev+0x2db/0x670 [pktcdvd]
CPU: 2 PID: 724 Comm: pktsetup Not tainted 4.15.0-rc4-dbg+ #1
Call Trace:
 pkt_ctl_ioctl+0xce/0x1c0 [pktcdvd]
 do_vfs_ioctl+0x8e/0x670
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x23/0x9a

Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: commit ca18d6f7 ("block: Make most scsi_req_init() calls implicit")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Bart Van Assche
Commit 523e1d39 ("block: make gendisk hold a reference to its queue") modified add_disk() and disk_release() but did not update any of the error paths that trigger a put_disk() call after disk->queue has been assigned. That introduced the following behavior in the pktcdvd driver if pkt_new_dev() fails:

Kernel BUG at 00000000e98fd882 [verbose debug info unavailable]

Since disk_release() calls blk_put_queue() anyway if disk->queue != NULL, fix this by removing the blk_cleanup_queue() call from the pkt_setup_dev() error path.

Fixes: commit 523e1d39 ("block: make gendisk hold a reference to its queue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: <stable@vger.kernel.org> # v3.2
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Jan, 2018 6 commits
-
-
Committed by Matias Bjørling
Shorten the function to simply return the value of the if statement.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
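The commit log does not name the function it shortens, so the following is only a generic sketch of this kind of cleanup. The `sk_addr` struct and both comparison helpers are hypothetical; they show an if statement that returns true or false being replaced by a direct return of its condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address wrapper used only for this illustration. */
struct sk_addr {
    uint64_t ppa;
};

/* Before: the condition is tested and then mapped to true/false by hand. */
static bool sk_addr_equal_verbose(struct sk_addr l, struct sk_addr r)
{
    if (l.ppa == r.ppa)
        return true;

    return false;
}

/* After: return the value of the condition directly. */
static bool sk_addr_equal(struct sk_addr l, struct sk_addr r)
{
    return l.ppa == r.ppa;
}
```

The two versions behave identically; the shorter one just removes a branch that adds nothing.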
-
Committed by Javier González
Since pblk registers its own block device, the iostat accounting is not automatically done for us. Therefore, add the necessary accounting logic to satisfy the iostat interface.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Add the instance name to the information printed out on target creation.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Refactor the way we free the write buffer to ensure that all entries get freed in case of an error in the init sequence.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
When creating the write thread, ensure that the kthread has been created before initializing the timer responsible for kicking it. Otherwise, if the kthread creation fails or gets killed from user space, we risk kicking an empty thread structure. Also, since the kthread creation can be interrupted from user space, adapt the error path to not report an error when this happens, since it is intentional that the instance creation is aborted.

Signed-off-by: Javier González <javier@cnexlabs.com>

Updated source to reflect the new timer_setup API.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
On scan recovery, reads can fail. This happens because the first page of each line is read in order to determine whether the line has been used (and thus needs to be recovered) or not. This can lead to "empty page" read errors. Since these errors are normal, do not log them, as they are confusing when reviewing the logs.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-