- 13 10月, 2017 37 次提交
-
-
由 Javier González 提交于
A partial read I/O in pblk is an I/O where some sectors reside in the write buffer in main memory and some are persisted on the device. Such an I/O must at least contain 2 lbas, therefore checking for the case where a single lba is mapped is not necessary. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
When a line is recycled during garbage collection, reads can still be issued to the line. If the line is freed in the middle of this process, data corruption might occur. This patch guarantees that lines are not freed in the middle of reads that target them (lines). Specifically, we use the existing line reference to decide when a line is eligible for being freed after the recycle process. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
As part of pblk's recovery scheme, we store the lba mapped to each physical sector on the device's out-of-bound (OOB) area. On the read path, we can use this information to validate that the data being delivered to the upper layers corresponds to the lba being requested. The cost of this check is an extra copy on the DMA region on the device and an extra comparison in the host, given that (i) the OOB area is being read together with the data in the media, and (ii) the DMA region allocated for the ppa list can be reused for the metadata stored on the OOB area. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
For consistency with the rest of pblk, use rqd->end_io to point to the function taking care of ending the request on the completion path. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Refactor the rqd allocation and free functions so that all I/O types can use these helper functions. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Each request type sent to the LightNVM subsystem requires different metadata. Until now, we have tailored this metadata based on write, read and erase commands. However, pblk uses different metadata for internal writes that do not hit the write buffer. Instead of abusing the metadata for reads, create a new request type - internal write to improve code readability. In the process, create internal values for each I/O type instead of abusing the READ/WRITE macros, as suggested by Christoph. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Wait until we know the exact number of ppas to be sent to the device, before allocating the bio. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
On REQ_PREFLUSH, directly tag the I/O context flags to signal a flush in the write to cache path, instead of finding the correct entry context and imposing a memory barrier. This simplifies the code and might potentially prevent race conditions when adding functionality to the write path. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Simplify put bio by doing it on bio end_io instead of manually putting it on the completion path. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Simplify the part of the garbage collector where data is read from the line being recycled and moved into an internal queue before being copied to the memory buffer. This allows to get rid of a dedicated function, which introduces an unnecessary dependency on the code. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
When a line is selected for recycling by the garbage collector (GC), the line state changes and the invalid bitmap is frozen, preventing invalidations from happening. Throughout the GC, the L2P map is checked to verify that not data being recycled has been updated. The last check is done before the new map is being stored on the L2P table. Though this algorithm works, it requires a number of corner cases to be checked each time the L2P table is being updated. This complicates readability and is error prone in case that the recycling algorithm is modified. Instead, this patch makes the invalid bitmap accessible even when the line is being recycled. When recycled data is being remapped, it is enough to check the invalid bitmap for the line before updating the L2P table. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Refactor lba sanity check on read path to avoid code duplication. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Normalize the way we name ppa variables to improve code readability. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Use a constant to set the maximum number of inflight GC requests allowed. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
As part of the mempool audit on pblk, remove unnecessary mempool allocation checks on mempools. Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
pblk holds two sector bitmaps: one to keep track of the mapped sectors while the line is active and another one to keep track of the invalid sectors. The latter is kept during the whole live of the line, until it is recycled. Since we cannot guarantee forward progress for the mempool in this case, get rid of the mempool and simply allocate memory through kmalloc. Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Since read and erase paths offer different guarantees for inflight I/Os, separate the mempools to set the right min_nr for each on creation. Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
In pblk, we have a mempool to allocate a generic structure that we pass along workqueues. This is heavily used in the GC path in order to have enough inflight reads and fully utilize the GC bandwidth. However, the current GC path copies data to the host memory and puts it back into the write buffer. This requires a vmalloc allocation for the data and a memory copy. Thus, guaranteeing the allocation by using a mempool for the structure in itself does not give us much. Until we implement support for vector copy to avoid moving data through the host, just allocate the workqueue structure using kmalloc. This allows us to have a much smaller mempool. Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
pblk uses an internal page mempool for allocating pages on internal bios. The main two users of this memory pool are partial reads (reads with some sectors in cache and some on media) and padded writes, which need to add dummy pages to an existing bio already containing valid data (and with a large enough bioset allocated). In both cases, the maximum number of pages per bio is defined by the maximum number of physical sectors supported by the underlying device. This patch fixes a bad mempool allocation, where the min_nr of elements on the pool was fixed (to 16), which is lower than the maximum number of sectors supported by NVMe (as of the time for this patch). Instead, use the maximum number of allowed sectors reported by the device. Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
On low LUN configurations, make sure not to send bios that are bigger than the buffer size. Fixes: a4bd217b ("lightnvm: physical block device (pblk) target") Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Fix stat counter to collect the right number of I/Os being synced on the completion path. Fixes: 0880a9aa ("lightnvm: pblk: delete redundant buffer pointer") Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
When a REQ_FLUSH reaches pblk, the bio cannot be directly completed. Instead, data on the write buffer is flushed and the bio is completed on the completion pah. This might require some sectors to be padded in order to guarantee a successful write. This patch fixes a memory leak on the padded pages. A consequence of this bad free was that internal bios not containing data (only a flush) were not being completed. Fixes: a4bd217b ("lightnvm: physical block device (pblk) target") Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
The data buffer for the GC path allocates virtual memory through vmalloc. When this change was introduced, a flag signaling kmalloc'ed memory was wrongly introduced. Use the right flag when creating a bio from this buffer. Fixes: de54e703 ("lightnvm: pblk: use vmalloc for GC data buffer") Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Javier González 提交于
Initialize the stat counter for garbage collected reads. Fixes: a4bd217b ("lightnvm: physical block device (pblk) target") Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
This is a trivial change which reuses pblk_gc_should_kick instead of repeating it again in pblk_rl_free_lines_inc. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Made it apply to the common case. Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
Correct it by converting little endian to cpu endian and also define a macro for line version so that maintenance is easy. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
The two pr_err messages are useless as they don't differentiate error code. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
This usually happens if we are developing with qemu and ll2pmode has default value. Improve description. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
It seems pblk_dealloc_page would race against pblk_alloc_pages for line bitmap for sector allocation.The chances are very low but might as well protect the bitmap properly. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
Because NVM needs BLK_DEV_NVME, select it automatically if we mark NVM in config file before building kernel. Also append PCI to depends as select doesn't automatically add dependencies. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
Use appropriate memory free calls based on allocation type used and also fix number of times free is called if kmalloc fails. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
Remove repeated calculation for number of channels while creating a target device. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
nvm_tgt_types list was protected by wrong lock for NVM_INFO ioctl call and can race with addition or removal of target types. Also unregistering target type was not protected correctly. Fixes: 5cd90785 ("lightnvm: remove nested lock conflict with mm") Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
When a virtual block device is formatted and mounted after creating with "nvme lnvm create... -t pblk", a removal from "nvm lnvm remove" would result in this: 446416.309757] bdi-block not registered [446416.309773] ------------[ cut here ]------------ [446416.309780] WARNING: CPU: 3 PID: 4319 at fs/fs-writeback.c:2159 __mark_inode_dirty+0x268/0x340 Ideally removal should return -EBUSY as block device is mounted after formatting. This patch tries to address this checking if whole device or any partition of it already mounted or not before removal. Whole device is checked using "bd_super" member of block device. This member is always set once block device has been mounted using a filesystem. Another member "bd_part_count" takes care of checking any if any partitions are under use. "bd_part_count" is only updated under locks when partitions are opened or closed (first open and last release). This at least does take care sending -EBUSY if removal is being attempted while whole block device or any partition is mounted. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
If target type module e.g. pblk here is unloaded (rmmod) while module is in use (after creating target) system crashes. We fix this by using module API refcnt. Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christos Gkekas 提交于
Remove variables that are set but never used. Signed-off-by: NChristos Gkekas <chris.gekas@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Rakesh Pandit 提交于
Replaced by pr_err usage in commit ef510424 ("block, dax: move "select DAX" from BLOCK to FS_DAX") Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Acked-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 10月, 2017 3 次提交
-
-
由 Shaohua Li 提交于
Legacy queue sets request's request_list, mq doesn't. This makes mq does the same thing, so we can find cgroup of a request. Note, we really only use blkg field of request_list, it's pointless to allocate mempool for request_list in mq case. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
Fix two issues: - the per-cpu stat flush is unnecessary, nobody uses per-cpu stat except sum it to global stat. We can do the calculation there. The flush just wastes cpu time. - some fields are signed int/s64. I don't see the point. Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jiufei Xue 提交于
A null pointer dereference can occur when blkcg is removed manually with writeback IOs inflight. This is caused by the following case: Writeback kworker submit the bio and set bio->bi_cg_private to tg in blk_throtl_assoc_bio. Then we remove the block cgroup manually, the blkg and tg would be freed if there is no request inflight. When the submitted bio come back, blk_throtl_bio_endio() fetch the tg which was already freed. Fix this by increasing the refcount of blkg in funcion blk_throtl_assoc_bio() so that the blkg will not be freed until the bio_endio called. Reviewed-by: NShaohua Li <shli@fb.com> Signed-off-by: NJiufei Xue <jiufei.xjf@alibaba-inc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-