- 01 3月, 2018 2 次提交
-
-
由 Bart Van Assche 提交于
This patch does not change any functionality. Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Similarly to the support we have for testing/faking timeouts for null_blk, this adds support for triggering a requeue condition. Considering the issues around restart we've been seeing, this should be a useful addition to the testing arsenal to ensure that we are handling requeue conditions correctly. This works for queue mode 1 (legacy request_fn based path) and 2 (blk-mq path), as there's no good way to do requeue with a bio based driver. This is similar to the timeout path. For the blk-mq path, we alternate between passing back BLK_STS_RESOURCE and manually calling blk_mq_requeue_request() in the driver. The former will hit the core requeue path, while the latter exercises the IO scheduler requeue path. Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 31 1月, 2018 1 次提交
-
-
由 Ming Lei 提交于
This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using that magic value. If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because: 1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() 3) if SCHED_RESTART is set concurently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided. One invariant is that queue will be rerun if SCHED_RESTART is set. Suggested-by: NJens Axboe <axboe@kernel.dk> Tested-by: NLaurence Oberman <loberman@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 1月, 2018 2 次提交
-
-
由 Arnd Bergmann 提交于
Selecting FAULT_INJECTION causes a Kconfig warning when CONFIG_DEBUG_KERNEL is not set: warning: (BLK_DEV_NULL_BLK && DRM_I915_SELFTEST) selects FAULT_INJECTION which has unmet direct dependencies (DEBUG_KERNEL) The other drivers that use FAULT_INJECTION tend to have a separate Kconfig symbol for turning on that feature, so let's do the same thing here. This may add a bit more complexity than we like, but it avoids the warning and is more consistent with the rest of the kernel. Fixes: 93b57046 ("null_blk: add option for managing IO timeouts") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Use the fault injection framework to provide a way for null_blk to configure timeouts. This only works for queue_mode 1 and 2, since the bio mode doesn't have code for tracking timeouts. Let's say you want to have a 10% chance of timing out every 100,000 requests, and for 5 total timeouts, you could do: modprobe null_blk timeout="100000,10,0,5" This is useful for adding blktests to test that IO timeouts are handled appropriately. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 1月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
This is needed to ensure that we actually handle timeouts. Without it, the queue_mode=1 path will never call blk_add_timer(), and the queue_mode=2 path will continually just return EH_RESET_TIMER and we never actually complete the offending request. This was used to test the new timeout code, and the changes around killing off REQ_ATOM_COMPLETE. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 1月, 2018 1 次提交
-
-
由 Matias Bjørling 提交于
With rrpc to be removed, the null_blk lightnvm support is no longer functional. Remove the lightnvm implementation and maybe add it to another module in the future if someone takes on the challenge. Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 12月, 2017 1 次提交
-
-
由 Jens Axboe 提交于
Commit 966a9671 randomly added alignment to this structure, but it's actually detrimental to performance of null_blk. Test case: Running on both the home and remote node shows a ~5% degradation in performance. While in there, move blk_status_t to the hole after the integer tag in the nullb_cmd structure. After this patch, we shrink the size from 192 to 152 bytes. Fixes: 966a9671 ("smp: Avoid using two cache lines for struct call_single_data") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 11月, 2017 1 次提交
-
-
由 David Disseldorp 提交于
null_alloc_dev() allocates memory for dev->badblocks, but cleanup currently only occurs in the configfs release codepath, missing a number of other places. This bug was found running the blktests block/010 test, alongside kmemleak: rapido1:/blktests# ./check block/010 ... rapido1:/blktests# echo scan > /sys/kernel/debug/kmemleak [ 306.966708] kmemleak: 32 new suspected memory leaks (see /sys/kernel/debug/kmemleak) rapido1:/blktests# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88001f86d000 (size 4096): comm "modprobe", pid 231, jiffies 4294892415 (age 318.252s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814b0379>] kmemleak_alloc+0x49/0xa0 [<ffffffff810f180f>] kmem_cache_alloc+0x9f/0xe0 [<ffffffff8124e45f>] badblocks_init+0x2f/0x60 [<ffffffffa0019fae>] 0xffffffffa0019fae [<ffffffffa0021273>] nullb_device_badblocks_store+0x63/0x130 [null_blk] [<ffffffff810004cd>] do_one_initcall+0x3d/0x170 [<ffffffff8109fe0d>] do_init_module+0x56/0x1e9 [<ffffffff8109ebd7>] load_module+0x1c47/0x26a0 [<ffffffff8109f819>] SyS_finit_module+0xa9/0xd0 [<ffffffff814b4f60>] entry_SYSCALL_64_fastpath+0x13/0x94 Fixes: 2f54a613 ("nullb: badbblocks support") Reviewed-by: NShaohua Li <shli@fb.com> Signed-off-by: NDavid Disseldorp <ddiss@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 10月, 2017 1 次提交
-
-
由 Bhumika Goyal 提交于
Make these structures const as they are either passed to the functions having the argument as const or stored as a reference in the "ci_type" const field of a config_item structure. Done using Coccienlle. Signed-off-by: NBhumika Goyal <bhumirks@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 17 10月, 2017 1 次提交
-
-
由 Wei Yongjun 提交于
Fix to return error code -ENOMEM from the null_alloc_dev() error handling case instead of 0, as done elsewhere in this function. Fixes: 2984c868 ("nullb: factor disk parameters") Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 9月, 2017 1 次提交
-
-
由 weiping zhang 提交于
add an option that disable io scheduler for null block device. Signed-off-by: Nweiping zhang <zhangweiping@didichuxing.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 8月, 2017 3 次提交
-
-
由 Ying Huang 提交于
struct call_single_data is used in IPIs to transfer information between CPUs. Its size is bigger than sizeof(unsigned long) and less than cache line size. Currently it is not allocated with any explicit alignment requirements. This makes it possible for allocated call_single_data to cross two cache lines, which results in double the number of the cache lines that need to be transferred among CPUs. This can be fixed by requiring call_single_data to be aligned with the size of call_single_data. Currently the size of call_single_data is the power of 2. If we add new fields to call_single_data, we may need to add padding to make sure the size of new definition is the power of 2 as well. Fortunately, this is enforced by GCC, which will report bad sizes. To set alignment requirements of call_single_data to the size of call_single_data, a struct definition and a typedef is used. To test the effect of the patch, I used the vm-scalability multiple thread swap test case (swap-w-seq-mt). The test will create multiple threads and each thread will eat memory until all RAM and part of swap is used, so that huge number of IPIs are triggered when unmapping memory. In the test, the throughput of memory writing improves ~5% compared with misaligned call_single_data, because of faster IPIs. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NHuang, Ying <ying.huang@intel.com> [ Add call_single_data_t and align with size of call_single_data. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jens Axboe 提交于
We already have this pointer, no need to use to_nullb_device() again. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
Commit 2984c868(nullb: factor disk parameters) has a typo. The nullb_device allocation/free is done outside of null_add_dev. The commit accidentally frees the nullb_device in error code path. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 8月, 2017 2 次提交
-
-
由 Shaohua Li 提交于
Dan reported this: The patch 2984c868: "nullb: factor disk parameters" from Aug 14, 2017, leads to the following Smatch complaint: drivers/block/null_blk.c:1759 null_init_tag_set() error: we previously assumed 'nullb' could be null (see line 1750) 1755 set->cmd_size = sizeof(struct nullb_cmd); 1756 set->flags = BLK_MQ_F_SHOULD_MERGE; 1757 set->driver_data = NULL; 1758 1759 if (nullb->dev->blocking) ^^^^^^^^^^^^^^^^^^^^ And an unchecked dereference. nullb could be NULL here. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Update to a working one, the fusionio address hasn't been valid in 4 years. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 8月, 2017 9 次提交
-
-
由 Shaohua Li 提交于
Sometime disk could have tracks broken and data there is inaccessable, but data in other parts can be accessed in normal way. MD RAID supports such disks. But we don't have a good way to test it, because we can't control which part of a physical disk is bad. For a virtual disk, this can be easily controlled. This patch adds a new 'badblock' attribute. Configure it in this way: echo "+1-100" > xxx/badblock, this will make sector [1-100] as bad blocks. echo "-20-30" > xxx/badblock, this will make sector [20-30] good If badblocks are accessed, the nullb disk will return IO error. Other parts of the disk can accessed in normal way. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
Software must flush disk cache to guarantee data safety. To check if software correctly does disk cache flush, we must know the behavior of disk. But physical disk behavior is uncontrollable. Even software doesn't do the flush, the disk probably does the flush. This patch tries to emulate a cache in the test disk. All write will go to a cache first, when the cache is full, we then flush some data to disk storage. A flush request will flush all data of the cache to disk storage. A FUA write will write to memory store directly and revalidate data in cache. If there is a power failure (by writing to power attribute, 'echo 0 > disk_name/power'), we discard all data in the cache, but preserve the data in disk storage. Later we can power on the disk again as usual (write 1 to 'power' attribute), then we can check data integrity and very if software does everything correctly. A new attribute 'cache_size' (in MB) is added to configure cache size. Based on original patch from Kyungchan Koh Signed-off-by: NKyungchan Koh <kkc6196@fb.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
In test, we usually expect controllable disk speed. For example, in a raid array, we'd like some disks are fast and some are slow. MD RAID actually has a feature for this. To test the feature, we'd like to make the disk run in specific speed. block throttling probably can be used for this purpose, but it requires cgroup setup. Here we just implement a simple throttling mechanism in the driver. There is slight fluctuation in the mechanism, but it's good enough for test. To configure the bandwidth cap, user sets the 'mbps' attribute. mbps is MB/s. Based on original patch from Kyungchan Koh Signed-off-by: NKyungchan Koh <kkc6196@fb.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
discard makes sense for memory backed disk. And also it's useful to test if upper layer supports dicard correctly. User configures 'discard' attribute to enable/disable dicard support. Based on original patch from Kyungchan Koh Signed-off-by: NKyungchan Koh <kkc6196@fb.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
This adds memory backed store in nullb. User configure 'memory_backed' attribute for this. By default, nullb disk doesn't use memory backed store. Based on original patch from Kyungchan Koh Signed-off-by: NKyungchan Koh <kkc6196@fb.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
We now dynamically create disks. Managing the disk index with ida to avoid bump up the index too much. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
The device created in nullb configfs interface isn't power on by default. After user configures the device, user can do 'echo 1 > xxx/nullb/device_name/power' to power on the device, which will create a disk. the xxx/nullb/device_name/index is the disk index, so if the index is 2, the new created disk should be named as /dev/nullb2. Note, the 'index' is only valid after disk is power on. 'echo 0 > xxx/nullb/device_name/power' will remove the disk. Note, this doesn't remove the device. To remove the device, user should do 'rmdir xxx/nullb/device_name'. Removing the device will remove the disk too. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
Add configfs interface for nullb. configfs interface is more flexible and easy to configure in a per-disk basis. Configuration is something like this: mount -t configfs none /mnt Checking which features the driver supports: cat /mnt/nullb/features The 'features' attribute is for future extension. We probably will add new features into the driver, userspace can check this attribute to find the supported features. Create/remove a device: mkdir/rmdir /mnt/nullb/a Then configure the device by setting attributes under /mnt/nullb/a, most of nullb supported module parameters are converted to attributes: size; /* device size in MB */ completion_nsec; /* time in ns to complete a request */ submit_queues; /* number of submission queues */ home_node; /* home node for the device */ queue_mode; /* block interface */ blocksize; /* block size */ irqmode; /* IRQ completion handler */ hw_queue_depth; /* queue depth */ use_lightnvm; /* register as a LightNVM device */ blocking; /* blocking blk-mq device */ use_per_node_hctx; /* use per-node allocation for hardware context */ Note, creating a device doesn't create a disk immediately. Creating a disk is done in two phases: create a device and then power on the device. Next patch will introduce device power on. Based on original patch from Kyungchan Koh Signed-off-by: NKyungchan Koh <kkc6196@fb.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shaohua Li 提交于
When we switch to configfs interface, each disk could have different configuration. To prepare for the change, we move most disk setting to a separate data structure. The existing module parameter interface is kept. The 'nr_devices' and 'shared_tags' don't make sense for per-disk setting, so they are remained as global settings. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 8月, 2017 2 次提交
-
-
由 weiping zhang 提交于
set submit_queues to 1 by default, and make sure it's value > 0. Signed-off-by: Nweiping zhang <zhangweiping@didichuxing.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 weiping zhang 提交于
make sure submit_queues equal nr_online_nodes. Signed-off-by: Nweiping zhang <zhangweiping@didichuxing.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 06 7月, 2017 1 次提交
-
-
由 Max Gurtovoy 提交于
In case we use shared tags feature, blk_mq_alloc_tag_set might fail during module initialization. In that case, fail the load with a suitable error code. Also move the tagset initialization process after defining the amount of submission queues. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 6月, 2017 1 次提交
-
-
由 Jens Axboe 提交于
Some storage drivers need to share tag sets between devices. It's useful to be able to model that with null_blk, to find hangs or performance issues. Add a 'shared_tags' bool module parameter that. If that is set to true and nr_devices is bigger than 1, all devices allocated will share the same tag set. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 6月, 2017 2 次提交
-
-
由 Christoph Hellwig 提交于
Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 21 4月, 2017 2 次提交
-
-
由 Christoph Hellwig 提交于
Now that all drivers that call blk_mq_complete_requests have a ->complete callback we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 20 4月, 2017 1 次提交
-
-
由 Bart Van Assche 提交于
This patch changes the behavior of the null_blk driver for the LightNVM mode as follows: * REQ_FAILFAST_MASK is set for read-ahead requests. * If no I/O priority has been set in the bio, the I/O priority is copied from the I/O context. * The rq_disk member is initialized if bio->bi_bdev != NULL. * req->errors is initialized to zero. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Matias Bjørling <m@bjorling.me> Cc: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 31 3月, 2017 2 次提交
-
-
由 Eric Biggers 提交于
Constify all instances of blk_mq_ops, as they are never modified. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
This adds a new module parameter to null_blk, blocking. If set, null_blk will set the BLK_MQ_F_BLOCKING flag, indicating that it sometimes/always needs to block in its ->queue_rq() function. The intent is to help find regressions in blocking drivers, since not many of them exist. If null_blk is loaded with submit_queues > 1 and blocking=1, this shows the regression recently fixed by bf4907c0. Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 2月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 31 1月, 2017 2 次提交
-
-
由 Matias Bjørling 提交于
When the lightnvm core had the "gennvm" layer between the device and the target, there was a need for the core to be able to figure out which target it should send an end_io callback to. Leading to a "double" end_io, first for the media manager instance, and then for the target instance. Now that core and gennvm is merged, there is no longer a need for this, and a single end_io callback will do. Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Matias Bjørling 提交于
The number of configuration groups has been limited to one in current code, even if there is support for up to four. With the introduction of the open-channel SSD 1.3 specification, only a single group is exposed onwards. Reflect this in the nvm_id structure. Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-