- 18 1月, 2017 12 次提交
-
-
由 Jens Axboe 提交于
Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is useful for admin type queues that still allocate a blk-mq queue and tag set, but aren't use for normal IO. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
This is basically identical to deadline-iosched, except it registers as a MQ capable scheduler. This is still a single queue design. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework. We reuse the existing elevator scheduler API on the registration side, but augment that with the scheduler flagging support for the blk-mq interfce, and with a separate set of ops hooks for MQ devices. We split driver and scheduler tags, so we can run the scheduling independently of device queue depth. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
This is in preparation for having two sets of tags available. For that we need a static index, and a dynamically assignable one. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
No functional change in this patch, just in preparation for having two types of tags available to the block layer for a single request. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
Prep patch for adding an extra tag map for scheduler requests. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
This is in preparation for having another tag set available. Cleanup the parameters, and allow passing in of tags for blk_mq_put_tag(). Signed-off-by: NJens Axboe <axboe@fb.com> [hch: even more cleanups] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
It's only used in blk-mq, kill it from the main exported header and kill the symbol export as well. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
We want to use it outside of blk-core.c. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Jens Axboe 提交于
Prep patch for adding MQ ops as well, since doing anon unions with named initializers doesn't work on older compilers. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
由 Alden Tondettar 提交于
If a GUID Partition Table claims to have more than 2**25 entries, the calculation of the partition table size in alloc_read_gpt_entries() will overflow a 32-bit integer and not enough space will be allocated for the table. Nothing seems to get written out of bounds, but later efi_partition() will read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing information to /proc/partitions and uevents. The problem exists on both 64-bit and 32-bit platforms. Fix the overflow and also print a meaningful debug message if the table size is too large. Signed-off-by: NAlden Tondettar <alden.tondettar@gmail.com> Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 12 1月, 2017 1 次提交
-
-
由 Jens Axboe 提交于
We never change it, make that clear. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
-
- 03 1月, 2017 2 次提交
-
-
由 Bart Van Assche 提交于
This patch does not change any functionality. Fixes: e34cbd30 ("blk-wbt: add general throttling mechanism") Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Fixes: e34cbd30 ("blk-wbt: add general throttling mechanism") Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 26 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
-
- 25 12月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2016 1 次提交
-
-
由 Al Viro 提交于
Both damn things interpret userland pointers embedded into the payload; worse, they are actually traversing those. Leaving aside the bad API design, this is very much _not_ safe to call with KERNEL_DS. Bail out early if that happens. Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 12月, 2016 1 次提交
-
-
由 Stefan Haberland 提交于
Partitions that are not aligned to the blocksize of a device may cause invalid I/O requests because the blocklayer cares only about alignment within the partition when building requests on partitions. device |--------4096--------|--------4096--------|--------4096--------| partition offset 512byte |-512-|--------4096--------|--------4096--------|--------4096--------| When reading/writing one 4k block of the partition this maps to reading/writing with an offset of 512 byte of the device leading to unaligned requests for the device which in turn may cause unexpected behavior of the device driver. For DASD devices we have to translate the block number into a cylinder, head, record format. The unaligned requests lead to wrong calculation and therefore to misdirected I/O. In a "good" case this leads to I/O errors because the underlying hardware detects the wrong addressing. In a worst case scenario this might destroy data on the device. To prevent partitions that are not aligned to the physical blocksize of a device check for the alignment in the blkpg_ioctl. Signed-off-by: NStefan Haberland <sth@linux.vnet.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 19 12月, 2016 1 次提交
-
-
The WRITE_SAME commands are not present in the blk_default_cmd_filter write_ok list, and thus are failed with -EPERM when the SG_IO ioctl() is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users). [ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ] The problem can be reproduced with the sg_write_same command # sg_write_same --num 1 --xferlen 512 /dev/sda # # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_same --num 1 --xferlen 512 /dev/sda' Write same: pass through os error: Operation not permitted # For comparison, the WRITE_VERIFY command does not observe this problem, since it is in that list: # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda' # So, this patch adds the WRITE_SAME commands to the list, in order for the SG_IO ioctl to finish successfully: # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_same --num 1 --xferlen 512 /dev/sda' # That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]), which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu). In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls, which are translated to write-same calls in the guest kernel, and then into SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest: [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current] [...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated [...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00 [...] blk_update_request: I/O error, dev sda, sector 17096824 Links: [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52 [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device') Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: NBrahadambal Srinivasan <latha@linux.vnet.ibm.com> Reported-by: NManjunatha H R <manjuhr1@in.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 15 12月, 2016 1 次提交
-
-
由 Gabriel Krisman Bertazi 提交于
In blk_mq_map_swqueue, there is a memory optimization that frees the tags of a queue that has gone unmapped. Later, if that hctx is remapped after another topology change, the tags need to be reallocated. If this allocation fails, a simple WARN_ON triggers, but the block layer ends up with an active hctx without any corresponding set of tags. Then, any income IO to that hctx can trigger an Oops. I can reproduce it consistently by running IO, flipping CPUs on and off and eventually injecting a memory allocation failure in that path. In the fix below, if the system experiences a failed allocation of any hctx's tags, we remap all the ctxs of that queue to the hctx_0, which should always keep it's tags. There is a minor performance hit, since our mapping just got worse after the error path, but this is the simplest solution to handle this error path. The performance hit will disappear after another successful remap. I considered dropping the memory optimization all together, but it seemed a bad trade-off to handle this very specific error case. This should apply cleanly on top of Jens' for-next branch. The Oops is the one below: SP (3fff935ce4d0) is in userspace 1:mon> e cpu 0x1: Vector: 300 (Data Access) at [c000000fe99eb110] pc: c0000000005e868c: __sbitmap_queue_get+0x2c/0x180 lr: c000000000575328: __bt_get+0x48/0xd0 sp: c000000fe99eb390 msr: 900000010280b033 dar: 28 dsisr: 40000000 current = 0xc000000fe9966800 paca = 0xc000000007e80300 softe: 0 irq_happened: 0x01 pid = 11035, comm = aio-stress Linux version 4.8.0-rc6+ (root@bean) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #3 SMP Mon Oct 10 20:16:53 CDT 2016 1:mon> s [c000000fe99eb3d0] c000000000575328 __bt_get+0x48/0xd0 [c000000fe99eb400] c000000000575838 bt_get.isra.1+0x78/0x2d0 [c000000fe99eb480] c000000000575cb4 blk_mq_get_tag+0x44/0x100 [c000000fe99eb4b0] c00000000056f6f4 __blk_mq_alloc_request+0x44/0x220 [c000000fe99eb500] c000000000570050 blk_mq_map_request+0x100/0x1f0 [c000000fe99eb580] c000000000574650 blk_mq_make_request+0xf0/0x540 [c000000fe99eb640] c000000000561c44 generic_make_request+0x144/0x230 [c000000fe99eb690] c000000000561e00 submit_bio+0xd0/0x200 [c000000fe99eb740] c0000000003ef740 ext4_io_submit+0x90/0xb0 [c000000fe99eb770] c0000000003e95d8 ext4_writepages+0x588/0xdd0 [c000000fe99eb910] c00000000025a9f0 do_writepages+0x60/0xc0 [c000000fe99eb940] c000000000246c88 __filemap_fdatawrite_range+0xf8/0x180 [c000000fe99eb9e0] c000000000246f90 filemap_write_and_wait_range+0x70/0xf0 [c000000fe99eba20] c0000000003dd844 ext4_sync_file+0x214/0x540 [c000000fe99eba80] c000000000364718 vfs_fsync_range+0x78/0x130 [c000000fe99ebad0] c0000000003dd46c ext4_file_write_iter+0x35c/0x430 [c000000fe99ebb90] c00000000038c280 aio_run_iocb+0x3b0/0x450 [c000000fe99ebce0] c00000000038dc28 do_io_submit+0x368/0x730 [c000000fe99ebe30] c000000000009404 system_call+0x38/0xec Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Douglas Miller <dougmill@linux.vnet.ibm.com> Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Reviewed-by: NDouglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 12月, 2016 1 次提交
-
-
由 Gabriel Krisman Bertazi 提交于
While stressing memory and IO at the same time we changed SMT settings, we were able to consistently trigger deadlocks in the mm system, which froze the entire machine. I think that under memory stress conditions, the large allocations performed by blk_mq_init_rq_map may trigger a reclaim, which stalls waiting on the block layer remmaping completion, thus deadlocking the system. The trace below was collected after the machine stalled, waiting for the hotplug event completion. The simplest fix for this is to make allocations in this path non-reclaimable, with GFP_NOIO. With this patch, We couldn't hit the issue anymore. This should apply on top of Jens's for-next branch cleanly. Changes since v1: - Use GFP_NOIO instead of GFP_NOWAIT. Call Trace: [c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable) [c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430 [c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0 [c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0 [c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30 [c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0 [c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0 [c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs] [c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs] [c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs] [c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210 [c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0 [c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0 [c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520 [c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250 [c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0 [c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380 [c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360 [c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0 [c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250 [c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100 [c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0 [c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0 [c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250 [c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120 [c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0 [c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150 [c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0 [c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120 [c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0 [c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0 [c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0 [c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250 [c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0 [c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270 [c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110 [c000000f0160be30] [c000000000009204] system_call+0x38/0xec Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Douglas Miller <dougmill@linux.vnet.ibm.com> Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 13 12月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
We ran into a funky issue, where someone doing 256K buffered reads saw 128K requests at the device level. Turns out it is read-ahead capping the request size, since we use 128K as the default setting. This doesn't make a lot of sense - if someone is issuing 256K reads, they should see 256K reads, regardless of the read-ahead setting, if the underlying device can support a 256K read in a single command. This patch introduces a bdi hint, io_pages. This is the soft max IO size for the lower level, I've hooked it up to the bdev settings here. Read-ahead is modified to issue the maximum of the user request size, and the read-ahead max size, but capped to the max request size on the device side. The latter is done to avoid reading ahead too much, if the application asks for a huge read. With this patch, the kernel behaves like the application expects. Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.comSigned-off-by: NJens Axboe <axboe@fb.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 12月, 2016 5 次提交
-
-
由 Jens Axboe 提交于
Everytime we need to read ->nr_samples, we should have flushed the batch first. The non-mq read path also needs to flush the batch. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
Currently we pass in to run the queue async, but don't flag the queue to be run. We don't need to run it async here, but we should run it. So fixup the parameters. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com>
-
由 Jens Axboe 提交于
Takes a list of requests, and dispatches it. Moves any residual requests to the dispatch list. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com>
-
由 Jens Axboe 提交于
We have a variant for all hardware queues, but not one for a single hardware queue. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com>
-
- 09 12月, 2016 2 次提交
-
-
由 Christoph Hellwig 提交于
Instead of allocating a single unused biovec for discard requests, send them down without any payload. Instead we allow the driver to add a "special" payload using a biovec embedded into struct request (unioned over other fields never used while in the driver), and overloading the number of segments for this case. This has a couple of advantages: - we don't have to allocate the bio_vec - the amount of special casing for discard requests in the block layer is significantly reduced - using this same scheme for other request types is trivial, which will be important for implementing the new WRITE_ZEROES op on devices where it actually requires a payload (e.g. SCSI) - we can get rid of playing games with the request length, as we'll never touch it and completions will work just fine - it will allow us to support ranged discard operations in the future by merging non-contiguous discard bios into a single request - last but not least it removes a lot of code This patch is the common base for my WIP series for ranges discards and to remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Both of these are metadata only commands that are not issued by the writeback code and not directly relevant to the writeback bandwith. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 08 12月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
In theory we could map other things, but there's a reason that function is called "user_iov". Using anything else (like splice can do) just confuses it. Reported-and-tested-by: NJohannes Thumshirn <jthumshirn@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 12月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
-
- 05 12月, 2016 1 次提交
-
-
由 Nicolai Stange 提交于
Since commit e73c23ff ("block: add async variant of blkdev_issue_zeroout") messages like the following show up: EXT4-fs (dm-1): Delayed block allocation failed for inode 2368848 at logical offset 0 with max blocks 1 with error 95 EXT4-fs (dm-1): This should not happen!! Data will be lost Due to the following fallthrough introduced with commit 2d253440 ("block: Define zoned block device operations"), generic_make_request_checks() would accept a REQ_OP_WRITE_SAME bio only if the block device supports "write same" *and* is a zoned one: switch (bio_op(bio)) { [...] case REQ_OP_WRITE_SAME: if (!bdev_write_same(bio->bi_bdev)) goto not_supported; case REQ_OP_ZONE_REPORT: case REQ_OP_ZONE_RESET: if (!bdev_is_zoned(bio->bi_bdev)) goto not_supported; break; [...] } Thus, although the bio setup as done by __blkdev_issue_write_same() from commit e73c23ff ("block: add async variant of blkdev_issue_zeroout") would succeed, its actual submission would not, resulting in the EOPNOTSUPP == 95. Fix this by removing the fallthrough which, due to the lack of an explicit comment, seems to be unintended anyway. Fixes: e73c23ff ("block: add async variant of blkdev_issue_zeroout") Fixes: 2d253440 ("block: Define zoned block device operations") Signed-off-by: NNicolai Stange <nicstange@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 03 12月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
Signed-off-by: NShaohua Li <shli@fb.com> Fixes: cf43e6be ("block: add scalable completion tracking of requests") Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 12月, 2016 4 次提交
-
-
由 Ritesh Harjani 提交于
Factor out common code for setting REQ_NOMERGE flag which is being used out at certain places and make it a helper instead, req_set_nomerge(). Signed-off-by: NRitesh Harjani <riteshh@codeaurora.org> Get rid of the inline. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Chaitanya Kulkarni 提交于
This adds a new block layer operation to zero out a range of LBAs. This allows to implement zeroing for devices that don't use either discard with a predictable zero pattern or WRITE SAME of zeroes. The prominent example of that is NVMe with the Write Zeroes command, but in the future, this should also help with improving the way zeroing discards work. For this operation, suitable entry is exported in sysfs which indicate the number of maximum bytes allowed in one write zeroes operation by the device. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Chaitanya Kulkarni 提交于
Similar to __blkdev_issue_discard this variant allows submitting the final bio asynchronously and chaining multiple ranges into a single completion. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Damien Le Moal 提交于
Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of a zoned block device. However, the first and last zones reported for a partition make sense only if the partition start sector and size are aligned on the device zone size. The same applies for zone reset. Resetting the first or the last zone of a partition straddling zones may impact neighboring partitions. Finally, if a partition start sector is not at the beginning of a sequential zone, it will be impossible to write to the first sectors of the partition on a host-managed device. Avoid all these problems and incoherencies by ignoring partitions that are not zone aligned. Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the correct disk zoning type (host-aware, host-managed or none) but bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone size is unknown). So test this as a way to ensure that a zoned block device is being handled as such. As a result, for a host-aware devices, unaligned zone partitions will be accepted with CONFIG_BLK_DEV_ZONED disabled. That is, the disk will be treated as a regular block device (as it should). If zoned block device support is enabled, only aligned partitions will be accepted. Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 30 11月, 2016 1 次提交
-
-
由 Kent Overstreet 提交于
This is a helper that pins down a range from an iov_iter and adds it to a bio without requiring a separate memory allocation for the page array. It will be used for upcoming direct I/O implementations for block devices and iomap based file systems. Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> [hch: ported to the iov_iter interface, renamed and added comments. All blame should be directed to me and all fame should go to Kent after this!] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com> (cherry picked from commit 9cd56d916aa481ce8f56d9c5302a6ed90c2e0b5f)
-
- 29 11月, 2016 1 次提交
-
-
由 Gabriel Krisman Bertazi 提交于
After commit 287922eb ("block: defer timeouts to a workqueue"), deleting the timeout work after freezing the queue shouldn't be necessary, since the synchronization is already enforced by the acquisition of a q_usage_counter reference in blk_mq_timeout_work. Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-