- 11 Feb 2015, 1 commit

By Roger Pau Monne
Current migration code uses blk_put_request() to finish a request before requeuing it. That function does not update the statistics of the queue, which completely breaks the accounting. Use blk_end_request_all() instead, which updates the queue statistics properly.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-and-Tested-by: Ouyang Zhaowei (Charles) <ouyangzhaowei@huawei.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
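A minimal sketch of the substitution, assuming an illustrative helper around the driver's requeue path (the function names blk_put_request()/blk_end_request_all() are from the commit text; the wrapper is not the actual driver code):

```c
/* Illustrative helper: finish a request that is about to be requeued.
 * blk_put_request() would drop it without touching the queue statistics;
 * blk_end_request_all() completes every segment and updates accounting. */
static void blkfront_finish_for_requeue(struct request *req)
{
	blk_end_request_all(req, 0);	/* 0 = success; stats get updated */
}
```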

- 03 Feb 2015, 1 commit

By Takashi Iwai
Use the static attribute groups assigned to the device instead of calling device_create_file() after the device registration.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
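A hedged sketch of the pattern, with an illustrative "version" attribute (not the driver's actual attributes):

```c
static ssize_t version_show(struct device *dev,
			    struct device_attribute *attr, char *buf)
{
	return sprintf(buf, "1.0\n");	/* illustrative attribute value */
}
static DEVICE_ATTR_RO(version);

static struct attribute *example_attrs[] = {
	&dev_attr_version.attr,
	NULL,
};
ATTRIBUTE_GROUPS(example);		/* generates example_groups[] */

static int example_register(struct device *dev)
{
	dev->groups = example_groups;	/* core creates files at registration */
	return device_register(dev);	/* no device_create_file() afterwards */
}
```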

- 30 Jan 2015, 1 commit

By Jens Axboe
Currently we allocate an nvme_iod for each IO, which holds the sg list, prps, and other IO related info. Set a threshold of 2 pages and/or 8KB of data, below which we can just embed this in the per-command pdu in blk-mq. For any IO at or below NVME_INT_PAGES and NVME_INT_BYTES, we save a kmalloc and kfree. For higher IOPS, this saves up to 1% of CPU time.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
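A hedged sketch of the threshold test; the constants follow the commit text (2 pages, 8KB), while the helper name and exact definitions are illustrative:

```c
#define NVME_INT_PAGES	2		/* per the commit text */
#define NVME_INT_BYTES	(8 * 1024)

static bool nvme_iod_fits_inline(unsigned int nseg, unsigned int nbytes)
{
	/* small IOs reuse space inside the per-command blk-mq pdu,
	 * saving a kmalloc()/kfree() pair on every IO */
	return nseg <= NVME_INT_PAGES && nbytes <= NVME_INT_BYTES;
}
```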

- 24 Jan 2015, 1 commit

By Shaohua Li
The libata tag allocation is using a round-robin policy. The next patch will make libata use the block layer's generic tag allocation, so let's add a policy to tag allocation. There are currently two policies: FIFO (default) and round-robin.

Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
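A sketch of the policy knob this adds to the tag allocator; the exact enum names in the tree may differ from these:

```c
enum {
	BLK_TAG_ALLOC_FIFO,	/* default: always take the lowest free tag */
	BLK_TAG_ALLOC_RR,	/* round-robin: cycle through the tag space,
				 * matching libata's historical behaviour */
};
```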

- 22 Jan 2015, 2 commits

By kaoudis
Converting to blk-mq got rid of the driver's RCU locking-on-queue, so remove the now-unnecessary RCU locking-on-queue artefacts.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kelly Nicole Kaoudis <kaoudis@colorado.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Martin K. Petersen
blkdev_issue_zeroout() will zero a given block range. This is done by way of explicit writing, thus provisioning or allocating the blocks on disk.

There are use cases where the desired behavior is to zero the blocks but unprovision them if possible. The blocks must deterministically contain zeroes when they are subsequently read back.

This patch adds a flag to blkdev_issue_zeroout() that provides this variant. If the discard flag is set and a block device guarantees discard_zeroes_data, we will use REQ_DISCARD to clear the block range. If the device does not support discard_zeroes_data or if the discard request fails, we will fall back to first REQ_WRITE_SAME and then a regular REQ_WRITE.

Also update the callers of blkdev_issue_zeroout() to reflect the new flag and make sb_issue_zeroout() prefer the discard approach.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
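The fallback chain reads roughly like this - a sketch assembled from the commit text, not the exact upstream diff (the __blkdev_issue_zeroout() helper for the write-same/write path is an assumption):

```c
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
{
	struct request_queue *q = bdev_get_queue(bdev);

	/* 1. discard, only if it deterministically reads back as zeroes */
	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
		return 0;

	/* 2. WRITE SAME of the zero page */
	if (bdev_write_same(bdev) &&
	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
				    ZERO_PAGE(0)) == 0)
		return 0;

	/* 3. plain writes of zeroed pages (assumed helper name) */
	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
}
```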

- 17 Jan 2015, 1 commit

By Jens Axboe
null_blk is partitionable, but it doesn't store any of the info. When it is loaded, you would normally see:

[1226739.343608] nullb0: unknown partition table
[1226739.343746] nullb1: unknown partition table

which can confuse some people. Add the appropriate gendisk flag to suppress this info.

Signed-off-by: Jens Axboe <axboe@fb.com>

- 14 Jan 2015, 3 commits

By Boaz Harrosh
Because of the direct_access() API, which returns a PFN, partitions had better start on a 4K boundary; otherwise offset zero of a partition will not be aligned and blk_direct_access() will fail the call. By setting blk_queue_physical_block_size(PAGE_SIZE) we can communicate this to fdisk and friends.

The call to blk_queue_physical_block_size() is harmless and will not affect kernel behavior in any way. It is only for communication to user mode.

Before this patch, running fdisk on a default-size brd of 4M, the first sector offered is 34 (bad); after this patch it will be 40, i.e. 8-sector aligned. Also when entering some random partition sizes, the next partition-start sector offered is 8-sector aligned after this patch. (Note that with fdisk the user can still enter bad values; only the offered default values will be correct.)

Note that with a bdev size > 4M, fdisk will try to align on a 1M boundary (the above first sector will be 2048) in any case.

CC: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
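The hint itself is a single call; a sketch, assuming brd's request queue field is named brd_queue:

```c
/* advisory only: tells fdisk & friends to align partitions to 4K;
 * has no effect on how the kernel performs I/O to the device */
blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
```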

By Boaz Harrosh
This patch fixes up brd's partition scheme, now enjoying all worlds.

The MAIN fix here is that currently, if one fdisks some partitions, a BAD bug makes all partitions point to the same start-end sectors, i.e. 0 - brd_size, and an mkfs of any partition would trash the partition table and the other partitions.

Another fix is that "mount -U uuid" will not work if show_part was not specified, because of the GENHD_FL_SUPPRESS_PARTITION_INFO flag. We now always load without it and remove the show_part parameter. [We remove Dmitry's new module param part_show; partitions are now always shown.]

So NOW the logic goes like this:

* max_part - just says how many minors to reserve between ramX devices. Either way, there can be as many partitions as requested. If the minors between devices run out, dynamic 259-major ids will be allocated on the fly. The default is now max_part=1, which means all partition devt(s) will come from the dynamic (259) major range. (If persistent partition minors are needed, use max_part=X. For example, with /dev/sdX max_part is hard-coded to 16.)

* Creation of new devices on the fly still/always works:
  mknod /path/devnod b 1 X
  fdisk -l /path/devnod
  will create a new device if [X / max_part] was not already created before (just as before). Partitions on the dynamically created device will work as well; the same minor logic applies as with the pre-created ones.

TODO: dynamic grow of device size, so that each device can have its own size.

CC: Dmitry Monakhov <dmonakhov@openvz.org>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Matthew Wilcox
In order to support accesses to larger chunks of memory, pass in a 'size' parameter (counted in bytes), and return the amount available at that address.

Add a new helper function, bdev_direct_access(), to handle common functionality including partition handling, checking the length requested is positive, checking for the sector being page-aligned, and checking that the length of the request does not pass the end of the partition.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
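A hedged sketch of the helper's checks, per the commit text; the exact signature and field accessors in the tree may differ:

```c
long bdev_direct_access(struct block_device *bdev, sector_t sector,
			void **addr, unsigned long *pfn, long size)
{
	long avail;
	const struct block_device_operations *ops = bdev->bd_disk->fops;

	if (size < 0)
		return -EINVAL;			/* length must be positive */
	if (!ops->direct_access)
		return -EOPNOTSUPP;
	if ((sector + DIV_ROUND_UP(size, 512)) >
	    part_nr_sects_read(bdev->bd_part))
		return -ERANGE;			/* runs past the partition */
	sector += get_start_sect(bdev);		/* partition -> device sector */
	if (sector % (PAGE_SIZE / 512))
		return -EINVAL;			/* must be page-aligned */
	avail = ops->direct_access(bdev, sector, addr, pfn, size);
	return min(avail, size);		/* amount available there */
}
```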

- 03 Jan 2015, 6 commits

By Jens Axboe
Looks like we pull it in through other ways on x86, but we fail on sparc:

In file included from drivers/block/cryptoloop.c:30:0:
drivers/block/loop.h:63:24: error: field 'tag_set' has incomplete type
  struct blk_mq_tag_set tag_set;

Add the include to loop.h, kill it from loop.c.

Signed-off-by: Jens Axboe <axboe@fb.com>

By Ming Lei
The block core handles REQ_FUA via its flush state machine, so loop doesn't need to handle it explicitly.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Ming Lei
No behaviour change; just move the handling for REQ_DISCARD and REQ_FLUSH into these two functions.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Ming Lei
Switch to block request completely.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Ming Lei
The conversion is fairly straightforward: use a work queue to dispatch requests of the loop block device. One big change is that requests are submitted to the backend file/device concurrently via the work queue, so throughput may improve considerably. Given that write requests over the same file often run exclusively, don't handle them concurrently, to avoid extra context-switch cost, possible lock contention, and work-scheduling cost. Also, with blk-mq there is the opportunity to get loop I/O merged before submitting to the backend file/device.

In the following test:
- base: v3.19-rc2-2041231
- loop over file in ext4 file system on SSD disk
- bs: 4k, libaio, io depth: 64, O_DIRECT, num of jobs: 1
- throughput: IOPS

------------------------------------------------------
|           | base   | base with loop-mq | delta     |
------------------------------------------------------
| randread  | 1740   | 25318             | +1355%    |
------------------------------------------------------
| read      | 42196  | 51771             | +22.6%    |
------------------------------------------------------
| randwrite | 35709  | 34624             | -3%       |
------------------------------------------------------
| write     | 39137  | 40326             | +3%       |
------------------------------------------------------

So loop-mq improves throughput for both read and randread; meanwhile, performance of write and randwrite is basically not hurt.

Another benefit is that the loop driver code gets simplified considerably after the blk-mq conversion, so the patch can be thought of as a cleanup too.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
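A hedged sketch of the dispatch model described above: queue_rq hands each request to a work item so the backing file is driven from process context, concurrently (the loop_wq workqueue and loop_cmd pdu layout are assumptions for illustration):

```c
static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
			 const struct blk_mq_queue_data *bd)
{
	struct loop_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);

	blk_mq_start_request(bd->rq);
	/* reads fan out across the workqueue; writes to the same file are
	 * serialized to avoid context-switch and lock-contention cost */
	queue_work(loop_wq, &cmd->work);
	return BLK_MQ_RQ_QUEUE_OK;
}
```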

By Ming Lei
Check IS_ERR_OR_NULL(return value) instead of just the return value.

Signed-off-by: Ming Lei <ming.lei@canonical.com>

Reduced to IS_ERR() by me; we never return NULL.

Signed-off-by: Jens Axboe <axboe@fb.com>
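The corrected pattern, as a minimal sketch (the wrapper function is illustrative): blk_mq_init_queue() returns an ERR_PTR on failure and never NULL, so IS_ERR() alone is the whole check.

```c
static int loop_init_queue(struct loop_device *lo)
{
	struct request_queue *q = blk_mq_init_queue(&lo->tag_set);

	if (IS_ERR(q))			/* ERR_PTR or a valid queue, never NULL */
		return PTR_ERR(q);
	lo->lo_queue = q;
	return 0;
}
```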

- 23 Dec 2014, 1 commit

By Keith Busch
Sets the vector to an invalid value after it's freed so we don't free it twice.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
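A sketch of the idiom with illustrative struct and field names (the real driver stores the vector in its queue structure):

```c
static void example_free_vector(struct example_queue *q)
{
	if (q->vector == -1)
		return;			/* already freed once; back off */
	free_irq(q->vector, q);
	q->vector = -1;			/* poison so a repeat call is a no-op */
}
```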

- 18 Dec 2014, 2 commits

By Ilya Dryomov
CEPH_OSD_OP_DELETE is not an extent op; stop treating it as such. This sneaked in with the discard patches - it's one of the three osd ops (the other two are CEPH_OSD_OP_TRUNCATE and CEPH_OSD_OP_ZERO) that discard is implemented with.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

By SF Markus Elfring
The functions ceph_put_snap_context() and iput() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[idryomov@redhat.com: squashed rbd.c hunk, changelog]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

- 14 Dec 2014, 5 commits

By Ganesh Mahendran
In current zram, we use DEVICE_ATTR() to define sysfs device attributes, so we need to set the (S_IRUGO | S_IWUSR) permissions and other arguments manually. Linux already provides the DEVICE_ATTR_[RW|RO|WO] macros to define sysfs device attributes; they are simple and readable. This patch uses the kernel-defined DEVICE_ATTR_[RW|RO|WO] macros to define the zram device attributes.

Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
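A before/after sketch using zram's disksize attribute as the example:

```c
/* before: mode and handler names spelled out by hand */
static DEVICE_ATTR(disksize, S_IRUGO | S_IWUSR,
		   disksize_show, disksize_store);

/* after: DEVICE_ATTR_RW() derives disksize_show/disksize_store
 * and the 0644 mode from the attribute name itself */
static DEVICE_ATTR_RW(disksize);
```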

By Mahendran Ganesh
In struct zram_table_entry, the element *value* contains the obj size and the obj zram flags. Bit 0 to bit (ZRAM_FLAG_SHIFT - 1) represent the obj size, and bit ZRAM_FLAG_SHIFT to the highest bit of unsigned long represent the obj zram flags. So the first zram flag (ZRAM_ZERO) should start at ZRAM_FLAG_SHIFT instead of (ZRAM_FLAG_SHIFT + 1).

This patch fixes this cosmetic issue. Also fix a typo: "page in now accessed" -> "page is now accessed".

Signed-off-by: Mahendran Ganesh <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
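The layout described above, as a sketch with the off-by-one fixed (the value of ZRAM_FLAG_SHIFT here is illustrative):

```c
#define ZRAM_FLAG_SHIFT 24	/* illustrative value */

/* zram_table_entry.value:
 *   bits [0, ZRAM_FLAG_SHIFT)              -> object size
 *   bits [ZRAM_FLAG_SHIFT, BITS_PER_LONG)  -> zram flags     */
enum zram_pageflags {
	ZRAM_ZERO = ZRAM_FLAG_SHIFT,	/* first flag bit, no +1 */
	__NR_ZRAM_PAGEFLAGS,
};
```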

By karam.lee
This patch implements the rw_page operation for the zram block device.

I implemented the feature in zram and tested it. The test bed was the LG G2 mobile device, which has an msm8974 processor and 2GB of memory. With a memory-allocation test program consuming memory, the system generates swap. The operating time of swap_write_page() was measured:

--------------------------------------------------
|              | operating time    | improvement |
|              | (20 runs average) |             |
--------------------------------------------------
| with patch   | 1061.15 us        | +2.4%       |
--------------------------------------------------
| without patch| 1087.35 us        |             |
--------------------------------------------------

Each test result set (with paged I/O, with BIO) shows a normal distribution and has equal variance, i.e. the two values are valid to compare. I can say the operation with paged I/O (without BIO) is 2.4% faster with a 95% confidence level.

[minchan@kernel.org: make rw_page operation return 0]
[minchan@kernel.org: rely on the bi_end_io for zram_rw_page fails]
[sergey.senozhatsky@gmail.com: code cleanup]
[minchan@kernel.org: add comment]
Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
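A hedged sketch of the hookup; the prototype matches the rw_page operation of that era (rw is READ or WRITE), and the implementation body is elided:

```c
static int zram_rw_page(struct block_device *bdev, sector_t sector,
			struct page *page, int rw);

static const struct block_device_operations zram_devops = {
	.owner   = THIS_MODULE,
	.rw_page = zram_rw_page,  /* synchronous single-page I/O, no BIO */
};
```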

By karam.lee
This patch changes the parameters of valid_io_request() for common usage. The purpose of valid_io_request() is to determine whether a bio request is valid or not. This patch uses the I/O start address and size instead of a BIO parameter for common usage.

Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

By karam.lee
Recently the rw_page block device operation was added. This patchset implements the rw_page operation for the zram block device and does some clean-up.

This patch (of 3): Remove an unnecessary parameter (bio) from zram_bvec_rw() and zram_bvec_read(). zram_bvec_read() doesn't use the bio parameter, so remove it. zram_bvec_rw() calls a read/write operation that doesn't use a bio, so an rw parameter replaces the bio parameter.

Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 12 Dec 2014, 5 commits

By Jens Axboe
If we have a race between the schedule timing out and the command completing, we could have the task issuing the command exit nvme_submit_sync_cmd() while the irq is running sync_completion(). If that happens, we could be corrupting memory, since the stack that held 'cmdinfo' is no longer valid.

Fix this by always calling nvme_abort_cmd_info(). Once that call completes, we know that we have either run sync_completion() if the completion came in, or that we will never run it since we now have special_completion() as the command callback handler.

Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Dwight Engen
This change enables the sunvdc driver to reconnect and recover if a vds service domain is disconnected or bounced. By default it will wait indefinitely for the service domain to become available again, but will honor a non-zero vdc-timeout md property if one is set. If a timeout is reached, any in-progress I/Os are completed with -EIO.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

By Dwight Engen
Both sunvdc and sunvnet implemented distinct functionality for incrementing and decrementing dring indexes. Create common functions for use by both, based on the sunvnet versions, which were chosen since they still work correctly in case a non-power-of-two ring size is used.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
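A sketch of the shared helpers, modeled on the sunvnet approach (the exact helper names in vio.h are assumptions): compare-and-wrap instead of bitmasking, so the ring size need not be a power of two.

```c
static inline u32 vio_dring_next(struct vio_dring_state *dr, u32 index)
{
	if (++index == dr->num_entries)		/* wrap at the ring size */
		index = 0;
	return index;
}

static inline u32 vio_dring_prev(struct vio_dring_state *dr, u32 index)
{
	if (index == 0)
		index = dr->num_entries - 1;	/* wrap backwards */
	else
		index--;
	return index;
}
```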

By Dwight Engen
Free resources allocated during port/disk probing so that the module may be successfully reloaded after unloading.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

By Jens Axboe
The logic around retrying and erroring IO in nvme_queue_rq() is broken in a few ways:

- If we fail allocating dma memory for a discard, we return retry. We have the 'iod' stored in ->special, but we free the 'iod'.
- For a normal request, if we fail dma mapping or setting up prps, we have the same iod situation. Additionally, we haven't set the callback for the request yet, so we also potentially leak IOMMU resources.

Get rid of the ->special 'iod' store. The retry is uncommon enough that it's not worth optimizing for or holding on to resources to attempt to speed it up. Additionally, it's usually best practice to free any request-related resources when doing retries.

Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

- 11 Dec 2014, 5 commits

By Indraneel M
After hot-remove of a device with a mounted partition, when the device is hot-added again, the new node reappears as nvme0n1. Mounting this new node fails with the error:

mount: mount /dev/nvme0n1p1 on /mnt failed: File exists.

The old node's FS entries still exist, and the kernel can't re-create the procfs and sysfs entries for the new node with the same name. The patch fixes this issue.

Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Indraneel M <indraneel.m@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Jens Axboe
We return an error pointer or the request, not NULL. Half the call paths got it right, the others didn't. Fix those up.

Signed-off-by: Jens Axboe <axboe@fb.com>
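The pattern the fix enforces everywhere, as a minimal sketch (the wrapper function is illustrative):

```c
static int example_submit(struct nvme_dev *dev)
{
	struct request *req;

	req = blk_mq_alloc_request(dev->admin_q, WRITE, GFP_KERNEL, false);
	if (IS_ERR(req))		/* ERR_PTR or a request, never NULL */
		return PTR_ERR(req);
	/* ... build and submit the command ... */
	return 0;
}
```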

By Sam Bradshaw
We allocate 'abort_req', but free 'req' in case of an error submitting the IO.

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

By Vitaly Kuznetsov
flush_op is unambiguously defined by feature_flush:

REQ_FUA | REQ_FLUSH -> BLKIF_OP_WRITE_BARRIER
REQ_FLUSH           -> BLKIF_OP_FLUSH_DISKCACHE
0                   -> 0

and thus can be removed. This is just a cleanup. The patch was suggested by Boris Ostrovsky.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
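Written out as a pure function of feature_flush (a sketch; the helper name is illustrative), the mapping makes it clear why the cached flush_op field is redundant:

```c
static unsigned int blkif_flush_op(unsigned int feature_flush)
{
	if (feature_flush & REQ_FUA)
		return BLKIF_OP_WRITE_BARRIER;
	if (feature_flush & REQ_FLUSH)
		return BLKIF_OP_FLUSH_DISKCACHE;
	return 0;	/* no flush support negotiated */
}
```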

By Vitaly Kuznetsov
The guard against issuing unsupported REQ_FUA and REQ_FLUSH was introduced in d11e6158 and was factored out into blkif_request_flush_valid() in 0f1ca65e. However:

1) The check is incomplete. In case we negotiated feature_flush = REQ_FLUSH and flush_op = BLKIF_OP_FLUSH_DISKCACHE (so FUA is unsupported), a FUA request will still pass the check.
2) blkif_request_flush_valid() is misnamed. It is a bool but returns true when the request is invalid.
3) When blkif_request_flush_valid() fails, -EIO is being returned. It seems that -EOPNOTSUPP is more appropriate here.

Fix all of the above issues. This patch is based on the original patch by Laszlo Ersek and a comment by Jeff Moyer.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
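A hedged sketch of the corrected helper: renamed so that true means "invalid", and rejecting FUA when only FLUSH was negotiated (exact field layout is an approximation; the caller would then complete the request with -EOPNOTSUPP instead of -EIO):

```c
static inline bool blkif_request_flush_invalid(struct request *req,
					       struct blkfront_info *info)
{
	return req->cmd_type != REQ_TYPE_FS ||
	       ((req->cmd_flags & REQ_FLUSH) &&
		!(info->feature_flush & REQ_FLUSH)) ||
	       ((req->cmd_flags & REQ_FUA) &&
		!(info->feature_flush & REQ_FUA));	/* the missing case */
}
```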

- 09 Dec 2014, 4 commits

By Michael S. Tsirkin
Core activates this bit automatically now; drop it from drivers that set it explicitly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

By Michael S. Tsirkin
If a device appears while the module is being removed, the driver will get a callback after we've given up on the major number. In theory this means the major number can get reused by something else, resulting in a conflict.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

By Michael S. Tsirkin
It's never declared, so no need to make it extern.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

By Michael S. Tsirkin
Based on a patch by Cornelia Huck.

Note: for consistency, and to avoid sparse errors, convert all fields, even those no longer in use for virtio v1.0.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

- 04 Dec 2014, 1 commit

By Keith Busch
On retry, req->special is pointing to an already set-up IOD, but we still need to set up the command context and callback, otherwise you'll see false "twice completed" errors and leak requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

- 27 Nov 2014, 1 commit

By Matias Bjorling
When either the queue_mode or irq_mode parameter is set outside its boundaries, the driver will not complete requests. This stalls driver initialization when partitions are probed. Fix by setting out-of-bounds values back to the parameter's default.

Signed-off-by: Matias Bjørling <m@bjorling.me>

Updated by me to have the parse+check in just one function.

Signed-off-by: Jens Axboe <axboe@fb.com>
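A sketch of the sanitizing step, assuming a generic clamp-to-default helper (the real patch folds parse and check into one function; the helper name and message are illustrative):

```c
static void null_param_check(int *val, int min, int max, int def)
{
	if (*val < min || *val > max) {
		pr_warn("null_blk: parameter out of range, using default %d\n",
			def);
		*val = def;	/* fall back so requests still complete */
	}
}
```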