- 05 Nov 2014, 7 commits
-
-
By Keith Busch
Fix tabs that were inadvertently converted to spaces.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
By Keith Busch
Race conditions are theoretically possible between NVMe PCI device removal and the generic PCI bus rescan and device removal that can be triggered via sysfs. To avoid these races, make the NVMe code use pci_stop_and_remove_bus_device_locked().
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
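A minimal sketch of a driver-initiated removal using the locked helper; the surrounding function is hypothetical, not the exact nvme code:

    #include <linux/pci.h>

    /* pci_stop_and_remove_bus_device_locked() takes the global
     * pci_rescan_remove_lock internally, so a driver-initiated removal
     * cannot race with a sysfs-triggered bus rescan/remove. */
    static void example_remove_dead_ctrl(struct pci_dev *pdev)
    {
            /* only remove if the device is still bound to us */
            if (pci_get_drvdata(pdev))
                    pci_stop_and_remove_bus_device_locked(pdev);
    }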
-
By Keith Busch
This is a minor refactor for handling devices that are incapable of IO. The driver previously used special error codes to know that IO queues are unavailable, but we have an online queue count now. This also fixes an issue where the driver successfully sets the queue count but is either unable to allocate an IO queue or the device can't create one for some reason. If the driver can successfully enable the device and get responses to admin commands, it will bring up a character device for management but not create block devices.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
By Dan McLeran
Change the behavior of nvme_enable_ctrl to set EN. Clear CC.SHN for both nvme_enable_ctrl and nvme_disable_ctrl. Remove the read of the CC register and manage the state in dev->ctrl_config.
Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
[removed an unwanted write to CC]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
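A hedged sketch of the resulting enable path, assuming 3.18-era names such as NVME_CC_ENABLE and NVME_CC_SHN_MASK; nvme_wait_ready() is used as an assumed helper:

    static int example_nvme_enable_ctrl(struct nvme_dev *dev, u64 cap)
    {
            /* state lives in dev->ctrl_config; CC is never read back */
            dev->ctrl_config &= ~NVME_CC_SHN_MASK;  /* clear shutdown notification */
            dev->ctrl_config |= NVME_CC_ENABLE;     /* set EN */
            writel(dev->ctrl_config, &dev->bar->cc);

            return nvme_wait_ready(dev, cap, true); /* poll CSTS.RDY */
    }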
-
By Keith Busch
Adds support for devices with a max page size smaller than the host's. In case we encounter such a host/device combination, the driver will split a page into as many PRP entries as necessary for the device's page size capabilities. If the device's reported minimum page size is greater than the host's, the driver will not attempt to enable the device and will return an error instead.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
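A sketch of the capability check described above, assuming the 3.18-era NVME_CAP_MPSMIN()/NVME_CAP_MPSMAX() macros; the page_size field name is illustrative:

    static int example_negotiate_page_size(struct nvme_dev *dev, u64 cap)
    {
            /* MPSMIN/MPSMAX are exponents relative to 4KiB */
            unsigned dev_page_min = 1u << (12 + NVME_CAP_MPSMIN(cap));
            unsigned dev_page_max = 1u << (12 + NVME_CAP_MPSMAX(cap));

            if (dev_page_min > PAGE_SIZE)
                    return -ENODEV;         /* device can't address host pages */

            /* if the device max is below the host page size, one host page
             * is later split into PAGE_SIZE / dev->page_size PRP entries */
            dev->page_size = min_t(unsigned, PAGE_SIZE, dev_page_max);
            return 0;
    }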
-
By Keith Busch
Submit NVMe asynchronous event requests: as many as the controller maximum or the number of possible different event types (8), whichever is smaller. Events successfully returned by the controller are logged.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
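Sketched, with example_submit_admin_cmd() as a hypothetical stand-in for the admin submission path; AERL from the identify data is zero-based:

    static void example_submit_async_events(struct nvme_dev *dev, u8 aerl)
    {
            struct nvme_command c = { };
            int i, nr = min(aerl + 1, 8);   /* controller max vs. 8 event types */

            c.common.opcode = nvme_admin_async_event;
            for (i = 0; i < nr; i++)
                    example_submit_admin_cmd(dev, &c);
    }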
-
By Joe Perches
Use the zeroing variant instead of dma_alloc_coherent() followed by memset(..., 0, ...).
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
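In this era the zeroing variant was dma_zalloc_coherent(), later folded into dma_alloc_coherent() itself; the pattern, sketched with illustrative locals (cqes, dmadev, size, dma_addr):

    /* before: allocate, then clear by hand */
    cqes = dma_alloc_coherent(dmadev, size, &dma_addr, GFP_KERNEL);
    if (cqes)
            memset(cqes, 0, size);

    /* after: one call allocates and zeroes */
    cqes = dma_zalloc_coherent(dmadev, size, &dma_addr, GFP_KERNEL);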
-
- 30 Oct 2014, 1 commit
-
-
By Jens Axboe
Since we have the notion of a 'last' request in a chain, we can use this to have the hardware optimize the issuing of requests. Add a list_head parameter to queue_rq that the driver can use to temporarily store hw commands for issue when 'last' is true. If we are doing a chain of requests, pass in a NULL list for the first request to force immediate issue of that request, then batch the remainder for deferred issue until the last request has been sent. Instead of adding yet another argument to the hot ->queue_rq path, encapsulate the passed arguments in a blk_mq_queue_data structure. This is passed as a constant, and has been tested as faster than passing 4 (or even 3) args through ->queue_rq. Update drivers for the new ->queue_rq() prototype. There are no functional changes in this patch for drivers: if they don't use the passed-in list, they simply queue requests individually, as before.
Signed-off-by: Jens Axboe <axboe@fb.com>
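A sketch of a driver against the new prototype; the example_* helpers are hypothetical, but bd->rq and bd->last are the real fields:

    static int example_queue_rq(struct blk_mq_hw_ctx *hctx,
                                const struct blk_mq_queue_data *bd)
    {
            struct request *rq = bd->rq;

            blk_mq_start_request(rq);
            example_stash_cmd(hctx->driver_data, rq);   /* defer the doorbell */
            if (bd->last)                               /* end of the chain */
                    example_ring_doorbell(hctx->driver_data);
            return BLK_MQ_RQ_QUEUE_OK;
    }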
-
- 22 Oct 2014, 1 commit
-
-
By Christoph Hellwig
Set max_sectors to the value the driver provides as the hardware limit by default. Linux has had proper I/O throttling for a long time and doesn't rely on an artificially small maximum I/O size anymore. By not limiting the I/O size by default we remove an annoying tuning step required for most Linux installations. Note that both the user and, if absolutely required, the driver can still impose a limit for FS requests below max_hw_sectors_kb.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
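Conceptually, a hedged sketch of the block-layer default rather than the exact patch:

    /* old default: clamp FS requests to BLK_DEF_MAX_SECTORS (1024) */
    limits->max_sectors = min_t(unsigned int, max_hw_sectors,
                                BLK_DEF_MAX_SECTORS);

    /* new default: let FS requests go all the way to the hw limit */
    limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;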
-
- 15 Oct 2014, 18 commits
-
-
By Michael S. Tsirkin
The virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns; virtio block violated this rule on restore by restarting queues, which might in theory cause a VQ to be used directly within restore. To fix, call virtio_device_ready before starting the queues.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
By Michael S. Tsirkin
The virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns; virtio block violated this rule by calling add_disk, which causes the VQs to be used directly within probe. To fix, call virtio_device_ready before using the VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
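The ordering both of these patches enforce, sketched; example_setup() is a hypothetical helper for the VQ/queue/disk setup:

    static int example_virtblk_probe(struct virtio_device *vdev)
    {
            struct gendisk *disk = example_setup(vdev); /* VQs, queue, disk */

            if (IS_ERR(disk))
                    return PTR_ERR(disk);

            /* DRIVER_OK must be set before any VQ can be used; add_disk()
             * may issue I/O immediately, so it has to come second. */
            virtio_device_ready(vdev);
            add_disk(disk);
            return 0;
    }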
-
By Michael S. Tsirkin
config_mutex served two purposes: preventing multiple concurrent config-change handlers, and synchronizing access to the config_enable flag. Since commit dbf2576e ("workqueue: make all workqueues non-reentrant"), all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
By Michael S. Tsirkin
Now that the virtio core ensures config changes don't arrive during probing, drop the config_enable flag in virtio blk. On removal, a flush is now sufficient to guarantee that no change work is queued. This helps simplify the driver, and will allow setting DRIVER_OK earlier without losing config-change notifications.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
By Ilya Dryomov
We need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups under memory pressure, since we sit on the memory-reclaim path.
Cc: stable@vger.kernel.org # 3.17, needs backporting for 3.16
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
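The fix boils down to the allocation flag; a sketch:

    /* A workqueue whose work items sit on the memory-reclaim path needs
     * WQ_MEM_RECLAIM so it gets a rescuer thread and can always make
     * forward progress, even when new kworkers can't be spawned. */
    rbd_wq = alloc_workqueue("rbd", WQ_MEM_RECLAIM, 0);
    if (!rbd_wq)
            return -ENOMEM;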
-
By Josh Durgin
max_discard_sectors must be set for the queue to support discard. The operations implementing discard for rbd zero the data, so report that.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
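A sketch of the 3.18-era queue limits a driver sets to advertise this; segment_size stands in for rbd's object size:

    queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
    q->limits.discard_granularity = segment_size;
    q->limits.discard_alignment = segment_size;
    blk_queue_max_discard_sectors(q, segment_size / 512);
    q->limits.discard_zeroes_data = 1;  /* discarded ranges read back as zeroes */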
-
By Josh Durgin
Only allocate two osd ops for discard requests, since the preallocation hint is only added for regular writes. Use rbd_img_obj_request_fill() to recreate the original write or discard osd operations, isolating that logic in one place, and change the assert in rbd_osd_req_create_copyup() to accept discard requests as well.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
-
By Josh Durgin
rbd_img_request_fill() creates a ceph_osd_request and has logic for adding the appropriate osd ops to it based on the request type and image properties. For layered images, the original rbd_obj_request is resent with a copyup operation in front, using a new ceph_osd_request. The logic for adding the original operations should be the same as when first sending them, so move it to a helper function. op_type only needs to be checked once, so create a helper for that as well and call it outside the loop in rbd_img_request_fill().
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
-
By Josh Durgin
Discard requests are a form of write, so they should go through the same process as plain write requests and trigger copy-on-write for layered images.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
-
By Josh Durgin
Discard may try to delete an object that does not exist from a non-layered image. If this occurs, the image already has no data in that range, so change the result to success.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
-
By Josh Durgin
Discards take a reference to the snapshot context of an image when they are created. This reference needs to be cleaned up when the request is done, just as it is for regular writes.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
-
By Josh Durgin
In rbd_img_request_fill() the image size is only checked for discard requests, to determine whether we can truncate an object instead of zeroing it. Take rbd_dev->header_rwsem while reading the image size, and move this read into the discard check, so that non-discard ops don't need to take the semaphore in this function.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
-
By Guangliang Zhao
This patch adds discard support to the rbd driver. There are three types of operation in the driver:
1. Objects are removed if they are completely contained within the discard range.
2. Objects are truncated if they are partly contained within the discard range and the range is aligned with their boundary.
3. Everything else is zeroed.
A discard request from blkdev_issue_discard() has both REQ_WRITE and REQ_DISCARD marked and carries no data, so we must check REQ_DISCARD first when determining the request type (see the sketch after this entry). This resolves: http://tracker.ceph.com/issues/190
[ Ilya Dryomov: This is incomplete and somewhat buggy; see follow-up commits by Josh Durgin for refinements and fixes, which weren't folded in to preserve authorship. ]
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
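The flag-check order, sketched with the obj_operation_type naming from this series (3.18-era request flags):

    /* A discard carries REQ_WRITE | REQ_DISCARD, so test REQ_DISCARD
     * before REQ_WRITE or discards would be misclassified as writes. */
    if (rq->cmd_flags & REQ_DISCARD)
            op_type = OBJ_OP_DISCARD;
    else if (rq->cmd_flags & REQ_WRITE)
            op_type = OBJ_OP_WRITE;
    else
            op_type = OBJ_OP_READ;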
-
By Guangliang Zhao
It can only handle read and write operations now; extend it for the coming discard support.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
-
By Guangliang Zhao
Layered writes need to copy up the parent's content, but a write covering the entire object would overwrite it anyway, so skip the copyup in that case.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
-
By Ilya Dryomov
To clarify the conditions and make it easier to add new ones.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
-
By Josh Durgin
These fields may both change while the image is mapped, if a snapshot is created or deleted or the image is resized. They are guarded by rbd_dev->header_rwsem, so hold that while reading them, and store a local copy to refer to outside of the critical section. The local copy stays consistent since the snapshot context is reference counted, and the mapping size is just a u64. This prevents torn loads from giving us inconsistent values. Move reading header.snapc into the caller of rbd_img_request_create() so that we only need to take the semaphore once. The read-only caller, rbd_parent_request_create(), can just pass NULL for snapc, since the snapshot context is only relevant for writes.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
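The pattern, sketched:

    /* take consistent copies under the semaphore, use them outside it */
    down_read(&rbd_dev->header_rwsem);
    mapping_size = rbd_dev->mapping.size;   /* u64: no torn load now */
    snapc = rbd_dev->header.snapc;
    ceph_get_snap_context(snapc);           /* refcount pins our copy */
    up_read(&rbd_dev->header_rwsem);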
-
By Ilya Dryomov
Trying to map an image out of a pool for which we don't have the 'x' permission bit fails with -ERANGE from ceph_extract_encoded_string(), due to an unsigned vs. signed bug. Fix it and get rid of the -EINVAL sink, thus propagating rbd::get_id cls method errors. (I've seen a bunch of unexplained -ERANGE reports; I bet this is it.)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
-
- 10 Oct 2014, 4 commits
-
-
By Sergey Senozhatsky
The `notify_free' device attribute accounts for the number of slot-free notifications and internally represents the number of zram_free_page() calls. Slot-free notifications are sent only when the device is used as a swap device, hence `notify_free' was used only for swap devices. Since commit f4659d8e ("zram: support REQ_DISCARD"), zram handles one more kind of free notification (also via zram_free_page()): REQ_DISCARD requests, which are sent by a filesystem whenever some data blocks are discarded. However, there was no way to know the number of notifications in the latter case. Use `notify_free' to account for the pages freed by both zram_bio_discard() and zram_slot_free_notify(). Depending on the usage scenario, `notify_free' now represents: a) the number of pages freed because of slot-free notifications, which equals the number of swap_slot_free_notify() calls, so there is no behaviour change; or b) the number of pages freed because of REQ_DISCARD notifications.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
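Both free paths now funnel into the same counter; sketched:

    /* in zram_slot_free_notify() (swap), and per freed page in
     * zram_bio_discard() (REQ_DISCARD): */
    zram_free_page(zram, index);
    atomic64_inc(&zram->stats.notify_free);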
-
By Minchan Kim
Normally, a zram user can get the maximum memory usage zram consumed by polling mem_used_total via sysfs from userspace. But this has a critical problem: the user can miss the peak memory usage during the polling interval. To avoid that, the user would have to poll with a very short interval (e.g., 0.0000000001s) and mlock the polling program to avoid page-fault delay when memory pressure is heavy. That would be troublesome. This patch adds a new knob, "mem_used_max", so the user can easily read the maximum memory usage and reset it via "echo 0 > /sys/block/zram0/mem_used_max".
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Reviewed-by: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
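Tracking a peak without a lock can be done with a cmpxchg loop; a sketch in that spirit, not the literal zram code:

    static void example_update_used_max(atomic_long_t *max, long cur_pages)
    {
            long old = atomic_long_read(max);

            while (cur_pages > old) {
                    long seen = atomic_long_cmpxchg(max, old, cur_pages);
                    if (seen == old)
                            break;          /* we published the new peak */
                    old = seen;             /* lost a race; re-evaluate */
            }
    }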
-
By Minchan Kim
Since zram has no feature to limit its memory usage, it is hard to manage system memory. This patch adds a new knob, "mem_limit", via sysfs to set a limit so that zram fails allocations once it reaches the limit. In addition, the user can change the limit at runtime to manage memory more dynamically. The initial state is no limit, so it doesn't break old behaviour.
[akpm@linux-foundation.org: fix typo, per Sergey]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
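A sketch of the sysfs store handler, assuming zram's dev_to_zram() helper and a limit_pages field; memparse() accepts suffixed values such as "100M":

    static ssize_t example_mem_limit_store(struct device *dev,
                    struct device_attribute *attr, const char *buf, size_t len)
    {
            u64 limit;
            char *tmp;
            struct zram *zram = dev_to_zram(dev);

            limit = memparse(buf, &tmp);
            if (buf == tmp)                 /* no digits were parsed */
                    return -EINVAL;

            down_write(&zram->init_lock);
            zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
            up_write(&zram->init_lock);

            return len;
    }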
-
By Minchan Kim
zs_get_total_size_bytes returns the amount of memory zsmalloc has consumed in bytes, but zsmalloc operates in pages rather than bytes, so change the API; the benefit is that we remove unnecessary overhead (converting page units to byte units) inside zsmalloc. Since the return type is pages, "zs_get_total_pages" is a better name than "zs_get_total_size_bytes".
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
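Callers that still need bytes convert once at the boundary; a sketch, with mem_pool standing in for the zs_pool pointer:

    /* zsmalloc now reports pages; shift at the edge if bytes are needed */
    u64 used_bytes = (u64)zs_get_total_pages(mem_pool) << PAGE_SHIFT;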
-
- 09 Oct 2014, 1 commit
-
-
By Al Viro
check with the author of that horror...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 06 Oct 2014, 1 commit
-
-
By David Vrabel
The DEFINE_XENBUS_DRIVER() macro looks a bit weird and causes sparse errors. Replace its uses with standard structure definitions instead. This is similar to PCI and USB device registration.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
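The replacement style, sketched for blkfront with the handler functions elided:

    static struct xenbus_driver blkfront_driver = {
            .ids  = blkfront_ids,
            .probe = blkfront_probe,
            .remove = blkfront_remove,
            .resume = blkfront_resume,
            .otherend_changed = blkback_changed,
            .is_ready = blkfront_is_ready,
    };

    /* registered as before: */
    return xenbus_register_frontend(&blkfront_driver);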
-
- 05 Oct 2014, 1 commit
-
-
By Mike Snitzer
Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set QUEUE_FLAG_NONROT. Historically, all block devices have automatically made entropy contributions. But as previously stated in commit e2e1a148 ("block: add sysfs knob for turning off disk entropy contributions"):
- On SSD disks, the completion times aren't as random as they are for rotational drives, so it's questionable whether they should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.
There are more reliable sources of randomness than non-rotational block devices. From a security perspective it is better to err on the side of caution than to allow entropy contributions from unreliable "random" sources.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
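The per-driver change is two flag operations at queue setup; sketched with 3.18-era helpers:

    queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);       /* no seek penalty */
    queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q); /* don't feed entropy pool */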
-
- 02 Oct 2014, 4 commits
-
-
By Arianna Avanzini
This commit factors out some checks related to the request insertion path, so they can be done in a helper function instead of being open-coded.
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
By Roger Pau Monné
Fix leaking a page when a grant mapping has failed.
CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-and-Tested-by: Tao Chen <boby.chen@huawei.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
By Vitaly Kuznetsov
blkback does not unmap persistent grants when the frontend goes to the Closed state (e.g. when the blkfront module is being removed). This leads to the following in the guest's dmesg:

[ 343.243825] xen:grant_table: WARNING: g.e. 0x445 still in use!
[ 343.243825] xen:grant_table: WARNING: g.e. 0x42a still in use!
...

When the load module -> use device -> unload module sequence is performed multiple times, it is possible to hit a BUG() condition in the blkfront module:

[ 343.243825] kernel BUG at drivers/block/xen-blkfront.c:954!
[ 343.243825] invalid opcode: 0000 [#1] SMP
[ 343.243825] Modules linked in: xen_blkfront(-) ata_generic pata_acpi [last unloaded: xen_blkfront]
...
[ 343.243825] Call Trace:
[ 343.243825] [<ffffffff814111ef>] ? unregister_xenbus_watch+0x16f/0x1e0
[ 343.243825] [<ffffffffa0016fbf>] blkfront_remove+0x3f/0x140 [xen_blkfront]
...
[ 343.243825] RIP [<ffffffffa0016aae>] blkif_free+0x34e/0x360 [xen_blkfront]
[ 343.243825] RSP <ffff88001eb8fdc0>

We don't need to keep these grants if we're disconnecting, as the frontend might already have forgotten about them. Solve the issue by moving the xen_blkbk_free_caches() call from xen_blkif_free() to xen_blkif_disconnect(). Now we can see the following:

[ 928.590893] xen:grant_table: WARNING: g.e. 0x587 still in use!
[ 928.591861] xen:grant_table: WARNING: g.e. 0x372 still in use!
...
[ 929.592146] xen:grant_table: freeing g.e. 0x587
[ 929.597174] xen:grant_table: freeing g.e. 0x372
...

The backend does not keep persistent grants any more, and reconnect works fine.
CC: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
By Michael Opdenacker
This removes the use of the IRQF_DISABLED flag from drivers/block/rsxx/core.c. It's a no-op since 2.6.35 and will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Acked-by: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 01 Oct 2014, 2 commits
-
-
By Michael Opdenacker
This patch removes the use of the IRQF_DISABLED flag from drivers/block/hd.c. It's a no-op since 2.6.35 and will be removed one day. This also removes a related comment, which is obsolete too.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
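The change itself is mechanical; for hd.c it amounts to something like this (diff sketched, not copied from the patch):

    -  request_irq(HD_IRQ, hd_interrupt, IRQF_DISABLED, "hd", NULL)
    +  request_irq(HD_IRQ, hd_interrupt, 0, "hd", NULL)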
-
By Dwight Engen
vio_dring_avail() will allow use of every dring entry, but when the last entry is allocated, dr->prod == dr->cons, which is indistinguishable from the ring-empty condition. This causes the next allocation to reuse an entry. When this happens in sunvdc, the server-side vds driver begins nack'ing the messages and ends up resetting the ldc channel. This problem does not affect sunvnet, since it checks for < 2. The fix here is to never allocate the very last dring slot, so that full and empty are no longer the same condition. The request start path was changed to check for the ring being full a bit earlier, and to stop the blk_queue if there is no space left. The blk_queue will be restarted once the ring is only half full again. The number of ring entries was increased to 512, which matches the sunvnet and Solaris vdc drivers and greatly reduces the frequency of hitting the ring-full condition and the associated blk_queue stopping and starting. The checks in sunvnet were adjusted to account for vio_dring_avail() returning 1 less.
Orabug: 19441666
OraBZ: 14983
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
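The underlying ring ambiguity, sketched generically (not the literal sunvdc code): with prod == cons meaning empty, a ring that allows all N slots to fill also reaches prod == cons when full, so one slot is sacrificed to keep the two states distinct:

    /* prod/cons are free-running indices; size is a power of two */
    static inline bool example_ring_empty(u32 prod, u32 cons)
    {
            return prod == cons;
    }

    static inline bool example_ring_full(u32 prod, u32 cons, u32 size)
    {
            /* report "full" while one slot still remains unused, so
             * that full and empty can never look the same */
            return ((prod + 1) & (size - 1)) == (cons & (size - 1));
    }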
-