- 15 2月, 2016 1 次提交
-
-
由 Markus Pargmann 提交于
The userspace needs to know when nbd devices are ready for use. Currently no events are created for the userspace which doesn't work for systemd. See the discussion here: https://github.com/systemd/systemd/pull/358 This patch uses a central point to setup the nbd-internal sizes. A ioctl to set a size does not lead to a visible size change. The size of the block device will be kept at 0 until nbd is connected. As soon as it connects, the size will be changed to the real value and a uevent is created. When disconnecting, the blockdevice is set to 0 size and another uevent is generated. Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
-
- 05 2月, 2016 5 次提交
-
-
由 Dan Streetman 提交于
Make the "Attempted send on closed socket" error messages generated in nbd_request_handler() ratelimited. When the nbd socket is shutdown, the nbd_request_handler() function emits an error message for every request remaining in its queue. If the queue is large, this will spam a large amount of messages to the log. There's no need for a separate error message for each request, so this patch ratelimits it. In the specific case this was found, the system was virtual and the error messages were logged to the serial port, which overwhelmed it. Fixes: 4d48a542 ("nbd: fix I/O hang on disconnected nbds") Signed-off-by: NDan Streetman <dan.streetman@canonical.com> Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
-
由 Markus Pargmann 提交于
nbd changes properties of the blockdevice depending on flags that were received. This patch moves this flag parsing into a separate function nbd_parse_flags(). Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
-
由 Markus Pargmann 提交于
Group all variables that are reset after a disconnect into reset functions. This patch adds two of these functions, nbd_reset() and nbd_bdev_reset(). Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
-
由 Markus Pargmann 提交于
It may be useful to know in the client that a connection timed out. The current code returns success for a timeout. This patch reports the error code -ETIMEDOUT for a timeout. Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
-
由 Markus Pargmann 提交于
As discussed on the mailing list, the usage of signals for timeout handling has a lot of potential issues. The nbd driver used for some time signals for timeouts. These signals where able to get the threads out of the blocking socket operations. This patch removes all signal usage and uses a socket shutdown instead. The socket descriptor itself is cleared later when the whole nbd device is closed. The tasks_lock is removed as we do not depend on this anymore. Instead a new lock for the socket is introduced so we can safely work with the socket in the timeout handler outside of the two main threads. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 03 2月, 2016 1 次提交
-
-
由 Markus Pargmann 提交于
Static checker complains about the implemented error handling. It is indeed wrong. We don't care about the return values of created debugfs files. We only have to check the return values of created dirs for NULL pointer. If we use a null pointer as parent directory for files, this may lead to debugfs files in wrong places. Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
-
- 23 1月, 2016 2 次提交
-
-
由 Tetsuo Handa 提交于
There are many locations that do if (memory_was_allocated_by_vmalloc) vfree(ptr); else kfree(ptr); but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory using is_vmalloc_addr(). Unless callers have special reasons, we can replace this branch with kvfree(). Please check and reply if you found problems. Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NJan Kara <jack@suse.com> Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: NAndreas Dilger <andreas.dilger@intel.com> Acked-by: N"Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Boris Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 1月, 2016 1 次提交
-
-
由 Markus Elfring 提交于
The rbd_dev_destroy() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 16 1月, 2016 6 次提交
-
-
由 Dan Williams 提交于
For the purpose of communicating the optional presence of a 'struct page' for the pfn returned from ->direct_access(), introduce a type that encapsulates a page-frame-number plus flags. These flags contain the historical "page_link" encoding for a scatterlist entry, but can also denote "device memory". Where "device memory" is a set of pfns that are not part of the kernel's linear mapping by default, but are accessed via the same memory controller as ram. The motivation for this new type is large capacity persistent memory that needs struct page entries in the 'memmap' to support 3rd party DMA (i.e. O_DIRECT I/O with a persistent memory source/target). However, we also need it in support of maintaining a list of mapped inodes which need to be unmapped at driver teardown or freeze_bdev() time. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave@sr71.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jerome Marchand 提交于
The use of idr_remove() is forbidden in the callback functions of idr_for_each(). It is therefore unsafe to call idr_remove in zram_remove(). This patch moves the call to idr_remove() from zram_remove() to hot_remove_store(). In the detroy_devices() path, idrs are removed by idr_destroy(). This solves an use-after-free detected by KASan. [akpm@linux-foundation.org: fix coding stype, per Sergey] Signed-off-by: NJerome Marchand <jmarchan@redhat.com> Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sergey Senozhatsky 提交于
Do not __GFP_ZERO allocated zcomp ->private pages. We keep allocated streams around and use them for read/write requests, so we supply a zeroed out ->private to compression algorithm as a scratch buffer only once -- the first time we use that stream. For the rest of IO requests served by this stream ->private usually contains some temporarily data from the previous requests. Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
Each zcomp backend uses own gfp flag but it's pointless because the context they could be called is driven by upper layer(ie, zcomp frontend). As well, zcomp frondend could call them in different context. One context(ie, zram init part) is it should be better to make sure successful allocation other context(ie, further stream allocation part for accelarating I/O speed) is just optional so let's pass gfp down from driver (ie, zcomp frontend) like normal MM convention. [sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps] Signed-off-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kyeongdon Kim 提交于
When we're using LZ4 multi compression streams for zram swap, we found out page allocation failure message in system running test. That was not only once, but a few(2 - 5 times per test). Also, some failure cases were continually occurring to try allocation order 3. In order to make parallel compression private data, we should call kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order 2/3 size memory to allocate in that time, page allocation fails. This patch makes to use vmalloc() as fallback of kmalloc(), this prevents page alloc failure warning. After using this, we never found warning message in running test, also It could reduce process startup latency about 60-120ms in each case. For reference a call trace : Binder_1: page allocation failure: order:3, mode:0x10c0d0 CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20 Call trace: dump_backtrace+0x0/0x270 show_stack+0x10/0x1c dump_stack+0x1c/0x28 warn_alloc_failed+0xfc/0x11c __alloc_pages_nodemask+0x724/0x7f0 __get_free_pages+0x14/0x5c kmalloc_order_trace+0x38/0xd8 zcomp_lz4_create+0x2c/0x38 zcomp_strm_alloc+0x34/0x78 zcomp_strm_multi_find+0x124/0x1ec zcomp_strm_find+0xc/0x18 zram_bvec_rw+0x2fc/0x780 zram_make_request+0x25c/0x2d4 generic_make_request+0x80/0xbc submit_bio+0xa4/0x15c __swap_writepage+0x218/0x230 swap_writepage+0x3c/0x4c shrink_page_list+0x51c/0x8d0 shrink_inactive_list+0x3f8/0x60c shrink_lruvec+0x33c/0x4cc shrink_zone+0x3c/0x100 try_to_free_pages+0x2b8/0x54c __alloc_pages_nodemask+0x514/0x7f0 __get_free_pages+0x14/0x5c proc_info_read+0x50/0xe4 vfs_read+0xa0/0x12c SyS_read+0x44/0x74 DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB [minchan@kernel.org: change vmalloc gfp and adding comment about gfp] [sergey.senozhatsky@gmail.com: tweak comments and styles] Signed-off-by: NKyeongdon Kim <kyeongdon.kim@lge.com> Signed-off-by: NMinchan Kim <minchan@kernel.org> Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sergey Senozhatsky 提交于
We can end up allocating a new compression stream with GFP_KERNEL from within the IO path, which may result is nested (recursive) IO operations. That can introduce problems if the IO path in question is a reclaimer, holding some locks that will deadlock nested IOs. Allocate streams and working memory using GFP_NOIO flag, forbidding recursive IO and FS operations. An example: inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes: (jbd2_handle){+.+.?.}, at: start_this_handle+0x4ca/0x555 {IN-RECLAIM_FS-W} state was registered at: __lock_acquire+0x8da/0x117b lock_acquire+0x10c/0x1a7 start_this_handle+0x52d/0x555 jbd2__journal_start+0xb4/0x237 __ext4_journal_start_sb+0x108/0x17e ext4_dirty_inode+0x32/0x61 __mark_inode_dirty+0x16b/0x60c iput+0x11e/0x274 __dentry_kill+0x148/0x1b8 shrink_dentry_list+0x274/0x44a prune_dcache_sb+0x4a/0x55 super_cache_scan+0xfc/0x176 shrink_slab.part.14.constprop.25+0x2a2/0x4d3 shrink_zone+0x74/0x140 kswapd+0x6b7/0x930 kthread+0x107/0x10f ret_from_fork+0x3f/0x70 irq event stamp: 138297 hardirqs last enabled at (138297): debug_check_no_locks_freed+0x113/0x12f hardirqs last disabled at (138296): debug_check_no_locks_freed+0x33/0x12f softirqs last enabled at (137818): __do_softirq+0x2d3/0x3e9 softirqs last disabled at (137813): irq_exit+0x41/0x95 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(jbd2_handle); <Interrupt> lock(jbd2_handle); *** DEADLOCK *** 5 locks held by git/20158: #0: (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b #1: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3 #2: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b #3: (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b #4: (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555 stack backtrace: CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211 Call Trace: dump_stack+0x4c/0x6e mark_lock+0x384/0x56d mark_held_locks+0x5f/0x76 lockdep_trace_alloc+0xb2/0xb5 kmem_cache_alloc_trace+0x32/0x1e2 zcomp_strm_alloc+0x25/0x73 [zram] zcomp_strm_multi_find+0xe7/0x173 [zram] zcomp_strm_find+0xc/0xe [zram] zram_bvec_rw+0x2ca/0x7e0 [zram] zram_make_request+0x1fa/0x301 [zram] generic_make_request+0x9c/0xdb submit_bio+0xf7/0x120 ext4_io_submit+0x2e/0x43 ext4_bio_write_page+0x1b7/0x300 mpage_submit_page+0x60/0x77 mpage_map_and_submit_buffers+0x10f/0x21d ext4_writepages+0xc8c/0xe1b do_writepages+0x23/0x2c __filemap_fdatawrite_range+0x84/0x8b filemap_flush+0x1c/0x1e ext4_alloc_da_blocks+0xb8/0x117 ext4_rename+0x132/0x6dc ? mark_held_locks+0x5f/0x76 ext4_rename2+0x29/0x2b vfs_rename+0x540/0x636 SyS_renameat2+0x359/0x44d SyS_rename+0x1e/0x20 entry_SYSCALL_64_fastpath+0x12/0x6f [minchan@kernel.org: add stable mark] Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: NMinchan Kim <minchan@kernel.org> Cc: Kyeongdon Kim <kyeongdon.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2016 1 次提交
-
-
由 Arnd Bergmann 提交于
Dividing a sector_t number should be done using sector_div rather than do_div to optimize the 32-bit sector_t case, and with the latest do_div optimizations, we now get a compile-time warning for this: arch/arm/include/asm/div64.h:32:95: note: expected 'uint64_t * {aka long long unsigned int *}' but argument is of type 'sector_t * {aka long unsigned int *}' drivers/block/null_blk.c:521:81: warning: comparison of distinct pointer types lacks a cast This changes the newly added code to use sector_div. It is a simplified version of the original patch, as Linus Torvalds pointed out that we should not be using an expensive division function in the first place. This version was suggested by Matias Bjorling. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Matias Bjorling <m@bjorling.me> Fixes: b2b7e001 ("null_blk: register as a LightNVM device") Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 12 1月, 2016 1 次提交
-
-
由 Matias Bjørling 提交于
To implement sync I/O support within the LightNVM core, the end_io functions are refactored to take an end_io function pointer instead of testing for initialized media manager, followed by calling its end_io function. Sync I/O can then be implemented using a callback that signal I/O completion. This is similar to the logic found in blk_to_execute_io(). By implementing it this way, the underlying device I/Os submission logic is abstracted away from core, targets, and media managers. Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 09 1月, 2016 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Zhu Yanjun 提交于
The modified variables are only used in the file mtip32xx.c. As such, the static keyword is inserted to define that object to be only visible to the current code module during compilation. Signed-off-by: NZhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 1月, 2016 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
[folded a fix by Dan Carpenter] Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 1月, 2016 18 次提交
-
-
由 Colin Ian King 提交于
The max outstanding commands is being printed with a 0x prefix to suggest it is a hex value, when in fact the integer decimal %d format specifier is being used and this is a bit confusing. Use %x instead to match the proceeding 0x prefix. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
We have split the setting up of all the resources in two steps: 1) talk_to_blkback - which figures out the num_ring_pages (from the default value of zero), sets up shadow and so 2) blkfront_connect - does the real part of filling out the internal structures. The problem is if we bypass the 1) step and go straight to 2) and call blkfront_setup_indirect where we use the macro BLK_RING_SIZE - which returns an negative value (because sz is zero - since num_ring_pages is zero - since it has never been set). We can fix this by making sure that we always have called talk_to_blkback before going to blkfront_connect. Or we could set in blkfront_probe info->nr_ring_pages = 1 to have a default value. But that looks odd - as we haven't actually negotiated any ring size. This patch changes XenbusStateConnected state to detect if we haven't done the initial handshake - and if so continue on as if were in XenbusStateInitWait state. We also roll the error recovery (freeing the structure) into talk_to_blkback error path - which is safe since that function is only called from blkback_changed. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bob Liu 提交于
This patch fixs two memleaks: backtrace: [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50 [<ffffffff81205e3b>] kmem_cache_alloc+0xbb/0x1d0 [<ffffffff81534028>] xen_blkbk_probe+0x58/0x230 [<ffffffff8146adb6>] xenbus_dev_probe+0x76/0x130 [<ffffffff81511716>] driver_probe_device+0x166/0x2c0 [<ffffffff815119bc>] __device_attach_driver+0xac/0xb0 [<ffffffff8150fa57>] bus_for_each_drv+0x67/0x90 [<ffffffff81511ab7>] __device_attach+0xc7/0x120 [<ffffffff81511b23>] device_initial_probe+0x13/0x20 [<ffffffff8151059a>] bus_probe_device+0x9a/0xb0 [<ffffffff8150f0a1>] device_add+0x3b1/0x5c0 [<ffffffff8150f47e>] device_register+0x1e/0x30 [<ffffffff8146a9e8>] xenbus_probe_node+0x158/0x170 [<ffffffff8146abaf>] xenbus_dev_changed+0x1af/0x1c0 [<ffffffff8146b1bb>] backend_changed+0x1b/0x20 [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160 unreferenced object 0xffff880007ba8ef8 (size 224): backtrace: [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50 [<ffffffff81205c73>] __kmalloc+0xd3/0x1e0 [<ffffffff81534d87>] frontend_changed+0x2c7/0x580 [<ffffffff8146af12>] xenbus_otherend_changed+0xa2/0xb0 [<ffffffff8146b2c0>] frontend_changed+0x10/0x20 [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160 [<ffffffff810d3e97>] kthread+0xd7/0xf0 [<ffffffff817c4a9f>] ret_from_fork+0x3f/0x70 [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff8800048dcd38 (size 224): The first leak is caused by not put() the be->blkif reference which we had gotten in xen_blkif_alloc(), while the second is us not freeing blkif->rings in the right place. Signed-off-by: NBob Liu <bob.liu@oracle.com> Reported-and-Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bob Liu 提交于
Make st_* statistics per ring and the VBD sysfs would iterate over all the rings. Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings are torn down, so it's safe. Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Aligned the variables on the same column.
-
由 Julien Grall 提交于
The minimal size of request in the block framework is always PAGE_SIZE. It means that when 64KB guest is support, the request will at least be 64KB. Although, if the backend doesn't support indirect descriptor (such as QDISK in QEMU), a ring request is only able to accommodate 11 segments of 4KB (i.e 44KB). The current frontend is assuming that an I/O request will always fit in a ring request. This is not true any more when using 64KB page granularity and will therefore crash during boot. On ARM64, the ABI is completely neutral to the page granularity used by the domU. The guest has the choice between different page granularity supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB). This can't be enforced by the hypervisor and therefore it's possible to run guests using different page granularity. So we can't mandate the block backend to support indirect descriptor when the frontend is using 64KB page granularity and have to fix it properly in the frontend. The solution exposed below is based on modifying directly the frontend guest rather than asking the block framework to support smaller size (i.e < PAGE_SIZE). This is because the change is the block framework are not trivial as everything seems to relying on a struct *page (see [1]). Although, it may be possible that someone succeed to do it in the future and we would therefore be able to use it. Given that a block request may not fit in a single ring request, a second request is introduced for the data that cannot fit in the first one. This means that the second ring request should never be used on Linux if the page size is smaller than 44KB. To achieve the support of the extra ring request, the block queue size is divided by two. Therefore, the ring will always contain enough space to accommodate 2 ring requests. While this will reduce the overall performance, it will make the implementation more contained. The way forward to get better performance is to implement in the backend either indirect descriptor or multiple grants ring. Note that the parameters blk_queue_max_* helpers haven't been updated. The block code will set the mimimum size supported and we may be able to support directly any change in the block framework that lower down the minimal size of a request. [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.htmlSigned-off-by: NJulien Grall <julien.grall@citrix.com> Acked-by: NRoger Pau Monné <roger.pau@citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Julien Grall 提交于
The code to get a request is always the same. Therefore we can factorize it in a single function. Signed-off-by: NJulien Grall <julien.grall@citrix.com> Acked-by: NRoger Pau Monné <roger.pau@citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Jiri Kosina 提交于
xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of every attempt to purge the LRU. This operation can't ever succeed though, as the kthread hasn't marked itself as freezable. Before (hopefully eventually) kthread freezing gets converted to fileystem freezing, we'd rather mark xen_blkif_schedule() freezable (as it can generate I/O during suspend). Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
With the multi-queue support we could fail at setting up some of the rings and fail the connection. That meant that all resources tied to rings[0..n-1] (where n is the ring that failed to be setup). Eventually the frontend will switch to the states and we will call xen_blkif_disconnect. However we do not want to be at the mercy of the frontend deciding when to change states. This allows us to do the cleanup right away and freeing resources. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
Lets return sensible values instead of -1. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bob Liu 提交于
Make pool of persistent grants and free pages per-queue/ring instead of per-device to get better scalability. Test was done based on null_blk driver: dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk" domu: v4.2-rc8 16vcpus 10GB [test] rw=read direct=1 ioengine=libaio bs=4k time_based runtime=30 filename=/dev/xvdb numjobs=16 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 group_reporting Results: iops1: After patch "xen/blkfront: make persistent grants per-queue". iops2: After this patch. Queues: 1 4 8 16 Iops orig(k): 810 1064 780 700 Iops1(k): 810 1230(~20%) 1024(~20%) 850(~20%) Iops2(k): 810 1410(~35%) 1354(~75%) 1440(~100%) With 4 queues after this commit we can get ~75% increase in IOPS, and performance won't drop if increasing queue numbers. Please find the respective chart in this link: https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bob Liu 提交于
Backend advertises "multi-queue-max-queues" to front, also get the negotiated number from "multi-queue-num-queues" written by blkfront. Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
Preparatory patch for multiple hardware queues (rings). The number of rings is unconditionally set to 1, larger number will be enabled in "xen/blkback: get the number of hardware queues/rings from blkfront". Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Align variables in the structures.
-
由 Bob Liu 提交于
Split per ring information to an new structure "xen_blkif_ring", so that one vbd device can be associated with one or more rings/hardware queues. Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we may have multi backend threads. This patch is a preparation for supporting multi hardware queues/rings. Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Align the variables in the structure.
-
由 Peng Fan 提交于
According to this piece code: " pr_info("Invalid max_ring_order (%d), will use default max: %d.\n", xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER); " if xen_blkif_max_ring_order is bigger that XENBUS_MAX_RING_GRANT_ORDER, need to set xen_blkif_max_ring_order using XENBUS_MAX_RING_GRANT_ORDER, but not 0. Signed-off-by: NPeng Fan <van.freenix@gmail.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bob Liu 提交于
Make persistent grants per-queue/ring instead of per-device, so that we can drop the 'dev_lock' and get better scalability. Test was done based on null_blk driver: dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk" domu: v4.2-rc8 16vcpus 10GB [test] rw=read direct=1 ioengine=libaio bs=4k time_based runtime=30 filename=/dev/xvdb numjobs=16 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 group_reporting Queues: 1 4 8 16 Iops orig(k): 810 1064 780 700 Iops patched(k): 810 1230(~20%) 1024(~20%) 850(~20%) Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bob Liu 提交于
We do the same exact operations a bit earlier in the function. Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bob Liu 提交于
The max number of hardware queues for xen/blkfront is set by parameter 'max_queues'(default 4), while it is also capped by the max value that the xen/blkback exposes through XenStore key 'multi-queue-max-queues'. The negotiated number is the smaller one and would be written back to xenstore as "multi-queue-num-queues", blkback needs to read this negotiated number. Signed-off-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-