- 27 Nov 2012, 1 commit
-
-
Committed by Roger Pau Monne
Move the code that frees persistent grants from the red-black tree into its own function. This will make it easier for other consumers to move it to a common place.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
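A minimal sketch of the kind of helper this refactor produces, assuming the grants live in a red-black tree; the struct layout and names are illustrative, not the kernel's actual identifiers:

```c
#include <linux/rbtree.h>
#include <linux/slab.h>

struct persistent_gnt {
	struct rb_node node;
	struct page *page;	/* page backing the mapped grant */
};

/* Walk the whole tree, erasing and freeing every cached grant. */
static void free_persistent_gnts(struct rb_root *root)
{
	struct rb_node *n = rb_first(root);

	while (n) {
		struct persistent_gnt *gnt =
			rb_entry(n, struct persistent_gnt, node);

		n = rb_next(n);		/* advance before erasing */
		rb_erase(&gnt->node, root);
		/* the grant-unmap hypercall would go here */
		kfree(gnt);
	}
}
```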
-
- 04 Nov 2012, 1 commit
-
-
Committed by Roger Pau Monne
This patch contains fixes for the persistent grants implementation, v2:

* handle == 0 is a valid handle, so initialize grants in blkback by setting the handle to BLKBACK_INVALID_HANDLE instead of 0. Reported by Konrad Rzeszutek Wilk.
* new_map is a boolean, so use "true" and "false" instead of 1 and 0. Reported by Konrad Rzeszutek Wilk.
* blkfront announces the persistent-grants feature as feature-persistent-grants; use feature-persistent instead, which is consistent with blkback and the public Xen headers.
* Add a consistency check in blkfront to make sure we don't try to access segments that have not been set.

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
[v1: The new_map int->bool had already been changed]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
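A tiny sketch of the first fix's rationale, with an assumed sentinel value (the kernel defines its own): because handle 0 is legitimate, "is this slot mapped?" must compare against a sentinel rather than zero.

```c
#include <linux/types.h>

#define BLKBACK_INVALID_HANDLE (~0U)	/* assumed sentinel; 0 is a valid handle */

static inline bool handle_is_mapped(u32 handle)
{
	return handle != BLKBACK_INVALID_HANDLE;
}
```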
-
- 30 Oct 2012, 1 commit
-
-
Committed by Roger Pau Monne
This patch implements persistent grants for the xen-blk{front,back} mechanism. The effect of this change is to reduce the number of unmap operations performed, since they cause a (costly) TLB shootdown. This allows the I/O performance to scale better when a large number of VMs are performing I/O.

Previously, the blkfront driver was supplied a bvec[] from the request queue. This was granted to dom0; dom0 performed the I/O and wrote directly into the grant-mapped memory and unmapped it; blkfront then removed foreign access for that grant. The cost of unmapping scales badly with the number of CPUs in dom0. An experiment showed that when dom0 has 24 VCPUs, and guests are performing parallel I/O to a ramdisk, the IPIs from performing unmaps become a bottleneck at 5 guests (at which point 650,000 IOPS are being performed in total). If more than 5 guests are used, the performance declines; by 10 guests, only 400,000 IOPS are being performed.

This patch improves performance by only unmapping when the connection between blkfront and blkback is broken. On startup blkfront notifies blkback that it is using persistent grants, and blkback does the same. If blkback is not capable of persistent mapping, blkfront will still use the same grants, since this is compatible with the previous protocol and reduces code complexity in blkfront.

To perform a read in persistent mode, blkfront uses a separate pool of pages that it maps to dom0. When a request comes in, blkfront transmutes the request so that blkback will write into one of these free pages. Blkback keeps note of which grefs it has already mapped. When a new ring request comes to blkback, it looks to see if it has already mapped that page; if so, it will not map it again. If the page hasn't been previously mapped, it is mapped now, and a record is kept of this mapping. Blkback proceeds as usual. When blkfront is notified that blkback has completed a request, it memcpys from the shared memory into the supplied bvec, and a record is kept that the {gref, page} tuple is mapped and not in flight. Writes are similar, except that the memcpy is performed from the supplied bvecs into the shared pages before the request is put onto the ring.

Blkback stores a mapping of grefs => {page mapped to by gref} in a red-black tree. As the grefs are not known a priori, and provide no guarantees on their ordering, we have to perform a search through this tree to find the page for every gref we receive. This operation takes O(log n) time in the worst case. In blkfront, grants are stored using a singly linked list.

The maximum number of grants that blkback will persistently map is currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to prevent a malicious guest from attempting a DoS by supplying fresh grefs, causing the dom0 kernel to map excessively. If a guest is using persistent grants and exceeds the maximum number of grants to map persistently, the newly passed grefs will be mapped and unmapped. Using this approach, we can have requests that mix persistent and non-persistent grants, and we need to handle them correctly. This allows us to set the maximum number of persistent grants to a lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although doing so will lead to unpredictable performance.

In writing this patch, the question arises as to whether the additional cost of performing memcpys in the guest (to/from the pool of granted pages) outweighs the gains of not performing TLB shootdowns. The answer to that question is "no". There appears to be very little, if any, additional cost to the guest of using persistent grants. There is perhaps a small saving, from the reduced number of hypercalls performed in granting and ending foreign access.

Signed-off-by: Oliver Chick <oliver.chick@citrix.com>
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up the misuse of bool as int]
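A minimal sketch of the "map once, reuse" lookup described above, assuming a red-black tree keyed by gref; the struct and helper names are illustrative:

```c
#include <linux/rbtree.h>

struct persistent_gnt {
	struct rb_node node;
	unsigned int gnt;	/* key: the guest's grant reference */
	struct page *page;	/* dom0 page mapped to that gref */
};

static struct persistent_gnt *get_persistent_gnt(struct rb_root *root,
						 unsigned int gref)
{
	struct rb_node *n = root->rb_node;

	while (n) {		/* O(log n) search, as noted above */
		struct persistent_gnt *p =
			rb_entry(n, struct persistent_gnt, node);

		if (gref < p->gnt)
			n = n->rb_left;
		else if (gref > p->gnt)
			n = n->rb_right;
		else
			return p;	/* already mapped: reuse, no hypercall */
	}
	return NULL;			/* not cached: map it and insert a record */
}
```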
-
- 12 Sep 2012, 1 commit
-
-
Committed by Stefano Stabellini
If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen, and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it, and we'll think that the original mfn is actually what Xen has written to it.

Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn, and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now the responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that).

CC: stable@kernel.org
Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
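A much-simplified sketch of the idea (helper names invented, not the actual m2p_override code): the mfn is stashed in page->index, which Xen never writes, instead of in a field the hypervisor fills in asynchronously.

```c
#include <linux/mm.h>

/* page->index is private to the kernel, so it is safe storage even
 * while a batched hypercall is still pending. */
static void stash_original_mfn(struct page *page, unsigned long mfn)
{
	page->index = mfn;
}

static unsigned long original_mfn(struct page *page)
{
	return page->index;
}
```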
-
- 09 Aug 2012, 1 commit
-
-
Committed by Stefano Stabellini
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 24 Mar 2012, 1 commit
-
-
Committed by Konrad Rzeszutek Wilk
The only reason for the distinction was the special case of 'file' (which is assumed to be a loopback device): to reach inside the loopback device, find the underlying file, and call fallocate on it. Fortunately "xen-blkback: convert hole punching to discard request on loop devices" removes that use-case, and we now base the discard support on blk_queue_discard(q) and extract all appropriate parameters from the 'struct request_queue'.

CC: Li Dongyang <lidongyang@novell.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
[v1: Dropping pointless initializer and keeping blank line]
[v2: Remove the kfree as it is not used anymore]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
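A sketch of probing the queue directly, as the commit describes; the function name is invented and error handling is omitted:

```c
#include <linux/kernel.h>
#include <linux/blkdev.h>

static void xen_vbd_probe_discard(struct request_queue *q)
{
	if (!blk_queue_discard(q))
		return;		/* backing storage cannot discard */

	/* everything we advertise comes from struct request_queue */
	pr_info("discard granularity %u, alignment %u\n",
		q->limits.discard_granularity,
		q->limits.discard_alignment);
}
```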
-
- 20 Mar 2012, 2 commits
-
-
Committed by Daniel De Graaf
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Daniel De Graaf
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 19 Nov 2011, 4 commits
-
-
Committed by Li Dongyang
As of dfaa2ef6, loop devices support discard requests. We can just issue a discard request and the loop driver will punch the hole for us, so we don't need to touch the internals of the loop device and punch the hole ourselves.

V0->V1: rebased on devel/for-jens-3.3
Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
.. and move it to its own function that will deal with the discard operation.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
Part of the blkdev_issue_discard(xx) operation is that it can also issue a secure discard operation that will permanently remove the sectors in question. We advertise that we can support that via the 'discard-secure' attribute, and on the request, if the 'secure' bit is set, we will attempt to pass in REQ_DISCARD | REQ_SECURE.

CC: Li Dongyang <lidongyang@novell.com>
[v1: Used 'flag' instead of 'secure:1' bit]
[v2: Use 'reserved' uint8_t instead of adding a new value]
[v3: Check for nseg when mapping instead of operation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
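A sketch of the flag translation this commit adds; the helper is invented, but REQ_WRITE, REQ_DISCARD, and REQ_SECURE are the real bio flags of that era:

```c
#include <linux/types.h>
#include <linux/blk_types.h>

static unsigned long discard_flags(bool secure)
{
	unsigned long flags = REQ_WRITE | REQ_DISCARD;

	if (secure)	/* frontend set the 'secure' bit */
		flags |= REQ_SECURE;
	return flags;
}
```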
-
Committed by Konrad Rzeszutek Wilk
Put the overlapping request-specific attributes in a union type structure, to deal with them in an easier manner.

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
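A sketch of the resulting layout, modelled loosely on blkif_request; the exact fields differ, but the union is the point:

```c
#include <stdint.h>

struct blkif_request_sketch {
	uint8_t  operation;	/* read, write, discard, ... */
	uint64_t id;		/* echoed back in the response */
	union {			/* request-type-specific fields overlap */
		struct {
			uint64_t sector_number;
			uint8_t  nr_segments;
		} rw;
		struct {
			uint64_t sector_number;
			uint64_t nr_sectors;
		} discard;
	} u;
};
```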
-
- 18 Oct 2011, 1 commit
-
-
Committed by Konrad Rzeszutek Wilk
There are two windows of opportunity to cause a race when processing a barrier request. This patch fixes them.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 15 Oct 2011, 1 commit
-
-
Committed by Konrad Rzeszutek Wilk
The patch titled "xen/blkback: Fix the inhibition to map pages when discarding sector ranges." had the right idea, except that it used the wrong comparison operator: it had == instead of !=. This fixes the bug where all operations (except discard) would have been ignored.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 13 Oct 2011, 5 commits
-
-
Committed by Konrad Rzeszutek Wilk
The 'operation' parameters are the ones provided to the bio layer, while req->operation holds the ones passed between the backend and frontend. We used the wrong 'operation' value to squash the call to map pages when processing the discard operation, resulting in a hypercall that did nothing. Let's guard against entering the mapping function by checking for the proper operation type.

CC: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
We did not increment the number of sectors written to disk because we tested for == WRITE, which is incorrect, as the operations are more like WRITE_FLUSH or WRITE_ODIRECT. This patch fixes it by doing an & WRITE check instead.

CC: stable@kernel.org
Reported-by: Andy Burns <xen.lists@burns.me.uk>
Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
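A sketch of the accounting fix: WRITE_FLUSH and WRITE_ODIRECT are WRITE plus extra flags, so an equality test misses them, while testing the WRITE bit catches them all.

```c
#include <linux/types.h>
#include <linux/fs.h>

static bool counts_as_write(unsigned int op)
{
	/* catches WRITE, WRITE_FLUSH, WRITE_ODIRECT, ... */
	return (op & WRITE) != 0;
}
```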
-
Committed by Konrad Rzeszutek Wilk
We emulate barrier requests by draining the outstanding bios and then sending the WRITE_FLUSH command. To drain the I/Os we use the refcnt that is used during disconnect to wait for all the I/Os before disconnecting from the frontend. We latch on its value, and once it reaches either the threshold for disconnect or zero outstanding I/Os, we have drained all I/Os.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
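A sketch of the drain described above, with assumed field names (the real code reuses the disconnect refcnt):

```c
#include <linux/wait.h>
#include <linux/atomic.h>

struct xen_blkif_sketch {
	atomic_t refcnt;		/* held once per in-flight I/O */
	wait_queue_head_t waiting_to_free;
};

static void drain_io(struct xen_blkif_sketch *blkif)
{
	/* wait until only our own reference remains */
	wait_event(blkif->waiting_to_free,
		   atomic_read(&blkif->refcnt) <= 1);
	/* now it is safe to issue the WRITE_FLUSH */
}
```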
-
Committed by Jan Beulich
This fixes the problem of three of those four memset()s having improper size arguments passed: sizeof applied to a pointer-typed expression returns the size of the pointer, not that of the pointed-to data. It also reverts using kmalloc() instead of kzalloc() for the allocation of the pending grant handles array, as that array gets fully initialized in a subsequent loop.

Reported-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
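A self-contained illustration of the bug class being fixed: sizeof on a pointer yields the pointer's size, not the buffer's.

```c
#include <string.h>
#include <stdlib.h>

int main(void)
{
	int *buf = malloc(64 * sizeof(*buf));

	memset(buf, 0, sizeof(buf));		/* BUG: zeroes only 4-8 bytes */
	memset(buf, 0, 64 * sizeof(*buf));	/* correct: zeroes the whole buffer */

	free(buf);
	return 0;
}
```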
-
Committed by Li Dongyang
..aka the ATA TRIM/SCSI UNMAP command, to be passed through the frontend and used appropriately by the backend. We also advertise certain granularity parameters to the frontend so it can plug them in. If the backend is a real device, we just end up using 'blkdev_issue_discard', while for loopback devices we just punch a hole in the image file.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Fixed up pr_debug and commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
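For a real block device, the backend path boils down to something like the following sketch (wrapper name invented; blkdev_issue_discard is the real kernel API):

```c
#include <linux/blkdev.h>

static int dispatch_discard(struct block_device *bdev,
			    sector_t sector, sector_t nr_sects)
{
	/* synchronous; returns 0 on success or a negative errno */
	return blkdev_issue_discard(bdev, sector, nr_sects, GFP_KERNEL, 0);
}
```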
-
- 29 Sep 2011, 1 commit
-
-
Committed by Stefano Stabellini
If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough; we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usage through /dev/xen/gntdev. That is, pages that have been in use by the kernel and use the P2M will not need this special mapping. However, there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 mapping even for internal usage.

In order to avoid the complexity of dealing with highmem, we allocate the pages in lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override, and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page, we use multicalls and hypercall batching.

Use the kmap_op pointer directly as the argument to do the mapping, as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already been done, because we need kmap->handle to be set correctly.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Removed GRANT_FRAME_BIT usage]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
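A much-simplified sketch of the mapping step: the caller's kmap_op is passed straight to the hypercall, since it stays valid until unmap (the helper name is invented; the real code batches these via multicalls):

```c
#include <xen/interface/grant_table.h>
#include <asm/xen/hypercall.h>

static int map_kernel_grant(struct gnttab_map_grant_ref *kmap_op)
{
	/* a single op shown here; m2p_add_override batches them */
	return HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, kmap_op, 1);
}
```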
-
- 01 Jul 2011, 2 commits
-
-
Committed by Bastian Blank
Add the xen-backend:vbd module alias to the xen-blkback module. This allows automatic loading of the module.

Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
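The change amounts to a one-liner of this shape (a sketch of the standard module-alias idiom):

```c
#include <linux/module.h>

/* lets udev/modprobe load the module when a vbd backend appears */
MODULE_ALIAS("xen-backend:vbd");
```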
-
Committed by Daniel Stodden
Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad idea. It means that in-flight I/O is essentially blocking continued batches. This essentially kills throughput on frontends which unplug (or even just notify) early and rightfully assume additional requests will be picked up on time, not synchronously.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
[v1: Rebased and fixed compile problems]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 01 Jun 2011, 1 commit
-
-
Committed by Dan Carpenter
blkbk->pending_pages can be NULL here, so I added a check for it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
[v1: Redid the loop a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
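A sketch of the guarded teardown (function and parameter names invented):

```c
#include <linux/mm.h>

static void free_pending_pages(struct page **pages, int nr)
{
	int i;

	if (!pages)	/* allocation may have failed earlier */
		return;
	for (i = 0; i < nr; i++)
		if (pages[i])
			__free_page(pages[i]);
}
```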
-
- 18 May 2011, 1 commit
-
-
Committed by Jan Beulich
The sector number on empty barrier requests may (will?) be -1, which, given that it's being treated as an unsigned 64-bit quantity, will almost always exceed the actual (virtual) disk's size. Inspired by Konrad's "When writing barriers set the sector number to zero...". While at it, also add overflow checking to the math in vbd_translate().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
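A sketch of the overflow-safe bounds check: with an unsigned 64-bit sector number, sector + count can wrap around, so the sum must be validated before comparing against the disk size (names invented):

```c
#include <linux/types.h>

static bool request_in_bounds(u64 sector, u64 nr_sects, u64 disk_sectors)
{
	if (sector + nr_sects < sector)		/* wrapped around: reject */
		return false;
	return sector + nr_sects <= disk_sectors;
}
```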
-
- 13 May 2011, 9 commits
-
-
Committed by Laszlo Ersek
vbd_resize() calls up_read() on xs_state.suspend_mutex twice in a row, via double xenbus_transaction_end() calls. The next down_read() in xenbus_transaction_start() (at e.g. the next resize attempt) hangs.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
And not depend on the driver being built with the -DDEBUG flag.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
No need for that '_st' suffix, and xen_blkif is more apt.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
CHECK: multiple assignments should be avoided.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
It is not really used for anything.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
To make it easier to read.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
And also make them uniform and prefix the messages with 'xen-blkback'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
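One conventional way to get such a uniform prefix is the pr_fmt idiom sketched below; whether this commit used it or spelled the prefix out in each message is not stated here:

```c
/* must be defined before the first include that pulls in printk() */
#define pr_fmt(fmt) "xen-blkback: " fmt

#include <linux/kernel.h>

/* pr_warn("ring overflow\n") now prints "xen-blkback: ring overflow" */
```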
-
- 12 May 2011, 1 commit
-
-
Committed by Konrad Rzeszutek Wilk
Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 06 May 2011, 3 commits
-
-
Committed by Konrad Rzeszutek Wilk
They had the wrong data or were in the wrong spot.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
We do a check for the operations right before calling dispatch_rw_block_io, and then we do the same check in dispatch_rw_block_io. This patch squashes those checks into the 'dispatch_rw_block_io' function.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
We drop the support for 'feature-barrier' and add in the support for 'feature-flush-cache' if the real backend storage supports flushing.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
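Advertising such a feature typically looks like the sketch below, using the standard xenbus_printf interface; the surrounding function is invented:

```c
#include <xen/xenbus.h>

static int advertise_flush(struct xenbus_transaction xbt,
			   struct xenbus_device *dev, int supported)
{
	/* the frontend reads this key to decide whether to send flushes */
	return xenbus_printf(xbt, dev->nodename,
			     "feature-flush-cache", "%d", supported);
}
```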
-
- 28 Apr 2011, 1 commit
-
-
Committed by Konrad Rzeszutek Wilk
This reverts commit 97961ef4, because we lose about 15% performance if we do the unplugging at the end of reading the ring buffer.
-
- 27 Apr 2011, 2 commits
-
-
Committed by Konrad Rzeszutek Wilk
If one runs a simple fio request with random read/write and a 20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler.

IOmeter       |       |       |          |
64K, randrw   | NOOP  | CFQ   | deadline |
randrwmix=80  |       |       |          |
--------------+-------+-------+----------+
blkback       |103/27 | 32/10 | 102/27   |
--------------+-------+-------+----------+
QEMU qdisk    |103/27 |102/27 | 102/27   |

The problem, as explained by Vivek Goyal, was: ".. that difference is that sync vs async requests. In the case of a kernel thread submitting IO, [..] all the WRITES might be being considered as async and will go in a different queue. If you mix those with some READS, they are always sync and will go in different queue. In presence of sync queue, CFQ will idle and choke up WRITES in an attempt to improve latencies of READs. In case of AIO [note: this is what QEMU qdisk is doing], [..] it is direct IO and both READS and WRITES will be considered SYNC and will go in a single queue and no choking of WRITES will take place."

The solution is quite simple: tack on REQ_SYNC (which is what the WRITE_ODIRECT macro expands to) and the numbers go back up.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
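A sketch of the one-line nature of the fix; WRITE_ODIRECT expanded to WRITE | REQ_SYNC in kernels of this era, and BLKIF_OP_WRITE stands in for the frontend's write opcode:

```c
#include <linux/fs.h>

#define BLKIF_OP_WRITE 1	/* assumed value, for illustration only */

static int bio_rw_for(int blkif_op)
{
	/* marking writes sync keeps CFQ from starving them vs. reads */
	return (blkif_op == BLKIF_OP_WRITE) ? WRITE_ODIRECT : READ;
}
```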
-
Committed by Konrad Rzeszutek Wilk
We used to do the plug/unplug on submit_bio. But that means if within a stream of WRITE, WRITE, WRITE, ..., WRITE we have one READ, it could stall the pipeline (as submit_bio could trigger the unplug_fnc to be called and stall/sync when doing the READ). Instead we want to move the unplugging to when the whole (or as much as possible of the) ring buffer has been processed. This also eliminates doing plug/unplug for each request.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-