Commits · e6fc46498784e799d3eb95d83079180e413c4e7d · openeuler / Kernel

26 Aug, 2017 1 commit

xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect · dc52d783

Committed by Annie Li on Aug 24, 2017

In xen_blkif_disconnect, before checking inflight I/O, the following code
stops the blkback thread:

	if (ring->xenblkd) {
		kthread_stop(ring->xenblkd);
		wake_up(&ring->shutdown_wq);
	}

If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly, and the above code is not called to stop the threads of the
remaining queues and process them. When removing a vbd device under heavy
disk I/O load, some queues with inflight I/O still have a blkback thread
running even though the corresponding vbd device or guest is gone.

This can cause problems. For example, if the backend device type is file,
some loop devices and blkback threads linger forever after the guest is
destroyed, and unmounting repositories fails unless dom0 is rebooted.

This patch gives the thread of every queue the chance to be stopped.
Otherwise, only the threads of the queues up to and including the first
busy one get stopped, and the blkback threads of the remaining queues keep
running. So stop all threads properly and return -EBUSY if any queue has
inflight I/O.
Signed-off-by: Annie Li <annie.li@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

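The control flow described in the commit message above can be modelled in user space as follows. This is a minimal sketch, not the driver code: struct ring_model, stop_thread() and disconnect_all() are hypothetical names, and the kernel's kthread_stop() is replaced by clearing a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one blkback ring (queue). */
struct ring_model {
	bool thread_running;	/* models ring->xenblkd being set */
	int  inflight;		/* requests still in flight */
};

/* Models kthread_stop(): in this sketch it only clears the flag. */
static void stop_thread(struct ring_model *ring)
{
	ring->thread_running = false;
}

/*
 * Stop the thread of every ring, and only afterwards report -EBUSY if
 * any ring still has inflight I/O. Returning from inside the loop at
 * the first busy ring would leave the later threads running.
 */
static int disconnect_all(struct ring_model *rings, size_t nr_rings)
{
	bool busy = false;
	size_t i;

	assert(rings != NULL || nr_rings == 0);

	for (i = 0; i < nr_rings; i++) {
		if (rings[i].thread_running)
			stop_thread(&rings[i]);
		if (rings[i].inflight > 0)
			busy = true;	/* remember it, but keep going */
	}
	return busy ? -EBUSY : 0;
}
```

A caller would retry disconnect_all() once the inflight counters have drained; every thread is already stopped after the first call.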

24 Aug, 2017 1 commit

block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992

Committed by Christoph Hellwig on Aug 23, 2017

This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18 Aug, 2017 2 commits

xen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1 · 3f2c9405

Committed by Bart Van Assche on Aug 17, 2017

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

xen-blkback: Fix indentation · 306b82a8

Committed by Bart Van Assche on Aug 17, 2017

Avoid that smatch reports the following warning when building with
C=2 CHECK="smatch -p=kernel":

drivers/block/xen-blkback/blkback.c:710 xen_blkbk_unmap_prepare() warn: inconsistent indenting
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 Jun, 2017 4 commits

xen-blkback: don't leak stack data via response ring · 089bc014

Committed by Jan Beulich on Jun 13, 2017

Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other backends do.
Build on the fact that all response structure flavors are actually
identical (the old code did make this assumption too).

This is XSA-216.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

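The idea of filling response fields directly on the shared ring can be illustrated with a small user-space sketch. The layout of struct resp_model and the helper put_response() are assumptions made for this illustration only; they are not the blkif response definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical response layout. On typical ABIs the compiler inserts
 * padding after 'operation' and at the end of the structure.
 */
struct resp_model {
	uint64_t id;
	uint8_t  operation;
	int16_t  status;
};

/*
 * Write each field straight into the slot on the shared ring. No local
 * struct resp_model is built on the stack and copied as a whole, so
 * uninitialised stack padding cannot end up in memory that the other
 * side can read.
 */
static void put_response(struct resp_model *slot, uint64_t id,
			 uint8_t operation, int16_t status)
{
	assert(slot != NULL);
	slot->id = id;
	slot->operation = operation;
	slot->status = status;
}
```

The design choice is that only named fields are ever stored, so whatever the padding bytes of the slot held before stays untouched rather than being overwritten with stack contents.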

xen/blkback: don't use xen_blkif_get() in xen-blkback kthread · a24fa22c

Committed by Juergen Gross on May 18, 2017

There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
of xen-blkback. Thread stopping is synchronous, and using the blkif
reference counting in the kthread prevents the reference count from
ever dropping to zero at the end of an I/O that runs concurrently with
disconnecting when there are multiple rings.

Setting ring->xenblkd to NULL after stopping the kthread isn't needed
as the kthread does this already.
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: don't free be structure too early · 71df1d7c

Committed by Juergen Gross on May 18, 2017

The be structure must not be freed when freeing the blkif structure
isn't done. Otherwise a use-after-free of be when unmapping the ring
used for communicating with the frontend will occur in case of a
late call of xenblk_disconnect() (e.g. due to an I/O still active
when trying to disconnect).
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: fix disconnect while I/Os in flight · 46464411

Committed by Juergen Gross on May 18, 2017

Today disconnecting xen-blkback is broken in case there are still
I/Os in flight: xen_blkif_disconnect() will bail out early without
releasing all resources in the hope it will be called again when
the last request has terminated. This, however, won't happen as
xen_blkif_free() won't be called on termination of the last running
request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
calls in xen_blkif_disconnect() didn't happen.

To solve this deadlock xen_blkif_disconnect() and
xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
xen_blkif_get() but use some other way to do their accounting of
resources.

This at once fixes another error in xen_blkif_disconnect(): when it
returned early with -EBUSY for another ring than 0 it would call
xen_blkif_put() again for already handled rings on a subsequent call.
This will lead to inconsistencies in the refcnt handling.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

09 Jun, 2017 1 commit

block: switch bios to blk_status_t · 4e4cbee9

Committed by Christoph Hellwig on Jun 03, 2017

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep around at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

16 May, 2017 1 commit

block: xen-blkback: add null check to avoid null pointer dereference · 2d4456c7

Committed by Gustavo A. R. Silva on May 11, 2017

Add null check before calling xen_blkif_put() to avoid potential
null pointer dereference.

Addresses-Coverity-ID: 1350942
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

10 Feb, 2017 1 commit

xen: modify xenstore watch event interface · 5584ea25

Committed by Juergen Gross on Feb 09, 2017

Today a Xenstore watch event is delivered via a callback function
declared as:

void (*callback)(struct xenbus_watch *,
                 const char **vec, unsigned int len);

As all watch events only ever come with two parameters (path and token)
changing the prototype to:

void (*callback)(struct xenbus_watch *,
                 const char *path, const char *token);

is the natural thing to do.

Apply this change and adapt all users.

Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

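The prototype change described in this commit can be sketched in user space by adapting an old vec-based event to a callback that takes path and token directly. struct watch_model, record_event() and deliver_old_style() are hypothetical names for this illustration, and the index order (path first, token second) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed positions of path and token in the old vec array. */
enum { WATCH_PATH = 0, WATCH_TOKEN = 1 };

/* Hypothetical stand-in for struct xenbus_watch. */
struct watch_model {
	/* New-style callback: path and token are passed explicitly. */
	void (*callback)(struct watch_model *w,
			 const char *path, const char *token);
	char seen_path[64];
	char seen_token[64];
};

/* Example new-style callback that records what it was given. */
static void record_event(struct watch_model *w,
			 const char *path, const char *token)
{
	snprintf(w->seen_path, sizeof(w->seen_path), "%s", path);
	snprintf(w->seen_token, sizeof(w->seen_token), "%s", token);
}

/* Adapter: unpack an old-style vec and call the new-style callback. */
static void deliver_old_style(struct watch_model *w,
			      const char **vec, unsigned int len)
{
	assert(w != NULL && vec != NULL && len >= 2);
	w->callback(w, vec[WATCH_PATH], vec[WATCH_TOKEN]);
}
```

With the new prototype the callback no longer needs to index into an array or check its length, which is the simplification the commit message argues for.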

07 Nov, 2016 1 commit

xen: make use of xenbus_read_unsigned() in xen-blkback · 8235777b

Committed by Juergen Gross on Oct 31, 2016

Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires changing the type of one read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.

Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>

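The pattern of reading an unsigned value with a fallback can be sketched in user space as below. parse_unsigned_or_default() is a hypothetical helper written for this illustration; it does not reproduce the exact parsing rules of xenbus_read_unsigned(), only the idea that a missing, negative, or malformed value yields the default.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse 'text' as an unsigned decimal number. Return 'default_val' if
 * the text is absent, starts with a sign or other non-digit, has
 * trailing garbage, or does not fit in an unsigned int.
 */
static unsigned int parse_unsigned_or_default(const char *text,
					      unsigned int default_val)
{
	char *end;
	unsigned long val;

	if (text == NULL || !isdigit((unsigned char)text[0]))
		return default_val;

	errno = 0;
	val = strtoul(text, &end, 10);
	if (errno != 0 || *end != '\0' || val > UINT_MAX)
		return default_val;

	return (unsigned int)val;
}
```

Because the result type is unsigned, a negative input can never be stored as a valid setting, which matches the commit's remark that negative values were never allowed for the modified read.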

01 Nov, 2016 1 commit

block,fs: use REQ_* flags directly · 70fd7614

Committed by Christoph Hellwig on Nov 01, 2016

Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly.  Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

22 Jul, 2016 3 commits

xen-blkback: really don't leak mode property · aea305e1

Committed by Jan Beulich on Jul 07, 2016

Commit 9d092603 ("xen-blkback: do not leak mode property") left one
path unfixed; correct this.
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-blkback: constify instance of "struct attribute_group" · 53043948

Committed by Jan Beulich on Jul 07, 2016

The functions these get passed to have been taking pointers to const
since at least 2.6.16.
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-blkback: prefer xenbus_scanf() over xenbus_gather() · 6694389a

Committed by Jan Beulich on Jul 07, 2016

... for single items being collected: It is more typesafe (as the
compiler can check format string and to-be-written-to variable match)
and requires one less parameter to be passed.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jens Axboe <axboe@kernel.dk>

09 Jun, 2016 1 commit

block: add a separate operation type for secure erase · 288dab8a

Committed by Christoph Hellwig on Jun 09, 2016

Instead of overloading the discard support with the REQ_SECURE flag.
Use the opportunity to rename the queue flag as well, and remove the
dead checks for this flag in the RAID 1 and RAID 10 drivers that don't
claim support for secure erase.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

08 Jun, 2016 2 commits

xen: use bio op accessors · a022606e

Committed by Mike Christie on Jun 05, 2016

Separate the op from the rq_flag_bits and have xen
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block/fs/drivers: remove rw argument from submit_bio · 4e49ea4a

Committed by Mike Christie on Jun 05, 2016

This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.
Signed-off-by: Mike Christie <mchristi@redhat.com>

Fixed up fs/ext4/crypto.c
Signed-off-by: Jens Axboe <axboe@fb.com>

14 Apr, 2016 1 commit

block: kill off q->flush_flags · c888a8f9

Committed by Jens Axboe on Apr 13, 2016

Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.
Signed-off-by: Jens Axboe <axboe@fb.com>

04 Mar, 2016 2 commits

xen/blback: Fit the important information of the thread in 17 characters · fa3184b8

Committed by Konrad Rzeszutek Wilk on Feb 03, 2016

Process names are truncated to 17 characters, while we built the name
with a length of 20. So although we filled it with various details,
the last 3 characters (which held the queue number) never surfaced
to user space.

To simplify this and be able to fit the device name, domain id,
and the queue number we remove the 'blkback' from the name.

Prior to this patch the device name is "blkback.<domid>.<name>"
for example: blkback.8.xvda, blkback.11.hda.

With the multiqueue block backend we add "-%d" for the queue.
But sadly this is already way past the limit so it gets stripped.

Possible solution had been identified by Ian:
http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg03516.html

  "
  If you are pressed for space then the "xvd" is probably a bit redundant
  in a string which starts blkbk.

  The guest may not even call the device xvdN (iirc BSD has another
  prefix) any how, so having blkback say so seems of limited use anyway.

  Since this seems to not include a partition number how does this work in
  the split partition scheme? (i.e. one where the guest is given xvda1 and
  xvda2 rather than xvda with a partition table)

[It will be 'blkback.8.xvda1', and 'blkback.11.xvda2']

  Perhaps something derived from one of the schemes in
  http://xenbits.xen.org/docs/unstable/misc/vbd-interface.txt might be a
  better fit?

After a bit of discussion (see
http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01588.html)
we settled on dropping the "blkback" part.

This will make it possible to have the <domid>.<name>-<queue>:

 [1.xvda-0]
 [1.xvda-1]

And we have enough space to make it go up to:

 [32100.xvdfg9-5]
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

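The naming scheme from this commit message, <domid>.<name>-<queue>, can be checked with a short user-space sketch. format_thread_name() and the NAME_LEN buffer size are assumptions of this illustration (NAME_LEN follows the 17 characters named in the commit title); the driver's actual buffer handling is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buffer size assumed for this sketch, after the commit title. */
#define NAME_LEN 17

/*
 * Build "<domid>.<name>-<queue>" into 'buf'. Returns the length the
 * full name needs, which exceeds size - 1 if the name was truncated.
 */
static int format_thread_name(char *buf, size_t size, int domid,
			      const char *dev, unsigned int queue)
{
	assert(buf != NULL && dev != NULL && size > 0);
	return snprintf(buf, size, "%d.%s-%u", domid, dev, queue);
}
```

For the largest example given in the commit message, 32100.xvdfg9-5, the name is 14 characters long and fits in a NAME_LEN buffer with room for the terminator. Comparing the return value against the buffer size is a simple way to detect a name that would be cut off.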

xen-blkback: advertise indirect segment support earlier · 5a705845

Committed by Jan Beulich on Feb 10, 2016

There's no reason to defer this until the connect phase, and in fact
there are frontend implementations expecting this to be available
earlier. Move it into the probe function.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

05 Jan, 2016 9 commits

xen/blkback: Fix two memory leaks. · 93bb277f

Committed by Bob Liu on Dec 10, 2015

This patch fixes two memory leaks:
  backtrace:
    [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
    [<ffffffff81205e3b>] kmem_cache_alloc+0xbb/0x1d0
    [<ffffffff81534028>] xen_blkbk_probe+0x58/0x230
    [<ffffffff8146adb6>] xenbus_dev_probe+0x76/0x130
    [<ffffffff81511716>] driver_probe_device+0x166/0x2c0
    [<ffffffff815119bc>] __device_attach_driver+0xac/0xb0
    [<ffffffff8150fa57>] bus_for_each_drv+0x67/0x90
    [<ffffffff81511ab7>] __device_attach+0xc7/0x120
    [<ffffffff81511b23>] device_initial_probe+0x13/0x20
    [<ffffffff8151059a>] bus_probe_device+0x9a/0xb0
    [<ffffffff8150f0a1>] device_add+0x3b1/0x5c0
    [<ffffffff8150f47e>] device_register+0x1e/0x30
    [<ffffffff8146a9e8>] xenbus_probe_node+0x158/0x170
    [<ffffffff8146abaf>] xenbus_dev_changed+0x1af/0x1c0
    [<ffffffff8146b1bb>] backend_changed+0x1b/0x20
    [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
unreferenced object 0xffff880007ba8ef8 (size 224):

  backtrace:
    [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
    [<ffffffff81205c73>] __kmalloc+0xd3/0x1e0
    [<ffffffff81534d87>] frontend_changed+0x2c7/0x580
    [<ffffffff8146af12>] xenbus_otherend_changed+0xa2/0xb0
    [<ffffffff8146b2c0>] frontend_changed+0x10/0x20
    [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
    [<ffffffff810d3e97>] kthread+0xd7/0xf0
    [<ffffffff817c4a9f>] ret_from_fork+0x3f/0x70
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff8800048dcd38 (size 224):

The first leak is caused by not calling put() on the be->blkif reference
obtained in xen_blkif_alloc(), while the second is caused by not
freeing blkif->rings in the right place.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: make st_ statistics per ring · db6fbc10

Committed by Bob Liu on Dec 09, 2015

Make st_* statistics per ring and the VBD sysfs would iterate over all the
rings.

Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings
are torn down, so it's safe.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Aligned the variables on the same column.

xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule() · a6e7af12

Committed by Jiri Kosina on Oct 26, 2015

xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of
every attempt to purge the LRU. This operation can't ever succeed though,
as the kthread hasn't marked itself as freezable.

Before (hopefully eventually) kthread freezing gets converted to filesystem
freezing, we'd rather mark xen_blkif_schedule() freezable (as it can
generate I/O during suspend).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: Free resources if connect_ring failed. · 2d0382fa

Committed by Konrad Rzeszutek Wilk on Nov 25, 2015

With the multi-queue support we could fail at setting up
some of the rings and fail the connection. That meant that
all resources tied to rings[0..n-1] (where n is the ring
that failed to be set up) were left allocated. Eventually the
frontend will switch states and we will call xen_blkif_disconnect.

However we do not want to be at the mercy of the frontend
deciding when to change states. This allows us to do the
cleanup right away and free the resources.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

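The immediate cleanup described in this commit can be modelled in user space. struct ring_res, setup_ring(), teardown_ring() and connect_all() are hypothetical names for this sketch, and the injected failure stands in for whatever makes a real ring setup fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the resources tied to one ring. */
struct ring_res {
	bool mapped;
};

/* Hypothetical setup step; 'fail' lets the caller inject a failure. */
static int setup_ring(struct ring_res *r, bool fail)
{
	if (fail)
		return -ENOMEM;
	r->mapped = true;
	return 0;
}

static void teardown_ring(struct ring_res *r)
{
	r->mapped = false;
}

/*
 * Set up the rings in order. If ring 'fail_at' fails, release rings
 * [0..fail_at-1] right away instead of waiting for a later disconnect
 * triggered by the frontend.
 */
static int connect_all(struct ring_res *rings, size_t nr, size_t fail_at)
{
	size_t i;
	int err;

	assert(rings != NULL);

	for (i = 0; i < nr; i++) {
		err = setup_ring(&rings[i], i == fail_at);
		if (err) {
			while (i-- > 0)
				teardown_ring(&rings[i]);
			return err;
		}
	}
	return 0;
}
```

Passing a fail_at value equal to nr means no ring fails, which makes it easy to exercise both paths with the same function.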

xen/blocks: Return -EXX instead of -1 · bde21f73

Committed by Konrad Rzeszutek Wilk on Nov 25, 2015

Let's return sensible values instead of -1.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: make pool of persistent grants and free pages per-queue · d4bf0065

Committed by Bob Liu on Nov 14, 2015

Make pool of persistent grants and free pages per-queue/ring instead of
per-device to get better scalability.

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Results:
iops1: After patch "xen/blkfront: make persistent grants per-queue".
iops2: After this patch.

Queues:          1      4             8             16
Iops orig(k):    810    1064          780           700
Iops1(k):        810    1230(~20%)    1024(~20%)    850(~20%)
Iops2(k):        810    1410(~35%)    1354(~75%)    1440(~100%)

With 4 queues after this commit we can get ~75% increase in IOPS, and
performance won't drop if increasing queue numbers.

Please find the respective chart in this link:
https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: get the number of hardware queues/rings from blkfront · d62d8600

Committed by Bob Liu on Nov 14, 2015

Backend advertises "multi-queue-max-queues" to front, also get the negotiated
number from "multi-queue-num-queues" written by blkfront.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: pseudo support for multi hardware queues/rings · 2fb1ef4f

Committed by Konrad Rzeszutek Wilk on Dec 11, 2015

Preparatory patch for multiple hardware queues (rings). The number of
rings is unconditionally set to 1, larger number will be enabled in
"xen/blkback: get the number of hardware queues/rings from blkfront".
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Align variables in the structures.

xen/blkback: separate ring information out of struct xen_blkif · 59795700

Committed by Bob Liu on Nov 14, 2015

Split per ring information to a new structure "xen_blkif_ring", so that one vbd
device can be associated with one or more rings/hardware queues.

Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we
may have multi backend threads.

This patch is a preparation for supporting multi hardware queues/rings.
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Align the variables in the structure.

18 Dec, 2015 2 commits

xen-blkback: read from indirect descriptors only once · 18779149

Committed by Roger Pau Monné on Nov 03, 2015

Since indirect descriptors are in memory shared with the frontend, the
frontend could alter the first_sect and last_sect values after they have
been validated but before they are recorded in the request.  This may
result in I/O requests that overflow the foreign page, possibly
overwriting local pages when the I/O request is executed.

When parsing indirect descriptors, only read first_sect and last_sect
once.

This is part of XSA155.

CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

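The read-once rule from this commit can be sketched in user space: take a single snapshot of each shared field, validate the private copy, and use only that copy afterwards. struct seg_model, load_u8_once() and copy_and_validate() are hypothetical names, and SECTORS_PER_PAGE assumes a 4096-byte page with 512-byte sectors; the volatile load only stands in for the kernel's READ_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: 4096-byte page / 512-byte sectors. */
#define SECTORS_PER_PAGE 8

/* Hypothetical model of one indirect segment descriptor. */
struct seg_model {
	uint8_t first_sect;
	uint8_t last_sect;
};

/* User-space stand-in for READ_ONCE(): exactly one load per call. */
static uint8_t load_u8_once(const uint8_t *p)
{
	return *(const volatile uint8_t *)p;
}

/*
 * Snapshot each shared field once into 'local', then validate the
 * snapshot. Later code must use 'local' only, so a peer that rewrites
 * the shared memory after validation cannot change the checked values.
 */
static bool copy_and_validate(const struct seg_model *shared,
			      struct seg_model *local)
{
	assert(shared != NULL && local != NULL);

	local->first_sect = load_u8_once(&shared->first_sect);
	local->last_sect = load_u8_once(&shared->last_sect);

	return local->last_sect < SECTORS_PER_PAGE &&
	       local->first_sect <= local->last_sect;
}
```

Checking the shared fields in place and reading them again when building the request would reopen the window the commit closes, because the two reads may observe different values.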

xen-blkback: only read request operation from shared ring once · 1f13d75c

Committed by Roger Pau Monné on Nov 03, 2015

A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.

When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().

This is part of XSA155.

CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

23 Oct, 2015 2 commits

xen/xenbus: Rename *RING_PAGE* to *RING_GRANT* · 9cce2914

Committed by Julien Grall on Oct 13, 2015

Linux may use a different page size than the size of a grant. So make
it clear that the order is actually in number of grants.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

block/xen-blkback: Make it running on 64KB page granularity · 67de5dfb

Committed by Julien Grall on May 05, 2015

The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity behaving as a
block backend on a non-modified Xen.

It's only necessary to adapt the ring size and the number of request per
indirect frames. The rest of the code is relying on the grant table
code.

Note that the grant table code is allocating a Linux page per grant
which will result in wasting 60KB for every grant when Linux is using 64KB
page granularity. This could be improved by sharing the page between
multiple grants.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: "Roger Pau Monné" <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

24 Sep, 2015 1 commit

xen/blkback: free requests on disconnection · f929d42c

Committed by Roger Pau Monne on Sep 04, 2015

This is due to commit 86839c56
"xen/block: add multi-page ring support"

When using a guest under UEFI, the following warning comes from
blkback after the domain is destroyed.

------------[ cut here ]------------
WARNING: CPU: 2 PID: 95 at
/home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
xen_blkif_deferred_free+0x1f4/0x1f8()
Modules linked in:
CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 #85
Hardware name: APM X-Gene Mustang board (DT)
Workqueue: events xen_blkif_deferred_free
Call trace:
[<ffff8000000890a8>] dump_backtrace+0x0/0x124
[<ffff8000000891dc>] show_stack+0x10/0x1c
[<ffff8000007653bc>] dump_stack+0x78/0x98
[<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
[<ffff800000097f80>] warn_slowpath_null+0x14/0x20
[<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
[<ffff8000000ad020>] process_one_work+0x160/0x3b4
[<ffff8000000ad3b4>] worker_thread+0x140/0x494
[<ffff8000000b2e34>] kthread+0xd8/0xf0
---[ end trace 6f859b7883c88cdd ]---

Request allocation has been moved to connect_ring, which is called every
time blkback connects to the frontend (this can happen multiple times during
a blkback instance life cycle). On the other hand, request freeing has not
been moved, so it's only called when destroying the backend instance. Due to
this mismatch, blkback can allocate the request pool multiple times, without
freeing it.

In order to fix it, move the freeing of requests to xen_blkif_disconnect to
restore the symmetry between request allocation and freeing.
Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
CC: stable@vger.kernel.org # 4.2
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

29 Jul, 2015 1 commit

block: add a bi_error field to struct bio · 4246a0b6

Committed by Christoph Hellwig on Jul 20, 2015

Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

24 Jul, 2015 1 commit

xen-blkback: replace work_pending with work_busy in purge_persistent_gnt() · 53bc7dc0

Committed by Bob Liu on Jul 22, 2015

The BUG_ON() in purge_persistent_gnt() will be triggered when the previous
purge work hasn't finished.

There is a work_pending() check before this BUG_ON, but it doesn't account
for whether the work is still currently running.

CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

17 Jun, 2015 1 commit

block/xen-blkback: s/nr_pages/nr_segs/ · 6684fa1c

Committed by Julien Grall on Jun 17, 2015

Make the code less confusing to read now that Linux may not have the
same page size as Xen.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

06 Jun, 2015 1 commit

xen/block: add multi-page ring support · 86839c56

Committed by Bob Liu on Jun 03, 2015

Extend xen/block to support multi-page ring, so that more requests can be
issued by using more than one page as the request ring between blkfront
and backend.
As a result, the performance can improve significantly.

We got some impressive improvements on our high-end iSCSI storage cluster
backend. When using 64 pages as the ring, the IOPS increased about 15 times
for the throughput testing and more than doubled for the latency testing.

The reason was that the limit on outstanding requests is 32 when using only
a one-page ring, but in our case the iSCSI LUN was spread across about 100
physical drives, and 32 was really not enough to keep them busy.

Changes in v2:
 - Rebased to 4.0-rc6.
 - Document on how multi-page ring feature working to linux io/blkif.h.

Changes in v3:
 - Remove changes to linux io/blkif.h and follow the protocol defined
   in io/blkif.h of XEN tree.
 - Rebased to 4.1-rc3

Changes in v4:
 - Turn to use 'ring-page-order' and 'max-ring-page-order'.
 - A few comments from Roger.

Changes in v5:
 - Clarify with 4k granularity to comment
 - Address more comments from Roger
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

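The relation between ring pages and the number of outstanding requests can be approximated with the arithmetic below. This is a sketch under stated assumptions: the entry count is taken to be the usable bytes divided by the entry size, rounded down to a power of two, and the sizes used in the usage note (4096-byte pages, a 64-byte ring header, 112-byte entries) are assumptions of this example that the commit message does not state. round_down_pow2() and ring_entries() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round down to the nearest power of two; 0 stays 0. */
static size_t round_down_pow2(size_t n)
{
	size_t p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Number of request slots in a ring of 'ring_bytes' bytes, given the
 * size of the ring header and of one entry.
 */
static size_t ring_entries(size_t ring_bytes, size_t header_bytes,
			   size_t entry_bytes)
{
	assert(entry_bytes > 0 && ring_bytes >= header_bytes);
	return round_down_pow2((ring_bytes - header_bytes) / entry_bytes);
}
```

Under the assumed sizes, ring_entries(4096, 64, 112) gives 32, which agrees with the one-page limit named in the commit message; passing a larger ring_bytes shows how the slot count grows with the ring-page-order.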

openeuler / Kernel: synced successfully more than 1 year ago
