1. 24 Sep, 2015 1 commit
    • R
      xen/blkback: free requests on disconnection · f929d42c
      Committed by Roger Pau Monne
      This is due to commit 86839c56
      "xen/block: add multi-page ring support".
      
      When using a guest under UEFI, after the domain is destroyed
      the following warning comes from blkback:
      
      ------------[ cut here ]------------
      WARNING: CPU: 2 PID: 95 at
      /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
      xen_blkif_deferred_free+0x1f4/0x1f8()
      Modules linked in:
      CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 #85
      Hardware name: APM X-Gene Mustang board (DT)
      Workqueue: events xen_blkif_deferred_free
      Call trace:
      [<ffff8000000890a8>] dump_backtrace+0x0/0x124
      [<ffff8000000891dc>] show_stack+0x10/0x1c
      [<ffff8000007653bc>] dump_stack+0x78/0x98
      [<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
      [<ffff800000097f80>] warn_slowpath_null+0x14/0x20
      [<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
      [<ffff8000000ad020>] process_one_work+0x160/0x3b4
      [<ffff8000000ad3b4>] worker_thread+0x140/0x494
      [<ffff8000000b2e34>] kthread+0xd8/0xf0
      ---[ end trace 6f859b7883c88cdd ]---
      
      Request allocation has been moved to connect_ring, which is called every
      time blkback connects to the frontend (this can happen multiple times during
      a blkback instance life cycle). On the other hand, request freeing has not
      been moved, so it's only called when destroying the backend instance. Due to
      this mismatch, blkback can allocate the request pool multiple times, without
      freeing it.
      
      In order to fix it, move the freeing of requests to xen_blkif_disconnect to
      restore the symmetry between request allocation and freeing.
      Reported-by: Julien Grall <julien.grall@citrix.com>
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Tested-by: Julien Grall <julien.grall@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: xen-devel@lists.xenproject.org
      CC: stable@vger.kernel.org # 4.2
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
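The allocation/free symmetry described in this commit can be illustrated with a minimal user-space sketch. The `struct backend` type and the `backend_connect`/`backend_disconnect` names are hypothetical stand-ins, not the real blkback structures or functions; the point is only that whatever is allocated on each connect is released on each disconnect.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a backend instance; not the real struct xen_blkif. */
struct backend {
    void **pending;      /* request pool, NULL while disconnected */
    size_t nr_pending;   /* number of entries in the pool */
};

/* Allocate the pool on every connect. Returns 0 on success, -1 on failure. */
int backend_connect(struct backend *be, size_t nr)
{
    if (be->pending != NULL)
        return -1;       /* already connected: refuse to allocate twice */
    be->pending = calloc(nr, sizeof(*be->pending));
    if (be->pending == NULL)
        return -1;
    be->nr_pending = nr;
    return 0;
}

/* Free the pool on every disconnect, mirroring backend_connect(). */
void backend_disconnect(struct backend *be)
{
    free(be->pending);
    be->pending = NULL;
    be->nr_pending = 0;
}
```

With this pairing, a connect, disconnect, connect sequence never holds more than one pool at a time, which is the property the fix restores.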
  2. 29 Jul, 2015 1 commit
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Committed by Christoph Hellwig
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
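As a rough illustration of why storing the errno in the bio helps with chaining, the sketch below uses a simplified, hypothetical `struct mini_bio` (it is not the kernel's `struct bio`, and the real chaining logic also tracks outstanding references). The error travels with the object, so a parent can pick it up when a child completes.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical bio: only the fields needed for the illustration. */
struct mini_bio {
    int bi_error;                          /* 0 on success, negative errno on failure */
    struct mini_bio *parent;               /* chained parent, or NULL */
    void (*bi_end_io)(struct mini_bio *);  /* completion callback, may be NULL */
};

/* Complete a bio: the error lives in the bio itself and is passed up the chain. */
void mini_bio_endio(struct mini_bio *bio)
{
    if (bio->bi_end_io)
        bio->bi_end_io(bio);
    if (bio->parent) {
        /* Keep the first error recorded in the parent. */
        if (bio->bi_error && !bio->parent->bi_error)
            bio->parent->bi_error = bio->bi_error;
        mini_bio_endio(bio->parent);
    }
}
```

Because the callback receives only the bio, a submitter reads one field instead of reconciling a flag with a separate errno argument.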
  3. 24 Jul, 2015 1 commit
  4. 17 Jun, 2015 1 commit
  5. 06 Jun, 2015 2 commits
    • B
      xen/block: add multi-page ring support · 86839c56
      Committed by Bob Liu
      Extend xen/block to support a multi-page ring, so that more requests can be
      issued by using more than one page as the request ring between blkfront
      and the backend.
      As a result, performance can improve significantly.
      
      We got some impressive improvements on our high-end iSCSI storage cluster
      backend. When using 64 pages as the ring, IOPS increased about 15 times
      in the throughput test and more than doubled in the latency test.
      
      The reason is that the limit on outstanding requests is 32 when using only a
      one-page ring, but in our case the iSCSI LUN was spread across about 100
      physical drives, and 32 was really not enough to keep them busy.
      
      Changes in v2:
       - Rebased to 4.0-rc6.
       - Document how the multi-page ring feature works in Linux io/blkif.h.
      
      Changes in v3:
       - Remove changes to linux io/blkif.h and follow the protocol defined
         in io/blkif.h of XEN tree.
       - Rebased to 4.1-rc3
      
      Changes in v4:
       - Turn to use 'ring-page-order' and 'max-ring-page-order'.
       - A few comments from Roger.
      
      Changes in v5:
       - Clarify the 4k granularity in a comment
       - Address more comments from Roger
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
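The relationship between ring pages and outstanding requests can be sketched with a small calculation. The sizes below (4 KiB pages, a 64-byte shared-ring header, 112-byte request slots) and the round-down to a power of two are assumptions made for illustration, not values stated in the commit message; they are chosen because they reproduce the limit of 32 requests quoted above for a one-page ring.

```c
/* Assumed sizes, not taken from the commit: 4 KiB ring pages, a 64-byte
 * shared-ring header and 112-byte request slots. */
#define ASSUMED_PAGE_SIZE   4096u
#define ASSUMED_HEADER_SIZE 64u
#define ASSUMED_SLOT_SIZE   112u

/* Round down to the nearest power of two (v must be non-zero). */
static unsigned int rounddown_pow2(unsigned int v)
{
    unsigned int p = 1;
    while (p <= v / 2)
        p *= 2;
    return p;
}

/* Number of request slots in a ring made of 2^order pages. */
unsigned int ring_entries(unsigned int order)
{
    unsigned int bytes = ASSUMED_PAGE_SIZE << order;
    return rounddown_pow2((bytes - ASSUMED_HEADER_SIZE) / ASSUMED_SLOT_SIZE);
}
```

Under these assumptions `ring_entries(0)` gives 32 and `ring_entries(6)` (64 pages) gives 2048, which shows how quickly the number of in-flight requests grows with the page order.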
    • B
      drivers: xen-blkback: delay pending_req allocation to connect_ring · 69b91ede
      Committed by Bob Liu
      This is a preparatory patch for the multi-page ring feature.
      In connect_ring we know exactly how many pages are used for the shared
      ring, so delay pending_req allocation until then to avoid wasting memory.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  6. 27 Apr, 2015 2 commits
  7. 15 Apr, 2015 1 commit
  8. 07 Apr, 2015 2 commits
  9. 11 Feb, 2015 1 commit
  10. 28 Jan, 2015 2 commits
  11. 06 Oct, 2014 1 commit
  12. 02 Oct, 2014 2 commits
    • R
      xen-blkback: fix leak on grant map error path · 61cecca8
      Committed by Roger Pau Monné
      Fix leaking a page when a grant mapping has failed.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reported-and-Tested-by: Tao Chen <boby.chen@huawei.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • V
      xen/blkback: unmap all persistent grants when frontend gets disconnected · 12ea7296
      Committed by Vitaly Kuznetsov
      blkback does not unmap persistent grants when frontend goes to Closed
      state (e.g. when blkfront module is being removed). This leads to the
      following in guest's dmesg:
      
      [  343.243825] xen:grant_table: WARNING: g.e. 0x445 still in use!
      [  343.243825] xen:grant_table: WARNING: g.e. 0x42a still in use!
      ...
      
      When the load module -> use device -> unload module sequence is performed
      multiple times it is possible to hit a BUG() condition in the blkfront module:
      
      [  343.243825] kernel BUG at drivers/block/xen-blkfront.c:954!
      [  343.243825] invalid opcode: 0000 [#1] SMP
      [  343.243825] Modules linked in: xen_blkfront(-) ata_generic pata_acpi [last unloaded: xen_blkfront]
      ...
      [  343.243825] Call Trace:
      [  343.243825]  [<ffffffff814111ef>] ? unregister_xenbus_watch+0x16f/0x1e0
      [  343.243825]  [<ffffffffa0016fbf>] blkfront_remove+0x3f/0x140 [xen_blkfront]
      ...
      [  343.243825] RIP  [<ffffffffa0016aae>] blkif_free+0x34e/0x360 [xen_blkfront]
      [  343.243825]  RSP <ffff88001eb8fdc0>
      
      We don't need to keep these grants if we're disconnecting, as the frontend might
      have already forgotten about them. Solve the issue by moving the
      xen_blkbk_free_caches() call from xen_blkif_free() to xen_blkif_disconnect().
      
      Now we can see the following:
      [  928.590893] xen:grant_table: WARNING: g.e. 0x587 still in use!
      [  928.591861] xen:grant_table: WARNING: g.e. 0x372 still in use!
      ...
      [  929.592146] xen:grant_table: freeing g.e. 0x587
      [  929.597174] xen:grant_table: freeing g.e. 0x372
      ...
      
      Backend does not keep persistent grants any more, reconnect works fine.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  13. 29 May, 2014 2 commits
    • V
      xen-blkback: defer freeing blkif to avoid blocking xenwatch · 814d04e7
      Committed by Valentin Priescu
      Currently xenwatch blocks in VBD disconnect, waiting for all pending I/O
      requests to finish. If the VBD is attached to a hot-swappable disk, then
      xenwatch can hang for a long period of time, stalling other watches.
      
       INFO: task xenwatch:39 blocked for more than 120 seconds.
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       ffff880057f01bd0 0000000000000246 ffff880057f01ac0 ffffffff810b0782
       ffff880057f01ad0 00000000000131c0 0000000000000004 ffff880057edb040
       ffff8800344c6080 0000000000000000 ffff880058c00ba0 ffff880057edb040
       Call Trace:
       [<ffffffff810b0782>] ? irq_to_desc+0x12/0x20
       [<ffffffff8128f761>] ? list_del+0x11/0x40
       [<ffffffff8147a080>] ? wait_for_common+0x60/0x160
       [<ffffffff8147bcef>] ? _raw_spin_lock_irqsave+0x2f/0x50
       [<ffffffff8147bd49>] ? _raw_spin_unlock_irqrestore+0x19/0x20
       [<ffffffff8147a26a>] schedule+0x3a/0x60
       [<ffffffffa018fe6a>] xen_blkif_disconnect+0x8a/0x100 [xen_blkback]
       [<ffffffff81079f70>] ? wake_up_bit+0x40/0x40
       [<ffffffffa018ffce>] xen_blkbk_remove+0xae/0x1e0 [xen_blkback]
       [<ffffffff8130b254>] xenbus_dev_remove+0x44/0x90
       [<ffffffff81345cb7>] __device_release_driver+0x77/0xd0
       [<ffffffff81346488>] device_release_driver+0x28/0x40
       [<ffffffff813456e8>] bus_remove_device+0x78/0xe0
       [<ffffffff81342c9f>] device_del+0x12f/0x1a0
       [<ffffffff81342d2d>] device_unregister+0x1d/0x60
       [<ffffffffa0190826>] frontend_changed+0xa6/0x4d0 [xen_blkback]
       [<ffffffffa019c252>] ? frontend_changed+0x192/0x650 [xen_netback]
       [<ffffffff8130ae50>] ? cmp_dev+0x60/0x60
       [<ffffffff81344fe4>] ? bus_for_each_dev+0x94/0xa0
       [<ffffffff8130b06e>] xenbus_otherend_changed+0xbe/0x120
       [<ffffffff8130b4cb>] frontend_changed+0xb/0x10
       [<ffffffff81309c82>] xenwatch_thread+0xf2/0x130
       [<ffffffff81079f70>] ? wake_up_bit+0x40/0x40
       [<ffffffff81309b90>] ? xenbus_directory+0x80/0x80
       [<ffffffff810799d6>] kthread+0x96/0xa0
       [<ffffffff81485934>] kernel_thread_helper+0x4/0x10
       [<ffffffff814839f3>] ? int_ret_from_sys_call+0x7/0x1b
       [<ffffffff8147c17c>] ? retint_restore_args+0x5/0x6
       [<ffffffff81485930>] ? gs_change+0x13/0x13
      
      With this patch, when there is still pending I/O, the actual disconnect
      is done by the last reference holder (last pending I/O request). In this
      case, xenwatch doesn't block indefinitely.
      Signed-off-by: Valentin Priescu <priescuv@amazon.com>
      Reviewed-by: Steven Kady <stevkady@amazon.com>
      Reviewed-by: Steven Noonan <snoonan@amazon.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
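The "last reference holder does the disconnect" idea in this commit can be sketched with a plain reference count. The `struct blkif_sketch` type and the `blkif_get`/`blkif_put` helpers below are hypothetical and written with C11 atomics for a user-space illustration; the real driver uses kernel primitives and defers the work to a workqueue.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical backend interface state; not the real struct xen_blkif. */
struct blkif_sketch {
    atomic_int refcnt;     /* one reference per user, plus one per pending request */
    bool torn_down;        /* set once the deferred teardown has run */
};

static void deferred_teardown(struct blkif_sketch *b)
{
    b->torn_down = true;   /* stands in for the real disconnect and free work */
}

/* Take a reference, for example when a request is queued. */
void blkif_get(struct blkif_sketch *b)
{
    atomic_fetch_add(&b->refcnt, 1);
}

/* Drop a reference; whoever drops the last one performs the teardown. */
void blkif_put(struct blkif_sketch *b)
{
    if (atomic_fetch_sub(&b->refcnt, 1) == 1)
        deferred_teardown(b);
}
```

In this model the remover simply drops its own reference and returns; if I/O is still pending, the teardown runs later from the completion path instead of blocking the caller.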
    • O
      xen/blkback: disable discard feature if requested by toolstack · c926b701
      Committed by Olaf Hering
      Newer toolstacks may provide a boolean property "discard-enable" in the
      backend node. Its purpose is to disable discard for file backed storage
      to avoid fragmentation. Recognize this setting also for physical
      storage.  If that property exists and is false, do not advertise
      "feature-discard" to the frontend.
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  14. 12 Feb, 2014 1 commit
  15. 08 Feb, 2014 4 commits
  16. 03 Feb, 2014 1 commit
  17. 31 Jan, 2014 1 commit
    • Z
      xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Committed by Zoltan Kiss
      The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
      for blkback and future netback patches it just causes lock contention, as
      those pages never go to userspace. Therefore this series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
        m2p_override false
      - a new function gnttab_[un]map_refs_userspace provides the old behaviour
      
      It also removes a stray space from page.h and changes ret to 0 if
      XENFEAT_auto_translated_physmap, as that is the only possible return value
      there.
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain old behaviour where it is needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - add detail why ret = 0 now at some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  18. 24 Nov, 2013 1 commit
    • K
      block: Abstract out bvec iterator · 4f024f37
      Committed by Kent Overstreet
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
  19. 09 Nov, 2013 1 commit
  20. 12 Sep, 2013 1 commit
  21. 04 Jul, 2013 1 commit
  22. 25 Jun, 2013 1 commit
  23. 22 Jun, 2013 1 commit
  24. 18 Jun, 2013 1 commit
    • K
      xen/blkback: Check for insane amounts of request on the ring (v6). · 8e3f8755
      Committed by Konrad Rzeszutek Wilk
      Check that the ring does not have an insane amount of requests
      (more than there could fit on the ring).
      
      If we detect this case we will stop processing the requests
      and wait until the XenBus disconnects the ring.
      
      The existing check RING_REQUEST_CONS_OVERFLOW which checks for how
      many responses we have created in the past (rsp_prod_pvt) vs
      requests consumed (req_cons) and whether said difference is greater or
      equal to the size of the ring, does not catch this case.
      
      What the condition does check is whether there is a need to process more,
      as we still have a backlog of responses to finish. Note that both
      of those values (rsp_prod_pvt and req_cons) are not exposed on the
      shared ring.
      
      To understand this problem a mini crash course in ring protocol
      response/request updates is in order.
      
      There are four entries: req_prod and rsp_prod; req_event and rsp_event
      to track the ring entries. We are only concerned about the first two -
      which set the tone of this bug.
      
      The req_prod is a value incremented by frontend for each request put
      on the ring. Conversely the rsp_prod is a value incremented by the backend
      for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when
      pushing the responses on the ring).  Both values can
      wrap and are modulo the size of the ring (in block case that is 32).
      Please see RING_GET_REQUEST and RING_GET_RESPONSE for the more details.
      
      The culprit here is that if the difference between the
      req_prod and req_cons is greater than the ring size we have a problem.
      Fortunately for us, the '__do_block_io_op' loop:
      
      	rc = blk_rings->common.req_cons;
      	rp = blk_rings->common.sring->req_prod;
      
      	while (rc != rp) {
      
      		..
      		blk_rings->common.req_cons = ++rc; /* before make_response() */
      
      	}
      
      will loop up to the point when rc == rp. The macro inside of the
      loop (RING_GET_REQUEST) is smart and indexes based on the modulo
      of the ring size. If the frontend has provided a bogus req_prod value
      we will loop until 'rc == rp', which means we could be processing
      already processed requests (or responses) often.
      
      The reason RING_REQUEST_CONS_OVERFLOW is not helping here is
      because it only tracks how many responses we have internally produced
      and whether we should process more. The astute reader will
      notice that the macro RING_REQUEST_CONS_OVERFLOW takes two
      arguments - more on this later.
      
      For example, if we were to enter this function with these values:
      
              blk_rings->common.sring->req_prod = X + 31415 (X is the value from
                      the last time __do_block_io_op was called)
              blk_rings->common.req_cons = X
              blk_rings->common.rsp_prod_pvt = X
      
      The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons)
      is doing:
      
      	req_cons - rsp_prod_pvt >= 32
      
      Which is,
      	X - X >= 32 or 0 >= 32
      
      And that is false, so we continue on looping (this bug).
      
      If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in rp
      (sring->req_prod) instead of rc, then this macro can do the check:
      
           req_prod - rsp_prod_pvt >= 32
      
      Which is,
             X + 31415 - X >= 32 , or 31415 >= 32
      
      which is true, so we can error out and break out of the function.
      
      Unfortunately the difference between rsp_prod_pvt and req_prod can be
      exactly 32 (which would error out in the macro). This condition exists when
      the backend is lagging behind with the responses and still has not finished
      responding to all of them (so make_response has not been called), and
      rsp_prod_pvt + 32 == req_cons. This ends up with us not being able
      to use said macro.
      
      Hence introducing a new macro called RING_REQUEST_PROD_OVERFLOW which does
      a simple check of:
      
          req_prod - rsp_prod_pvt > RING_SIZE
      
      And with the X values from above:
      
         X + 31415 - X > 32
      
      Returns true. Also note that if the ring is full (which is where
      RING_REQUEST_CONS_OVERFLOW triggered), we would not hit the
      same condition:
      
         X + 32 - X > 32
      
      Which is false.
      
      Let's use that macro.
      Note that in v5 of this patchset the macro was different - we used an
      earlier version.
      
      Cc: stable@vger.kernel.org
      [v1: Move the check outside the loop]
      [v2: Add a pr_warn as suggested by David]
      [v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan]
      [v4: Move wake_up after kthread_stop as suggested by Jan]
      [v5: Use RING_REQUEST_PROD_OVERFLOW instead]
      [v6: Use RING_REQUEST_PROD_OVERFLOW - Jan's version]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
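The two checks discussed in this commit message can be reproduced in a few lines of standalone C. The function names `cons_overflow` and `prod_overflow` are illustrative stand-ins for RING_REQUEST_CONS_OVERFLOW and RING_REQUEST_PROD_OVERFLOW, and the ring size of 32 is the single-page block ring size quoted in the message. Indices are treated as free-running 32-bit counters, so the subtraction is done in unsigned arithmetic and stays correct across wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE_SKETCH 32u   /* single-page block ring size quoted above */

/* Existing check: consumer index vs. private response producer. */
bool cons_overflow(uint32_t req_cons, uint32_t rsp_prod_pvt)
{
    return (uint32_t)(req_cons - rsp_prod_pvt) >= RING_SIZE_SKETCH;
}

/* New check: frontend-supplied producer index vs. private response producer. */
bool prod_overflow(uint32_t req_prod, uint32_t rsp_prod_pvt)
{
    return (uint32_t)(req_prod - rsp_prod_pvt) > RING_SIZE_SKETCH;
}
```

Plugging in the values from the message: with `req_cons == rsp_prod_pvt == X`, `cons_overflow` is false, so the old check misses the bogus producer; `prod_overflow(X + 31415, X)` is true, while a legitimately full ring, `prod_overflow(X + 32, X)`, is false.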
  25. 08 Jun, 2013 2 commits
    • K
      xen/blkback: Check device permissions before allowing OP_DISCARD · 604c499c
      Committed by Konrad Rzeszutek Wilk
      We need to make sure that the device is not read-only and that
      the request does not go past the number of sectors of the device
      before we issue the DISCARD operation.
      
      This fixes CVE-2013-2140.
      
      Cc: stable@vger.kernel.org
      Acked-by: Jan Beulich <JBeulich@suse.com>
      Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
      [v1: Made it pr_warn instead of pr_debug]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
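A minimal sketch of the kind of validation this commit describes is shown below. The `struct vbd_sketch` type and `discard_allowed` function are hypothetical and do not mirror the driver's actual code path; they only show the two conditions named in the message, a read-only check and a bounds check, with the bounds check written so that it cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a virtual block device; field names are illustrative. */
struct vbd_sketch {
    bool readonly;        /* device exported read-only */
    uint64_t nr_sectors;  /* device size in sectors */
};

/* Return true only if a discard of nr sectors starting at start is allowed. */
bool discard_allowed(const struct vbd_sketch *vbd, uint64_t start, uint64_t nr)
{
    if (vbd->readonly)
        return false;                    /* never modify a read-only device */
    if (start > vbd->nr_sectors)
        return false;                    /* start is past the end */
    if (nr > vbd->nr_sectors - start)
        return false;                    /* range runs past the end (overflow-safe) */
    return true;
}
```

Comparing `nr` against the remaining space, rather than computing `start + nr`, avoids wraparound when a guest supplies very large values.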
    • S
      xen/blkback: Use physical sector size for setup · 7c4d7d71
      Committed by Stefan Bader
      Currently xen-blkback passes the logical sector size over xenbus and
      xen-blkfront sets up the paravirt disk with that logical block size.
      But newer drives usually have the logical sector size set to 512 for
      compatibility reasons and would show the actual sector size only in
      physical sector size.
      This results in the device being partitioned and accessed in dom0 with
      the correct sector size, but the guest thinks 512 bytes is the correct
      block size. And that results in poor performance.
      
      To fix this, blkback gets modified to pass also physical-sector-size
      over xenbus and blkfront to use both values to set up the paravirt
      disk. I did not just change the passed in sector-size because I am
      not sure having a bigger logical sector size than the physical one
      is valid (and that would happen if a newer dom0 kernel hits an older
      domU kernel). Also this way a domU set up before should still be
      accessible (just some tools might detect the unaligned setup).
      
      [v2: Make xenbus write failure non-fatal]
      [v3: Use xenbus_scanf instead of xenbus_gather]
      [v4: Rebased against segment changes]
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  26. 07 May, 2013 1 commit
  27. 19 Apr, 2013 1 commit
    • R
      xen-block: implement indirect descriptors · 402b27f9
      Committed by Roger Pau Monne
      Indirect descriptors introduce a new block operation
      (BLKIF_OP_INDIRECT) that passes grant references instead of segments
      in the request. These grant references are filled with arrays of
      blkif_request_segment_aligned, so that we can send more segments in a
      request.
      
      The proposed implementation sets the maximum number of indirect grefs
      (frames filled with blkif_request_segment_aligned) to 256 in the
      backend and 32 in the frontend. The value in the frontend has been
      chosen experimentally, and the backend value has been set to a sane
      value that allows expanding the maximum number of indirect descriptors
      in the frontend if needed.
      
      The migration code has changed from the previous implementation, in
      which we simply remapped the segments on the shared ring. Now the
      maximum number of segments allowed in a request can change depending
      on the backend, so we have to requeue all the requests in the ring and
      in the queue and split the bios in them if they are bigger than the
      new maximum number of segments.
      
      [v2: Fixed minor comments by Konrad.]
      [v1: Added padding to make the indirect request 64bit aligned.
       Added some BUGs, comments; fixed number of indirect pages in
       blkif_get_x86_{32/64}_req. Added description about the indirect operation
       in blkif.h]
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      [v3: Fixed spaces and tabs mix ups]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
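To give a feel for how indirect pages extend the segment count, the sketch below computes how many indirect pages are needed to describe a given number of segments. The 4 KiB page size and the 8-byte layout of `struct segment_sketch` are assumptions made for this illustration and are not quoted from the commit; the struct is a stand-in for blkif_request_segment_aligned.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u

/* Assumed layout of one segment entry inside an indirect page. */
struct segment_sketch {
    uint32_t gref;        /* grant reference of the data page */
    uint8_t  first_sect;  /* first sector within the page */
    uint8_t  last_sect;   /* last sector within the page */
    uint16_t pad;         /* padding to 8 bytes */
};

#define SEGS_PER_INDIRECT_PAGE (ASSUMED_PAGE_SIZE / sizeof(struct segment_sketch))

/* Number of indirect pages needed to describe nsegs segments. */
unsigned int indirect_pages_needed(unsigned int nsegs)
{
    return (nsegs + SEGS_PER_INDIRECT_PAGE - 1) / SEGS_PER_INDIRECT_PAGE;
}
```

Under these assumptions one indirect page describes up to 512 segments, so a single extra grant reference in the request covers far more data than the segments embedded directly in a ring slot.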
  28. 18 Apr, 2013 3 commits