1. 08 Jun, 2013: 1 commit
    • xen/blkback: Use physical sector size for setup · 7c4d7d71
      Stefan Bader committed
      Currently xen-blkback passes the logical sector size over xenbus and
      xen-blkfront sets up the paravirt disk with that logical block size.
      But newer drives usually have the logical sector size set to 512 for
      compatibility reasons and report the actual sector size only as the
      physical sector size.
      This results in the device being partitioned and accessed in dom0 with
      the correct sector size, while the guest thinks 512 bytes is the correct
      block size, which results in poor performance.
      
      To fix this, blkback is modified to also pass physical-sector-size
      over xenbus, and blkfront to use both values to set up the paravirt
      disk. I did not just change the passed-in sector-size because I am
      not sure that having a bigger logical sector size than the physical one
      is valid (and that would happen if a newer dom0 kernel hits an older
      domU kernel). Also, this way a domU set up before should still be
      accessible (just some tools might detect the unaligned setup).
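A minimal userspace sketch of the frontend side, assuming the backend's xenbus directory is modeled as an in-memory key/value table. `struct kv`, `read_key`, `struct disk_geometry` and `setup_geometry` are hypothetical names; `read_key` only mimics the `xenbus_scanf` convention of returning the number of values parsed, so a missing `physical-sector-size` (an older backend) falls back to the logical size.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the backend's xenbus directory. */
struct kv { const char *key; const char *value; };

struct disk_geometry {
    unsigned int logical_block_size;
    unsigned int physical_block_size;
};

/* Mimics the xenbus_scanf convention: returns 1 if one value was parsed. */
static int read_key(const struct kv *dir, size_t n, const char *key,
                    unsigned int *out)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(dir[i].key, key) == 0)
            return sscanf(dir[i].value, "%u", out);
    return 0; /* key not present */
}

static int setup_geometry(const struct kv *dir, size_t n,
                          struct disk_geometry *geo)
{
    unsigned int sector_size, physical_sector_size;

    if (read_key(dir, n, "sector-size", &sector_size) != 1)
        return -1; /* the logical sector size is mandatory */

    /* Older backends do not write physical-sector-size: fall back to
     * the logical size so mixed old/new dom0 and domU pairs still work. */
    if (read_key(dir, n, "physical-sector-size", &physical_sector_size) != 1)
        physical_sector_size = sector_size;

    geo->logical_block_size = sector_size;
    geo->physical_block_size = physical_sector_size;
    return 0;
}
```

Leaving `sector-size` itself unchanged matches the reasoning above: a frontend that never reads the new key still sees the same logical size.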
      
      [v2: Make xenbus write failure non-fatal]
      [v3: Use xenbus_scanf instead of xenbus_gather]
      [v4: Rebased against segment changes]
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  2. 05 Jun, 2013: 1 commit
  3. 08 May, 2013: 1 commit
  4. 19 Apr, 2013: 1 commit
    • xen-block: implement indirect descriptors · 402b27f9
      Roger Pau Monne committed
      Indirect descriptors introduce a new block operation
      (BLKIF_OP_INDIRECT) that passes grant references instead of segments
      in the request. These grant references point to frames filled with
      arrays of blkif_request_segment_aligned, which allows a single
      request to carry more segments.
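The capacity gain can be checked with some arithmetic. Assuming 4 KiB pages and an 8-byte entry shaped like `blkif_request_segment_aligned` (a 32-bit grant reference, first and last sector bytes, 16 bits of padding), one indirect frame holds 512 segments, compared with the 11 segments (BLKIF_MAX_SEGMENTS_PER_REQUEST) a classic request carries inline. `struct seg_aligned`, `SKETCH_PAGE_SIZE` and the helpers are illustrative names, not the kernel definitions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Modeled on blkif_request_segment_aligned; illustrative only. */
struct seg_aligned {
    uint32_t gref;        /* grant reference of the data page */
    uint8_t  first_sect;  /* first 512-byte sector used in the page */
    uint8_t  last_sect;   /* last 512-byte sector used in the page */
    uint16_t pad;         /* keeps each entry 8 bytes wide */
};

/* Segments that fit in one indirect frame. */
static unsigned int segs_per_indirect_frame(void)
{
    return SKETCH_PAGE_SIZE / (unsigned int)sizeof(struct seg_aligned);
}

/* Indirect grant references needed to describe nr_segs segments. */
static unsigned int indirect_grefs_needed(unsigned int nr_segs)
{
    unsigned int per_frame = segs_per_indirect_frame();
    return (nr_segs + per_frame - 1) / per_frame;  /* round up */
}
```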
      
      The proposed implementation sets the maximum number of indirect grefs
      (frames filled with blkif_request_segment_aligned) to 256 in the
      backend and 32 in the frontend. The value in the frontend has been
      chosen experimentally, and the backend value has been set to a sane
      value that allows expanding the maximum number of indirect descriptors
      in the frontend if needed.
      
      The migration code has changed from the previous implementation, in
      which we simply remapped the segments on the shared ring. Now the
      maximum number of segments allowed in a request can change depending
      on the backend, so we have to requeue all the requests in the ring and
      in the queue and split the bios in them if they are bigger than the
      new maximum number of segments.
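The requeue-and-split step can be illustrated with two helpers, assuming a request is described only by its segment count. `split_count` and `piece_segs` are hypothetical; the real driver splits `struct bio` objects through the block layer rather than counting segments directly.

```c
/* Number of requests needed after migration when the new backend
 * accepts at most max_segs segments per request (max_segs > 0). */
static unsigned int split_count(unsigned int nr_segs, unsigned int max_segs)
{
    return (nr_segs + max_segs - 1) / max_segs;  /* round up */
}

/* Segment count of the i-th piece (0-based); the caller guarantees
 * i < split_count(nr_segs, max_segs). The last piece may be short. */
static unsigned int piece_segs(unsigned int nr_segs, unsigned int max_segs,
                               unsigned int i)
{
    unsigned int start = i * max_segs;
    unsigned int left = nr_segs - start;
    return left < max_segs ? left : max_segs;
}
```

For example, a 256-segment request migrated to a backend limited to 11 segments per request becomes 24 requests, the last one carrying 3 segments.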
      
      [v2: Fixed minor comments by Konrad.]
      [v1: Added padding to make the indirect request 64bit aligned.
       Added some BUGs, comments; fixed number of indirect pages in
       blkif_get_x86_{32/64}_req. Added description about the indirect operation
       in blkif.h]
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      [v3: Fixed spaces and tabs mix ups]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  5. 20 Mar, 2013: 2 commits
  6. 19 Mar, 2013: 2 commits
  7. 20 Feb, 2013: 1 commit
    • xen-blkfront: drop the use of llist_for_each_entry_safe · f84adf49
      Konrad Rzeszutek Wilk committed
      Replace llist_for_each_entry_safe with a while loop.
      
      llist_for_each_entry_safe can trigger a bug in GCC 4.1, so it's best
      to remove it and use a while loop and do the deletion manually.
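A userspace sketch of the replacement pattern, using a hypothetical `struct grant` node with a singly linked `next` pointer: the successor is saved before the current node is freed, and the loop ends on an explicit NULL check. This shows only the shape of the manual while loop, not the blkfront code itself; `build_list` is a hypothetical helper for experimentation.

```c
#include <stdlib.h>

/* Hypothetical node type standing in for a persistent grant entry. */
struct grant {
    unsigned int gref;
    struct grant *next;
};

/* Free every node and return how many were released. */
static unsigned int free_all(struct grant *head)
{
    unsigned int freed = 0;
    struct grant *pos = head;

    while (pos != NULL) {              /* explicit exit condition */
        struct grant *n = pos->next;   /* save successor before free */
        free(pos);
        freed++;
        pos = n;
    }
    return freed;
}

/* Build a list of up to count nodes (stops early if malloc fails). */
static struct grant *build_list(unsigned int count)
{
    struct grant *head = NULL;
    for (unsigned int i = 0; i < count; i++) {
        struct grant *g = malloc(sizeof(*g));
        if (g == NULL)
            break;
        g->gref = i;
        g->next = head;
        head = g;
    }
    return head;
}
```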
      
      Specifically, this bug can be triggered by hot-unplugging a disk, either
      by doing xm block-detach or by a save/restore cycle.
      
      BUG: unable to handle kernel paging request at fffffffffffffff0
      IP: [<ffffffffa0047223>] blkif_free+0x63/0x130 [xen_blkfront]
      The crash call trace is:
      	...
      bad_area_nosemaphore+0x13/0x20
      do_page_fault+0x25e/0x4b0
      page_fault+0x25/0x30
      ? blkif_free+0x63/0x130 [xen_blkfront]
      blkfront_resume+0x46/0xa0 [xen_blkfront]
      xenbus_dev_resume+0x6c/0x140
      pm_op+0x192/0x1b0
      device_resume+0x82/0x1e0
      dpm_resume+0xc9/0x1a0
      dpm_resume_end+0x15/0x30
      do_suspend+0x117/0x1e0
      
      Drilling down to the assembler code, newer GCC generates:
      .L29:
              cmpq    $-16, %r12      #, persistent_gnt check
              je      .L30            #, out of the loop
      .L25:
              ... code in the loop
              testq   %r13, %r13      # n
              je      .L29            #, back to the top of the loop
              cmpq    $-16, %r12      #, persistent_gnt check
              movq    16(%r12), %r13  # <variable>.node.next, n
              jne     .L25            #, back to the top of the loop
      .L30:
      
      While on GCC 4.1, it is:
      .L78:
              ... code in the loop
              testq   %r13, %r13      # n
              je      .L78            #, back to the top of the loop
              movq    16(%rbx), %r13  # <variable>.node.next, n
              jmp     .L78            #, back to the top of the loop
      
      This means that the loop exit condition, instead of
      being:
      
      	&(pos)->member != NULL;
      
      is:
      	;
      
      which makes the loop unbounded.
      
      Since xen-blkfront is the only user of the llist_for_each_entry_safe
      macro, remove it from llist.h.
      
      Orabug: 16263164
      CC: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  8. 18 Dec, 2012: 2 commits
  9. 27 Nov, 2012: 1 commit
  10. 04 Nov, 2012: 1 commit
    • xen/blkback: persistent-grants fixes · cb5bd4d1
      Roger Pau Monne committed
      This patch contains fixes for persistent grants implementation v2:
      
       * handle == 0 is a valid handle, so initialize grants in blkback by
         setting the handle to BLKBACK_INVALID_HANDLE instead of 0. Reported
         by Konrad Rzeszutek Wilk.
      
       * new_map is a boolean, use "true" or "false" instead of 1 and 0.
         Reported by Konrad Rzeszutek Wilk.
      
       * blkfront announces the persistent-grants feature as
         feature-persistent-grants, use feature-persistent instead which is
         consistent with blkback and the public Xen headers.
      
       * Add a consistency check in blkfront to make sure we don't try to
         access segments that have not been set.
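The handle fix can be sketched as follows, assuming a grant-map handle is an unsigned 32-bit value where 0 is legitimate, so an all-ones sentinel marks "not mapped". `SKETCH_INVALID_HANDLE`, `struct seg_map` and the helpers are hypothetical names loosely modeled on `BLKBACK_INVALID_HANDLE`; the same sentinel gives a cheap "was this segment ever set?" test in the spirit of the added consistency check.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_handle_t;

/* 0 is a valid handle, so use all-ones as the "unset" marker. */
#define SKETCH_INVALID_HANDLE ((sketch_handle_t)~0u)

struct seg_map {
    sketch_handle_t handle;
    bool new_map;   /* bool, not int: true when mapped for this request */
};

static void init_maps(struct seg_map *maps, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        maps[i].handle = SKETCH_INVALID_HANDLE;
        maps[i].new_map = false;
    }
}

/* Consistency check: refuse to touch a segment that was never set up. */
static bool segment_is_set(const struct seg_map *m)
{
    return m->handle != SKETCH_INVALID_HANDLE;
}
```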
      Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
      [v1: The new_map int->bool had already been changed]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  11. 30 Oct, 2012: 1 commit
    • xen/blkback: Persistent grant maps for xen blk drivers · 0a8704a5
      Roger Pau Monne committed
      This patch implements persistent grants for the xen-blk{front,back}
      mechanism. The effect of this change is to reduce the number of unmap
      operations performed, since they cause a (costly) TLB shootdown. This
      allows the I/O performance to scale better when a large number of VMs
      are performing I/O.
      
      Previously, the blkfront driver was supplied a bvec[] from the request
      queue. This was granted to dom0; dom0 performed the I/O and wrote
      directly into the grant-mapped memory and unmapped it; blkfront then
      removed foreign access for that grant. The cost of unmapping scales
      badly with the number of CPUs in Dom0. An experiment showed that when
      Dom0 has 24 VCPUs, and guests are performing parallel I/O to a
      ramdisk, the IPIs from performing unmaps are a bottleneck at 5 guests
      (at which point 650,000 IOPS are being performed in total). If more
      than 5 guests are used, the performance declines. By 10 guests, only
      400,000 IOPS are being performed.
      
      This patch improves performance by only unmapping when the connection
      between blkfront and blkback is broken.
      
      On startup blkfront notifies blkback that it is using persistent
      grants, and blkback will do the same. If blkback is not capable of
      persistent mapping, blkfront will still use the same grants, since
      doing so is compatible with the previous protocol and simplifies the
      code in blkfront.
      
      To perform a read, in persistent mode, blkfront uses a separate pool
      of pages that it maps to dom0. When a request comes in, blkfront
      transmutes the request so that blkback will write into one of these
      free pages. Blkback keeps note of which grefs it has already
      mapped. When a new ring request comes to blkback, it looks to see if
      it has already mapped that page. If so, it will not map it again. If
      the page hasn't been previously mapped, it is mapped now, and a record
      is kept of this mapping. Blkback proceeds as usual. When blkfront is
      notified that blkback has completed a request, it memcpys from the
      shared memory into the supplied bvec, and keeps a record that the
      {gref, page} tuple is mapped and not in flight.
      
      Writes are similar, except that the memcpy is performed from the
      supplied bvecs into the shared pages before the request is put onto
      the ring.
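A userspace model of the copy scheme, assuming a fixed pool of page-sized buffers that stay granted to the backend for the lifetime of the connection. `struct pgrant`, `PAGE_SZ`, `prepare_write` and `complete_read` are hypothetical; the real driver works on `bio_vec`s and grant references.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096  /* assumption: 4 KiB pages */

/* A page that stays granted to the backend between requests. */
struct pgrant {
    unsigned int gref;            /* grant reference, reused across requests */
    unsigned char page[PAGE_SZ];  /* shared with the backend */
};

/* Write path: copy the caller's data into the shared page before the
 * request is put on the ring. The caller guarantees len <= PAGE_SZ. */
static void prepare_write(struct pgrant *g, const void *data, size_t len)
{
    memcpy(g->page, data, len);
}

/* Read path: once the backend signals completion, copy what it wrote
 * into the shared page back into the caller's buffer (len <= PAGE_SZ). */
static void complete_read(const struct pgrant *g, void *data, size_t len)
{
    memcpy(data, g->page, len);
}
```

Because the gref never changes, the backend can keep its mapping; the trade-off is one memcpy per segment in the guest.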
      
      Blkback stores a mapping of grefs=>{page mapped to by gref} in
      a red-black tree. As the grefs are not known a priori and provide no
      guarantees on their ordering, we have to search this tree to find
      the page for every gref we receive. This operation takes O(log n)
      time in the worst case. In blkfront, grants are stored using a
      singly linked list.
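The backend's gref lookup can be approximated in userspace with POSIX `tsearch`/`tfind`. POSIX only promises a binary search tree, but glibc's implementation is a red-black tree, which gives the same O(log n) worst case. `struct pmap`, `add_mapping` and `find_mapping` are hypothetical names; blkback itself uses the kernel's rbtree API.

```c
#include <search.h>
#include <stddef.h>

/* Hypothetical record: a grant reference and the page it is mapped to. */
struct pmap {
    unsigned int gref;
    void *page;
};

static int cmp_gref(const void *a, const void *b)
{
    unsigned int x = ((const struct pmap *)a)->gref;
    unsigned int y = ((const struct pmap *)b)->gref;
    return (x > y) - (x < y);
}

/* Record a new persistent mapping; returns 0 on success. */
static int add_mapping(void **root, struct pmap *m)
{
    return tsearch(m, root, cmp_gref) == NULL ? -1 : 0;
}

/* Look up the page for a gref, or NULL if it was never mapped. */
static void *find_mapping(void *const *root, unsigned int gref)
{
    struct pmap key = { .gref = gref, .page = NULL };
    void *node = tfind(&key, root, cmp_gref);
    /* A tsearch node's first member is a pointer to the stored item. */
    return node ? (*(struct pmap **)node)->page : NULL;
}
```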
      
      The maximum number of grants that blkback will persistently map is
      currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to
      prevent a malicious guest from attempting a DoS by supplying fresh
      grefs, causing the Dom0 kernel to map excessively. If a guest
      is using persistent grants and exceeds the maximum number of grants to
      map persistently, the newly passed grefs will be mapped and unmapped.
      Using this approach, we can have requests that mix persistent and
      non-persistent grants, and we need to handle them correctly.
      This allows us to set the maximum number of persistent grants to a
      lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although
      setting it lower will lead to unpredictable performance.
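The per-gref decision described above can be sketched as a small policy function, assuming the backend tracks how many grants it currently holds persistently. `map_policy` and `enum map_kind` are hypothetical; `max_pgrants` plays the role of the RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST cap.

```c
#include <stdbool.h>

enum map_kind {
    REUSE_PERSISTENT,  /* gref already mapped: no new mapping needed */
    MAP_PERSISTENT,    /* new gref, under the cap: map and keep */
    MAP_AND_UNMAP      /* new gref, cap reached: map for this request only */
};

static enum map_kind map_policy(bool already_mapped,
                                unsigned int persistent_count,
                                unsigned int max_pgrants)
{
    if (already_mapped)
        return REUSE_PERSISTENT;
    if (persistent_count < max_pgrants)
        return MAP_PERSISTENT;
    return MAP_AND_UNMAP;
}
```

Since the choice is made per gref, one request can end up with both kinds of segment, which is the mix of persistent and non-persistent grants the text says must be handled.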
      
      In writing this patch, the question arises as to whether the additional
      cost of performing memcpys in the guest (to/from the pool of granted
      pages) outweighs the gains of not performing TLB shootdowns. The answer
      to that question is 'no'. There appears to be very little, if any,
      additional cost to the guest of using persistent grants. There is
      perhaps a small saving from the reduced number of hypercalls
      performed in granting and ending foreign access.
      Signed-off-by: Oliver Chick <oliver.chick@citrix.com>
      Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v1: Fixed up the misuse of bool as int]
  12. 21 Aug, 2012: 1 commit
    • workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Tejun Heo committed
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com> 
  13. 19 Jul, 2012: 1 commit
    • xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op · 89c30f16
      Theodore Ts'o committed
      With the changes in the random tree, IRQF_SAMPLE_RANDOM is now a
      no-op; interrupt randomness is now collected unconditionally in a very
      low-overhead fashion; see commit 775f4b29.  The IRQF_SAMPLE_RANDOM
      flag was scheduled to be removed in 2009 on the
      feature-removal-schedule, so this patch is preparation for the final
      removal of this flag.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
  14. 12 Jun, 2012: 1 commit
    • xen/blkfront: Add WARN to deal with misbehaving backends. · 6878c32e
      Konrad Rzeszutek Wilk committed
      Part of the ring structure is the 'id' field, which is under
      control of the frontend. The frontend stamps it with "some"
      value (in this implementation, a value less than BLK_RING_SIZE),
      and when it gets a response it expects said value to be in the
      response structure. We have a check for the id field when
      spooling new requests but not when de-spooling responses.
      
      We also add an extra check in add_id_to_freelist to make
      sure that the 'struct request' was not NULL, as passing a NULL
      to __blk_end_request_all crashes (and all the operations that
      the response is dealing with end up calling
      __blk_end_request_all).
      
      Lastly we also print the name of the operation that failed.
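The three checks can be modeled in userspace as follows, assuming a ring of `SKETCH_RING_SIZE` slots whose `id` values the frontend handed out. `struct shadow`, `SKETCH_RING_SIZE` and `lookup_response` are hypothetical; the opcode indices follow the public `blkif.h` values of `BLKIF_OP_READ` (0), `BLKIF_OP_WRITE` (1), `BLKIF_OP_WRITE_BARRIER` (2), `BLKIF_OP_FLUSH_DISKCACHE` (3) and `BLKIF_OP_DISCARD` (5).

```c
#include <stddef.h>

#define SKETCH_RING_SIZE 32u  /* assumption: slots the frontend hands out */

/* Per-slot bookkeeping the frontend keeps for in-flight requests. */
struct shadow { void *request; };

static const char *op_name(int op)
{
    static const char *const names[] = {
        [0] = "read", [1] = "write", [2] = "barrier",
        [3] = "flush", [5] = "discard",
    };
    if (op < 0 || (size_t)op >= sizeof(names) / sizeof(names[0]) || !names[op])
        return "unknown";
    return names[op];
}

/* Validate a response id before using it to index the shadow array.
 * Returns the request pointer, or NULL if the backend misbehaved. */
static void *lookup_response(struct shadow *shadow, unsigned long id)
{
    if (id >= SKETCH_RING_SIZE)
        return NULL;               /* an id we never handed out */
    void *req = shadow[id].request;
    shadow[id].request = NULL;     /* release the slot only once */
    return req;                    /* NULL: slot was already free */
}
```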
      
      [v1: s/BUG/WARN/ suggested by Stefano]
      [v2: Add extra check in add_id_to_freelist]
      [v3: Redid op_name per Jan's suggestion]
      [v4: add const * and add WARN on failure returns]
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  15. 12 May, 2012: 2 commits
  16. 07 Apr, 2012: 1 commit
  17. 22 Mar, 2012: 1 commit
  18. 20 Mar, 2012: 3 commits
  19. 05 Jan, 2012: 1 commit
    • Xen: consolidate and simplify struct xenbus_driver instantiation · 73db144b
      Jan Beulich committed
      The 'name', 'owner', and 'mod_name' members are redundant with the
      identically named fields in the 'driver' sub-structure. Rather than
      switching each instance to specify these fields explicitly, introduce
      a macro to simplify this.
      
      Eliminate further redundancy by allowing the drvname argument to
      DEFINE_XENBUS_DRIVER() to be blank (in which case the first entry from
      the ID table will be used for .driver.name).
      
      Also eliminate the questionable xenbus_register_{back,front}end()
      wrappers - their sole remaining purpose was the checking of the
      'owner' field, proper setting of which shouldn't be an issue anymore
      when the macro gets used.
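The "blank drvname" rule can be sketched with a simplified macro, assuming both the driver name and the ID strings are string literals. Adjacent-literal concatenation turns a blank argument into `""`, so the macro can fall back to the first ID entry. `DEFINE_SKETCH_DRIVER`, `struct sketch_driver`, `struct sketch_device_id` and the `_ids` naming convention are hypothetical simplifications, not the kernel's `DEFINE_XENBUS_DRIVER`.

```c
/* Simplified stand-ins for xenbus_device_id and xenbus_driver. */
struct sketch_device_id { const char *devicetype; };

struct sketch_driver {
    const char *name;
    const struct sketch_device_id *ids;
};

/* ("" drvname) is "" when drvname is blank, otherwise the literal itself;
 * a blank name falls back to the first entry of var##_ids. */
#define DEFINE_SKETCH_DRIVER(var, drvname)                     \
    struct sketch_driver var##_driver = {                      \
        .name = ("" drvname)[0] ? ("" drvname)                 \
                                : var##_ids[0].devicetype,     \
        .ids = var##_ids,                                      \
    }

static const struct sketch_device_id blkfront_ids[] = { { "vbd" }, { "" } };
static const struct sketch_device_id pciback_ids[] = { { "pci" }, { "" } };

/* Block scope, because the fallback reads an array element at run time. */
static const char *blkfront_name(void)
{
    DEFINE_SKETCH_DRIVER(blkfront, );          /* blank: falls back to "vbd" */
    return blkfront_driver.name;
}

static const char *pciback_name(void)
{
    DEFINE_SKETCH_DRIVER(pciback, "pciback");  /* explicit name wins */
    return pciback_driver.name;
}
```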
      
      v2: Restore DRV_NAME for the driver name in xen-pciback.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  20. 17 Dec, 2011: 1 commit
  21. 19 Nov, 2011: 2 commits
  22. 13 Oct, 2011: 4 commits
  23. 15 Jul, 2011: 2 commits
  24. 12 May, 2011: 2 commits
  25. 09 Mar, 2011: 1 commit
  26. 26 Feb, 2011: 1 commit
  27. 24 Dec, 2010: 1 commit
    • xen: don't use flush_scheduled_work() · 30d65030
      Tejun Heo committed
      flush_scheduled_work() is deprecated and scheduled to be removed.
      Directly flush info->work instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  28. 16 Dec, 2010: 1 commit