1. 20 2月, 2013 1 次提交
  2. 16 1月, 2013 1 次提交
    • M
      xen/grant-table: correctly initialize grant table version 1 · d0b4d64a
      Matt Wilson 提交于
      Commit 85ff6acb (xen/granttable: Grant
      tables V2 implementation) changed the GREFS_PER_GRANT_FRAME macro from
      a constant to a conditional expression. The expression depends on
      grant_table_version being appropriately set. Unfortunately, at init
      time grant_table_version will be 0. The GREFS_PER_GRANT_FRAME
      conditional expression checks for "grant_table_version == 1", and
      therefore returns the number of grant references per frame for v2.
      
      This causes gnttab_init() to allocate fewer pages for gnttab_list, as
      a frame can old half the number of v2 entries than v1 entries. After
      gnttab_resume() is called, grant_table_version is appropriately
      set. nr_init_grefs will then be miscalculated and gnttab_free_count
      will hold a value larger than the actual number of free gref entries.
      
      If a guest is heavily utilizing improperly initialized v1 grant
      tables, memory corruption can occur. One common manifestation is
      corruption of the vmalloc list, resulting in a poisoned pointer
      derefrence when accessing /proc/meminfo or /proc/vmallocinfo:
      
      [   40.770064] BUG: unable to handle kernel paging request at 0000200200001407
      [   40.770083] IP: [<ffffffff811a6fb0>] get_vmalloc_info+0x70/0x110
      [   40.770102] PGD 0
      [   40.770107] Oops: 0000 [#1] SMP
      [   40.770114] CPU 10
      
      This patch introduces a static variable, grefs_per_grant_frame, to
      cache the calculated value. gnttab_init() now calls
      gnttab_request_version() early so that grant_table_version and
      grefs_per_grant_frame can be appropriately set. A few BUG_ON()s have
      been added to prevent this type of bug from reoccurring in the future.
      Signed-off-by: NMatt Wilson <msw@amazon.com>
      Reviewed-and-Tested-by: NSteven Noonan <snoonan@amazon.com>
      Acked-by: NIan Campbell <Ian.Campbell@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Annie Li <annie.li@oracle.com>
      Cc: xen-devel@lists.xen.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # v3.3 and newer
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d0b4d64a
  3. 04 1月, 2013 1 次提交
    • G
      Drivers: xen: remove __dev* attributes. · 345a5255
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, and __devinitdata from these
      drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      345a5255
  4. 20 10月, 2012 1 次提交
  5. 21 9月, 2012 1 次提交
    • A
      xen/gndev: Xen backend support for paged out grant targets V4. · c571898f
      Andres Lagar-Cavilla 提交于
      Since Xen-4.2, hvm domains may have portions of their memory paged out. When a
      foreign domain (such as dom0) attempts to map these frames, the map will
      initially fail. The hypervisor returns a suitable errno, and kicks an
      asynchronous page-in operation carried out by a helper. The foreign domain is
      expected to retry the mapping operation until it eventually succeeds. The
      foreign domain is not put to sleep because itself could be the one running the
      pager assist (typical scenario for dom0).
      
      This patch adds support for this mechanism for backend drivers using grant
      mapping and copying operations. Specifically, this covers the blkback and
      gntdev drivers (which map foreign grants), and the netback driver (which copies
      foreign grants).
      
      * Add a retry method for grants that fail with GNTST_eagain (i.e. because the
        target foreign frame is paged out).
      * Insert hooks with appropriate wrappers in the aforementioned drivers.
      
      The retry loop is only invoked if the grant operation status is GNTST_eagain.
      It guarantees to leave a new status code different from GNTST_eagain. Any other
      status code results in identical code execution as before.
      
      The retry loop performs 256 attempts with increasing time intervals through a
      32 second period. It uses msleep to yield while waiting for the next retry.
      
      V2 after feedback from David Vrabel:
      * Explicit MAX_DELAY instead of wrap-around delay into zero
      * Abstract GNTST_eagain check into core grant table code for netback module.
      
      V3 after feedback from Ian Campbell:
      * Add placeholder in array of grant table error descriptions for unrelated
        error code we jump over.
      * Eliminate single map and retry macro in favor of a generic batch flavor.
      * Some renaming.
      * Bury most implementation in grant_table.c, cleaner interface.
      
      V4 rebased on top of sync of Xen grant table interface headers.
      Signed-off-by: NAndres Lagar-Cavilla <andres@lagarcavilla.org>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      [v5: Fixed whitespace issues]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c571898f
  6. 12 9月, 2012 1 次提交
    • S
      xen/m2p: do not reuse kmap_op->dev_bus_addr · 2fc136ee
      Stefano Stabellini 提交于
      If the caller passes a valid kmap_op to m2p_add_override, we use
      kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
      part of the interface with Xen and if we are batching the hypercalls it
      might not have been written by the hypervisor yet. That means that later
      on Xen will write to it and we'll think that the original mfn is
      actually what Xen has written to it.
      
      Rather than "stealing" struct members from kmap_op, keep using
      page->index to store the original mfn and add another parameter to
      m2p_remove_override to get the corresponding kmap_op instead.
      It is now responsibility of the caller to keep track of which kmap_op
      corresponds to a particular page in the m2p_override (gntdev, the only
      user of this interface that passes a valid kmap_op, is already doing that).
      
      CC: stable@kernel.org
      Reported-and-Tested-By: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2fc136ee
  7. 22 8月, 2012 2 次提交
  8. 08 5月, 2012 1 次提交
    • S
      xen: enter/exit lazy_mmu_mode around m2p_override calls · f62805f1
      Stefano Stabellini 提交于
      This patch is a significant performance improvement for the
      m2p_override: about 6% using the gntdev device.
      
      Each m2p_add/remove_override call issues a MULTI_grant_table_op and a
      __flush_tlb_single if kmap_op != NULL.  Batching all the calls together
      is a great performance benefit because it means issuing one hypercall
      total rather than two hypercall per page.
      If paravirt_lazy_mode is set PARAVIRT_LAZY_MMU, all these calls are
      going to be batched together, otherwise they are issued one at a time.
      
      Adding arch_enter_lazy_mmu_mode/arch_leave_lazy_mmu_mode around the
      m2p_add/remove_override calls forces paravirt_lazy_mode to
      PARAVIRT_LAZY_MMU, therefore makes sure that they are always batched.
      
      However it is not safe to call arch_enter_lazy_mmu_mode if we are in
      interrupt context or if we are already in PARAVIRT_LAZY_MMU mode, so
      check for both conditions before doing so.
      
      Changes in v4:
      - rebased on 3.4-rc4: all the m2p_override users call gnttab_unmap_refs
      and gnttab_map_refs;
      - check whether we are in interrupt context and the lazy_mode we are in
      before calling arch_enter/leave_lazy_mmu_mode.
      
      Changes in v3:
      - do not call arch_enter/leave_lazy_mmu_mode in xen_blkbk_unmap, that
      can be called in interrupt context.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      [v5: s/int lazy/bool lazy/]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f62805f1
  9. 18 4月, 2012 1 次提交
  10. 17 4月, 2012 1 次提交
  11. 28 1月, 2012 1 次提交
    • K
      xen/granttable: Disable grant v2 for HVM domains. · 69e8f430
      Konrad Rzeszutek Wilk 提交于
      As proper scaffolding for supporting error status is not yet
      implemented.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000400
      IP: [<ffffffff81375ae9>] gnttab_end_foreign_access_ref_v2+0x29/0x40
      PGD 32aa3067 PUD 32a87067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      CPU 0
      Modules linked in: sg sr_mod cdrom ata_generic ata_piix libata scsi_mod xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront
      cmd
      
      Pid: 2307, comm: ip Not tainted 3.3.0-rc1 #1 Xen HVM domU
      RIP: 0010:[<ffffffff81375ae9>]  [<ffffffff81375ae9>] gnttab_end_foreign_access_ref_v2+0x29/0x40
      RSP: 0018:ffff88003be03d38  EFLAGS: 00010206
      RAX: 0000000000000000 RBX: ffff880033210640 RCX: 0000000000000040
      RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000000000200
      RBP: ffff88003be03d38 R08: 0000000000000101 R09: 0000000000000000
      R10: dead000000100100 R11: 0000000000000000 R12: ffff88003be03e48
      R13: 0000000000000001 R14: ffff880039461c00 R15: 0000000000000200
      FS:  00007fb1f84ec700(0000) GS:ffff88003be00000(0000) knlGS:0000000000000000
      ...
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      69e8f430
  12. 21 12月, 2011 1 次提交
  13. 17 12月, 2011 3 次提交
  14. 22 11月, 2011 4 次提交
  15. 29 9月, 2011 1 次提交
    • S
      xen: modify kernel mappings corresponding to granted pages · 0930bba6
      Stefano Stabellini 提交于
      If we want to use granted pages for AIO, changing the mappings of a user
      vma and the corresponding p2m is not enough, we also need to update the
      kernel mappings accordingly.
      Currently this is only needed for pages that are created for user usages
      through /dev/xen/gntdev. As in, pages that have been in use by the
      kernel and use the P2M will not need this special mapping.
      However there are no guarantees that in the future the kernel won't
      start accessing pages through the 1:1 even for internal usage.
      
      In order to avoid the complexity of dealing with highmem, we allocated
      the pages lowmem.
      We issue a HYPERVISOR_grant_table_op right away in
      m2p_add_override and we remove the mappings using another
      HYPERVISOR_grant_table_op in m2p_remove_override.
      Considering that m2p_add_override and m2p_remove_override are called
      once per page we use multicalls and hypercall batching.
      
      Use the kmap_op pointer directly as argument to do the mapping as it is
      guaranteed to be present up until the unmapping is done.
      Before issuing any unmapping multicalls, we need to make sure that the
      mapping has already being done, because we need the kmap->handle to be
      set correctly.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      [v1: Removed GRANT_FRAME_BIT usage]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      0930bba6
  16. 04 8月, 2011 1 次提交
  17. 27 7月, 2011 2 次提交
  18. 19 5月, 2011 1 次提交
    • K
      xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · d5431d52
      Konrad Rzeszutek Wilk 提交于
      We only supported the M2P (and P2M) override only for the
      GNTMAP_contains_pte type mappings. Meaning that we grants
      operations would "contain the machine address of the PTE to update"
      If the flag is unset, then the grant operation is
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user-space, we can safely assume
      that it operates only on PTE's. Hence the implementation for
      it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
      we can implement the support for this. It will require some extra
      accounting structure to keep track of the page[i], and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d5431d52
  19. 18 4月, 2011 1 次提交
    • K
      xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · cf8d9163
      Konrad Rzeszutek Wilk 提交于
      We only supported the M2P (and P2M) override only for the
      GNTMAP_contains_pte type mappings. Meaning that we grants
      operations would "contain the machine address of the PTE to update"
      If the flag is unset, then the grant operation is
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user-space, we can safely assume
      that it operates only on PTE's. Hence the implementation for
      it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
      we can implement the support for this. It will require some extra
      accounting structure to keep track of the page[i], and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cf8d9163
  20. 10 3月, 2011 1 次提交
    • I
      xen/p2m/m2p/gnttab: do not add failed grant maps to m2p override · dc4972a4
      Ian Campbell 提交于
      The caller will not undo a mapping which failed and therefore the
      override will not be removed.
      
      This is especially bad in the case of GNTMAP_contains_pte mapping type
      mappings where m2p_add_override will destroy the kernel mapping of the
      page.
      
      This was observed via a failure of map_grant_pages in gntdev_mmap (due
      to userspace using a bad grant reference), which left the page in
      question unmapped (because it was a GNTMAP_contains_pte mapping) which
      led to a crash later on.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      dc4972a4
  21. 15 2月, 2011 1 次提交
    • D
      xen-gntdev: Support mapping in HVM domains · aab8f11a
      Daniel De Graaf 提交于
      HVM does not allow direct PTE modification, so instead we request
      that Xen change its internal p2m mappings on the allocated pages and
      map the memory into userspace normally.
      
      Note:
      The HVM path for map and unmap is slightly different: HVM keeps the pages
      mapped until the area is deleted, while the PV case (use_ptemod being true)
      must unmap them when userspace unmaps the range. In the normal use case,
      this makes no difference to users since unmap time is deletion time.
      
      [v2: Expanded commit descr.]
      Signed-off-by: NDaniel De Graaf <dgdegra@tycho.nsa.gov>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      aab8f11a
  22. 12 1月, 2011 2 次提交
  23. 23 7月, 2010 1 次提交
    • S
      xen: Xen PCI platform device driver. · 183d03cc
      Stefano Stabellini 提交于
      Add the xen pci platform device driver that is responsible
      for initializing the grant table and xenbus in PV on HVM mode.
      Few changes to xenbus and grant table are necessary to allow the delayed
      initialization in HVM mode.
      Grant table needs few additional modifications to work in HVM mode.
      
      The Xen PCI platform device raises an irq every time an event has been
      delivered to us. However these interrupts are only delivered to vcpu 0.
      The Xen PCI platform interrupt handler calls xen_hvm_evtchn_do_upcall
      that is a little wrapper around __xen_evtchn_do_upcall, the traditional
      Xen upcall handler, the very same used with traditional PV guests.
      
      When running on HVM the event channel upcall is never called while in
      progress because it is a normal Linux irq handler (and we cannot switch
      the irq chip wholesale to the Xen PV ones as we are running QEMU and
      might have passed in PCI devices), therefore we cannot be sure that
      evtchn_upcall_pending is 0 when returning.
      For this reason if evtchn_upcall_pending is set by Xen we need to loop
      again on the event channels set pending otherwise we might loose some
      event channel deliveries.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NSheng Yang <sheng@linux.intel.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      183d03cc
  24. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  25. 05 11月, 2009 1 次提交
  26. 17 12月, 2008 1 次提交
  27. 20 8月, 2008 1 次提交
  28. 27 5月, 2008 1 次提交
    • J
      xen: implement save/restore · 0e91398f
      Jeremy Fitzhardinge 提交于
      This patch implements Xen save/restore and migration.
      
      Saving is triggered via xenbus, which is polled in
      drivers/xen/manage.c.  When a suspend request comes in, the kernel
      prepares itself for saving by:
      
      1 - Freeze all processes.  This is primarily to prevent any
          partially-completed pagetable updates from confusing the suspend
          process.  If CONFIG_PREEMPT isn't defined, then this isn't necessary.
      
      2 - Suspend xenbus and other devices
      
      3 - Stop_machine, to make sure all the other vcpus are quiescent.  The
          Xen tools require the domain to run its save off vcpu0.
      
      4 - Within the stop_machine state, it pins any unpinned pgds (under
          construction or destruction), performs canonicalizes various other
          pieces of state (mostly converting mfns to pfns), and finally
      
      5 - Suspend the domain
      
      Restore reverses the steps used to save the domain, ending when all
      the frozen processes are thawed.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      0e91398f
  29. 25 4月, 2008 3 次提交
  30. 05 4月, 2008 1 次提交