1. 04 10月, 2012 1 次提交
  2. 03 10月, 2012 1 次提交
  3. 24 9月, 2012 2 次提交
  4. 21 9月, 2012 1 次提交
    • A
      xen/gndev: Xen backend support for paged out grant targets V4. · c571898f
      Andres Lagar-Cavilla 提交于
      Since Xen-4.2, hvm domains may have portions of their memory paged out. When a
      foreign domain (such as dom0) attempts to map these frames, the map will
      initially fail. The hypervisor returns a suitable errno, and kicks an
      asynchronous page-in operation carried out by a helper. The foreign domain is
      expected to retry the mapping operation until it eventually succeeds. The
      foreign domain is not put to sleep because itself could be the one running the
      pager assist (typical scenario for dom0).
      
      This patch adds support for this mechanism for backend drivers using grant
      mapping and copying operations. Specifically, this covers the blkback and
      gntdev drivers (which map foreign grants), and the netback driver (which copies
      foreign grants).
      
      * Add a retry method for grants that fail with GNTST_eagain (i.e. because the
        target foreign frame is paged out).
      * Insert hooks with appropriate wrappers in the aforementioned drivers.
      
      The retry loop is only invoked if the grant operation status is GNTST_eagain.
      It guarantees to leave a new status code different from GNTST_eagain. Any other
      status code results in identical code execution as before.
      
      The retry loop performs 256 attempts with increasing time intervals through a
      32 second period. It uses msleep to yield while waiting for the next retry.
      
      V2 after feedback from David Vrabel:
      * Explicit MAX_DELAY instead of wrap-around delay into zero
      * Abstract GNTST_eagain check into core grant table code for netback module.
      
      V3 after feedback from Ian Campbell:
      * Add placeholder in array of grant table error descriptions for unrelated
        error code we jump over.
      * Eliminate single map and retry macro in favor of a generic batch flavor.
      * Some renaming.
      * Bury most implementation in grant_table.c, cleaner interface.
      
      V4 rebased on top of sync of Xen grant table interface headers.
      Signed-off-by: NAndres Lagar-Cavilla <andres@lagarcavilla.org>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      [v5: Fixed whitespace issues]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c571898f
  5. 19 9月, 2012 1 次提交
  6. 18 9月, 2012 2 次提交
  7. 14 9月, 2012 1 次提交
  8. 12 9月, 2012 1 次提交
    • S
      xen/m2p: do not reuse kmap_op->dev_bus_addr · 2fc136ee
      Stefano Stabellini 提交于
      If the caller passes a valid kmap_op to m2p_add_override, we use
      kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
      part of the interface with Xen and if we are batching the hypercalls it
      might not have been written by the hypervisor yet. That means that later
      on Xen will write to it and we'll think that the original mfn is
      actually what Xen has written to it.
      
      Rather than "stealing" struct members from kmap_op, keep using
      page->index to store the original mfn and add another parameter to
      m2p_remove_override to get the corresponding kmap_op instead.
      It is now responsibility of the caller to keep track of which kmap_op
      corresponds to a particular page in the m2p_override (gntdev, the only
      user of this interface that passes a valid kmap_op, is already doing that).
      
      CC: stable@kernel.org
      Reported-and-Tested-By: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2fc136ee
  9. 06 9月, 2012 1 次提交
  10. 23 8月, 2012 3 次提交
  11. 22 8月, 2012 2 次提交
  12. 17 8月, 2012 1 次提交
  13. 09 8月, 2012 1 次提交
  14. 14 9月, 2012 1 次提交
  15. 09 8月, 2012 1 次提交
  16. 14 9月, 2012 2 次提交
  17. 20 7月, 2012 4 次提交
    • O
      xen PVonHVM: move shared_info to MMIO before kexec · 00e37bdb
      Olaf Hering 提交于
      Currently kexec in a PVonHVM guest fails with a triple fault because the
      new kernel overwrites the shared info page. The exact failure depends on
      the size of the kernel image. This patch moves the pfn from RAM into
      MMIO space before the kexec boot.
      
      The pfn containing the shared_info is located somewhere in RAM. This
      will cause trouble if the current kernel is doing a kexec boot into a
      new kernel. The new kernel (and its startup code) can not know where the
      pfn is, so it can not reserve the page. The hypervisor will continue to
      update the pfn, and as a result memory corruption occours in the new
      kernel.
      
      One way to work around this issue is to allocate a page in the
      xen-platform pci device's BAR memory range. But pci init is done very
      late and the shared_info page is already in use very early to read the
      pvclock. So moving the pfn from RAM to MMIO is racy because some code
      paths on other vcpus could access the pfn during the small   window when
      the old pfn is moved to the new pfn. There is even a  small window were
      the old pfn is not backed by a mfn, and during that time all reads
      return -1.
      
      Because it is not known upfront where the MMIO region is located it can
      not be used right from the start in xen_hvm_init_shared_info.
      
      To minimise trouble the move of the pfn is done shortly before kexec.
      This does not eliminate the race because all vcpus are still online when
      the syscore_ops will be called. But hopefully there is no work pending
      at this point in time. Also the syscore_op is run last which reduces the
      risk further.
      Signed-off-by: NOlaf Hering <olaf@aepfle.de>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      00e37bdb
    • O
      xen/pv-on-hvm kexec: shutdown watches from old kernel · 254d1a3f
      Olaf Hering 提交于
      Add xs_reset_watches function to shutdown watches from old kernel after
      kexec boot.  The old kernel does not unregister all watches in the
      shutdown path.  They are still active, the double registration can not
      be detected by the new kernel.  When the watches fire, unexpected events
      will arrive and the xenwatch thread will crash (jumps to NULL).  An
      orderly reboot of a hvm guest will destroy the entire guest with all its
      resources (including the watches) before it is rebuilt from scratch, so
      the missing unregister is not an issue in that case.
      
      With this change the xenstored is instructed to wipe all active watches
      for the guest.  However, a patch for xenstored is required so that it
      accepts the XS_RESET_WATCHES request from a client (see changeset
      23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
      the registration of watches will fail and some features of a PVonHVM
      guest are not available. The guest is still able to boot, but repeated
      kexec boots will fail.
      Signed-off-by: NOlaf Hering <olaf@aepfle.de>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      254d1a3f
    • L
      xen/pcpu: Xen physical cpus online/offline sys interface · f65c9bb3
      Liu, Jinsong 提交于
      This patch provide Xen physical cpus online/offline sys interface.
      User can use it for their own purpose, like power saving:
      by offlining some cpus when light workload it save power greatly.
      
      Its basic workflow is, user online/offline cpu via sys interface,
      then hypercall xen to implement, after done xen inject virq back to dom0,
      and then dom0 sync cpu status.
      Signed-off-by: NJiang, Yunhong <yunhong.jiang@intel.com>
      Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f65c9bb3
    • L
      xen/mce: Add mcelog support for Xen platform · cef12ee5
      Liu, Jinsong 提交于
      When MCA error occurs, it would be handled by Xen hypervisor first,
      and then the error information would be sent to initial domain for logging.
      
      This patch gets error information from Xen hypervisor and convert
      Xen format error into Linux format mcelog. This logic is basically
      self-contained, not touching other kernel components.
      
      By using tools like mcelog tool users could read specific error information,
      like what they did under native Linux.
      
      To test follow directions outlined in Documentation/acpi/apei/einj.txt
      Acked-and-tested-by: NBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: NKe, Liping <liping.ke@intel.com>
      Signed-off-by: NJiang, Yunhong <yunhong.jiang@intel.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cef12ee5
  18. 22 5月, 2012 1 次提交
  19. 21 5月, 2012 1 次提交
  20. 08 5月, 2012 1 次提交
  21. 29 3月, 2012 1 次提交
  22. 28 3月, 2012 1 次提交
  23. 24 3月, 2012 1 次提交
  24. 21 3月, 2012 2 次提交
  25. 15 3月, 2012 1 次提交
    • K
      xen/acpi-processor: C and P-state driver that uploads said data to hypervisor. · 59a56802
      Konrad Rzeszutek Wilk 提交于
      This driver solves three problems:
       1). Parse and upload ACPI0007 (or PROCESSOR_TYPE) information to the
           hypervisor - aka P-states (cpufreq data).
       2). Upload the the Cx state information (cpuidle data).
       3). Inhibit CPU frequency scaling drivers from loading.
      
      The reason for wanting to solve 1) and 2) is such that the Xen hypervisor
      is the only one that knows the CPU usage of different guests and can
      make the proper decision of when to put CPUs and packages in proper states.
      Unfortunately the hypervisor has no support to parse ACPI DSDT tables, hence it
      needs help from the initial domain to provide this information. The reason
      for 3) is that we do not want the initial domain to change P-states while the
      hypervisor is doing it as well - it causes rather some funny cases of P-states
      transitions.
      
      For this to work, the driver parses the Power Management data and uploads said
      information to the Xen hypervisor. It also calls acpi_processor_notify_smm()
      to inhibit the other CPU frequency scaling drivers from being loaded.
      
      Everything revolves around the 'struct acpi_processor' structure which
      gets updated during the bootup cycle in different stages. At the startup, when
      the ACPI parser starts, the C-state information is processed (processor_idle)
      and saved in said structure as 'power' element. Later on, the CPU frequency
      scaling driver (powernow-k8 or acpi_cpufreq), would call the the
      acpi_processor_* (processor_perflib functions) to parse P-states information
      and populate in the said structure the 'performance' element.
      
      Since we do not want the CPU frequency scaling drivers from loading
      we have to call the acpi_processor_* functions to parse the P-states and
      call "acpi_processor_notify_smm" to stop them from loading.
      
      There is also one oddity in this driver which is that under Xen, the
      physical online CPU count can be different from the virtual online CPU count.
      Meaning that the macros 'for_[online|possible]_cpu' would process only
      up to virtual online CPU count. We on the other hand want to process
      the full amount of physical CPUs. For that, the driver checks if the ACPI IDs
      count is different from the APIC ID count - which can happen if the user
      choose to use dom0_max_vcpu argument. In such a case a backup of the PM
      structure is used and uploaded to the hypervisor.
      
      [v1-v2: Initial RFC implementations that were posted]
      [v3: Changed the name to passthru suggested by Pasi Kärkkäinen <pasik@iki.fi>]
      [v4: Added vCPU != pCPU support - aka dom0_max_vcpus support]
      [v5: Cleaned up the driver, fix bug under Athlon XP]
      [v6: Changed the driver to a CPU frequency governor]
      [v7: Jan Beulich <jbeulich@suse.com> suggestion to make it a cpufreq scaling driver
           made me rework it as driver that inhibits cpufreq scaling driver]
      [v8: Per Jan's review comments, fixed up the driver]
      [v9: Allow to continue even if acpi_processor_preregister_perf.. fails]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      59a56802
  26. 14 3月, 2012 1 次提交
  27. 11 3月, 2012 1 次提交
    • K
      xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it. · 73c154c6
      Konrad Rzeszutek Wilk 提交于
      For the hypervisor to take advantage of the MWAIT support it needs
      to extract from the ACPI _CST the register address. But the
      hypervisor does not have the support to parse DSDT so it relies on
      the initial domain (dom0) to parse the ACPI Power Management information
      and push it up to the hypervisor. The pushing of the data is done
      by the processor_harveset_xen module which parses the information that
      the ACPI parser has graciously exposed in 'struct acpi_processor'.
      
      For the ACPI parser to also expose the Cx states for MWAIT, we need
      to expose the MWAIT capability (leaf 1). Furthermore we also need to
      expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
      function.
      
      The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
      operations, but it can't do it since it needs to be backwards compatible.
      Instead we choose to use the native CPUID to figure out if the MWAIT
      capability exists and use the XEN_SET_PDC query hypercall to figure out
      if the hypervisor wants us to expose the MWAIT_LEAF capability or not.
      
      Note: The XEN_SET_PDC query was implemented in c/s 23783:
      "ACPI: add _PDC input override mechanism".
      
      With this in place, instead of
       C3 ACPI IOPORT 415
      we get now
       C3:ACPI FFH INTEL MWAIT 0x20
      
      Note: The cpu_idle which would be calling the mwait variants for idling
      never gets set b/c we set the default pm_idle to be the hypercall variant.
      Acked-by: NJan Beulich <JBeulich@suse.com>
      [v2: Fix missing header file include and #ifdef]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      73c154c6
  28. 13 1月, 2012 1 次提交
  29. 05 1月, 2012 2 次提交
    • I
      xen/xenbus: Reject replies with payload > XENSTORE_PAYLOAD_MAX. · 9e7860ce
      Ian Campbell 提交于
      Haogang Chen found out that:
      
       There is a potential integer overflow in process_msg() that could result
       in cross-domain attack.
      
       	body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
      
       When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent
       call to xb_read() would write to a zero-length buffer.
      
       The other end of this connection is always the xenstore backend daemon
       so there is no guest (malicious or otherwise) which can do this. The
       xenstore daemon is a trusted component in the system.
      
       However this seem like a reasonable robustness improvement so we should
       have it.
      
      And Ian when read the API docs found that:
              The payload length (len field of the header) is limited to 4096
              (XENSTORE_PAYLOAD_MAX) in both directions.  If a client exceeds the
              limit, its xenstored connection will be immediately killed by
              xenstored, which is usually catastrophic from the client's point of
              view.  Clients (particularly domains, which cannot just reconnect)
              should avoid this.
      
      so this patch checks against that instead.
      
      This also avoids a potential integer overflow pointed out by Haogang Chen.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: Haogang Chen <haogangchen@gmail.com>
      CC: stable@kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9e7860ce
    • J
      Xen: consolidate and simplify struct xenbus_driver instantiation · 73db144b
      Jan Beulich 提交于
      The 'name', 'owner', and 'mod_name' members are redundant with the
      identically named fields in the 'driver' sub-structure. Rather than
      switching each instance to specify these fields explicitly, introduce
      a macro to simplify this.
      
      Eliminate further redundancy by allowing the drvname argument to
      DEFINE_XENBUS_DRIVER() to be blank (in which case the first entry from
      the ID table will be used for .driver.name).
      
      Also eliminate the questionable xenbus_register_{back,front}end()
      wrappers - their sole remaining purpose was the checking of the
      'owner' field, proper setting of which shouldn't be an issue anymore
      when the macro gets used.
      
      v2: Restore DRV_NAME for the driver name in xen-pciback.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      73db144b