1. 13 Jul, 2017 · 1 commit
  2. 03 Jul, 2017 · 1 commit
  3. 30 Jun, 2017 · 1 commit
  4. 25 Jun, 2017 · 1 commit
    • xen: allocate page for shared info page from low memory · a5d5f328
      Juergen Gross committed
      In an HVM guest the kernel today allocates the page for mapping the
      shared info structure via extend_brk(). This leads to a drop in
      performance: because the single shared info page is located in
      hypervisor memory, the underlying EPT entry has to be split up into
      4kB entries.
      
      The issue has been detected by using the libmicro munmap test:
      unmapping 8kB of memory was faster by nearly a factor of two when no
      pv interfaces were active in the HVM guest.
      
      So instead of taking a page from memory which might be mapped via
      large EPT entries, use a page which is already mapped via a 4kB EPT
      entry: we can take a page from the first 1MB of memory, as the video
      memory at 640kB prevents larger EPT entries from being used there.
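A minimal C sketch of the reasoning above, assuming 2MB large EPT pages and a legacy VGA window at 0xA0000-0xBFFFF. The constants and helper names are hypothetical and this is not the kernel's allocator; it only shows why a page below 1MB costs no large-page split: the 2MB frame holding it already contains the video memory hole.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not kernel definitions). */
#define PAGE_SIZE_4K   0x1000ULL
#define PAGE_SIZE_2M   0x200000ULL
#define LOW_1M         0x100000ULL
#define VGA_HOLE_START 0xA0000ULL   /* legacy video memory at 640kB */
#define VGA_HOLE_END   0xC0000ULL   /* exclusive end of the VGA window */

/* True if the 2MB-aligned frame containing pa also contains the VGA hole,
 * so that frame cannot be covered by a single large EPT entry. */
static bool frame_has_vga_hole(uint64_t pa)
{
    uint64_t base = pa & ~(PAGE_SIZE_2M - 1);

    return base < VGA_HOLE_END && VGA_HOLE_START < base + PAGE_SIZE_2M;
}

/* A page-aligned page below 1MB (outside the VGA window itself) is already
 * mapped with 4kB EPT entries, so using it splits no large page. */
static bool low_page_ok_for_shared_info(uint64_t pa)
{
    bool in_vga = pa >= VGA_HOLE_START && pa < VGA_HOLE_END;

    return pa < LOW_1M && (pa % PAGE_SIZE_4K) == 0 && !in_vga &&
           frame_has_vga_hole(pa);
}
```

For example, 0x9F000 qualifies, while 0x200000 does not: its 2MB frame has no hole and may be backed by a large entry.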
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
  5. 23 Jun, 2017 · 2 commits
  6. 20 Jun, 2017 · 1 commit
  7. 13 Jun, 2017 · 7 commits
    • xen/vcpu: Handle xen_vcpu_setup() failure at boot · ae039001
      Ankur Arora committed
      On PVH and PVHVM, when the VCPUOP_register_vcpu_info hypercall fails,
      we limit the number of CPUs to MAX_VIRT_CPUS. However, if this
      failure occurred for a CPU beyond MAX_VIRT_CPUS, we continue
      to function with more than MAX_VIRT_CPUS CPUs.
      
      This leads to problems at the next save/restore cycle, when more
      than MAX_VIRT_CPUS threads go into stop_machine() but, on the way
      back up, there is valid state only for the first MAX_VIRT_CPUS.
      
      This patch pulls the excess CPUs down via cpu_down().
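A hypothetical sketch of pulling the excess CPUs down. cpu_online_demo[] and cpu_down_demo() are stand-ins for the kernel's online mask and cpu_down(); MAX_VIRT_CPUS is assumed to be 32, the number of vcpu_info slots in the legacy x86 shared_info page.

```c
#define MAX_VIRT_CPUS 32   /* assumed: legacy shared_info vcpu slots on x86 */
#define NR_CPUS_DEMO  64   /* hypothetical possible-CPU count for the sketch */

static int cpu_online_demo[NR_CPUS_DEMO];

/* Hypothetical stand-in for the kernel's cpu_down(). */
static int cpu_down_demo(unsigned int cpu)
{
    if (!cpu_online_demo[cpu])
        return -1;
    cpu_online_demo[cpu] = 0;
    return 0;
}

/* After a VCPUOP_register_vcpu_info failure, take every online CPU at or
 * beyond MAX_VIRT_CPUS offline, so only CPUs with a shared_info slot remain.
 * Returns the number of CPUs taken down. */
static unsigned int pull_down_excess_cpus(void)
{
    unsigned int cpu, downed = 0;

    for (cpu = MAX_VIRT_CPUS; cpu < NR_CPUS_DEMO; cpu++)
        if (cpu_online_demo[cpu] && cpu_down_demo(cpu) == 0)
            downed++;
    return downed;
}
```

With 40 CPUs online, the sketch takes CPUs 32-39 down and leaves 0-31 running.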
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen/vcpu: Handle xen_vcpu_setup() failure in hotplug · c9b5d98b
      Ankur Arora committed
      The hypercall VCPUOP_register_vcpu_info can fail. This failure is
      handled by making per_cpu(xen_vcpu, cpu) point to its shared_info
      slot and those without one (cpu >= MAX_VIRT_CPUS) be NULL.
      
      For PVH/PVHVM, this is not enough, because we also need to pull
      these VCPUs out of circulation.
      
      Fix for PVH/PVHVM: on registration failure in the cpuhp prepare
      callback (xen_cpu_up_prepare_hvm()), return an error to the cpuhp
      state-machine so it can fail the CPU init.
      
      Fix for PV: the registration happens before smp_init(), so in the
      failure case we clamp setup_max_cpus and limit the number of VCPUs
      that smp_init() will bring up to MAX_VIRT_CPUS.
      This is functionally correct, but the code becomes a bit simpler
      if we get rid of this explicit clamping: for VCPUs that don't have
      a valid xen_vcpu, fail the CPU init in the cpuhp prepare callback
      (xen_cpu_up_prepare_pv()).
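A sketch of the prepare-callback check, under the assumption that a NULL per-CPU pointer marks a failed registration. The _demo names and the -ENODEV return value are illustrative, not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

#define NR_CPUS_DEMO 8

struct vcpu_info_demo { int dummy; };   /* stand-in for struct vcpu_info */

/* Per-CPU pointer; NULL means registration failed and no slot exists. */
static struct vcpu_info_demo *xen_vcpu_demo[NR_CPUS_DEMO];

/* Hypothetical prepare callback: a non-zero return makes the hotplug
 * state machine abort bring-up of this CPU instead of clamping earlier. */
static int xen_cpu_up_prepare_demo(unsigned int cpu)
{
    if (cpu >= NR_CPUS_DEMO || xen_vcpu_demo[cpu] == NULL)
        return -ENODEV;
    return 0;
}
```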
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen/pv: Fix OOPS on restore for a PV, !SMP domain · 0e4d5837
      Ankur Arora committed
      If CONFIG_SMP is disabled, xen_setup_vcpu_info_placement() is called from
      xen_setup_shared_info(). This is fine as far as boot goes, but it means
      that we also call it in the restore path. This results in an OOPS
      because we assign to pv_mmu_ops.read_cr2 which is __ro_after_init.
      
      Also, though less problematically, this means we call xen_vcpu_setup()
      twice at restore -- once from the vcpu info placement call and the
      second time from xen_vcpu_restore().
      
      Fix by calling xen_setup_vcpu_info_placement() at boot only.
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen/pvh*: Support > 32 VCPUs at domain restore · 0b64ffb8
      Ankur Arora committed
      When Xen restores a PVHVM or PVH guest, its shared_info only holds
      up to 32 CPUs. The hypercall VCPUOP_register_vcpu_info allows
      us to set up per-page areas for VCPUs. This means we can boot
      PVH* guests with more than 32 VCPUs. During restore the per-cpu
      structure is allocated freshly by the hypervisor (vcpu_info_mfn is
      set to INVALID_MFN) so that the newly restored guest can make a
      VCPUOP_register_vcpu_info hypercall.
      
      However, we end up triggering this condition in Xen:
      /* Run this command on yourself or on other offline VCPUS. */
       if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
      
      which means we are unable to set up the per-cpu VCPU structures
      for running VCPUs. The Linux PV code paths make this work by
      iterating over cpu_possible in xen_vcpu_restore() with:
      
       1) is target CPU up (VCPUOP_is_up hypercall?)
       2) if yes, then VCPUOP_down to pause it
       3) VCPUOP_register_vcpu_info
       4) if it was down, then VCPUOP_up to bring it back up
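The four-step sequence above can be modelled as follows. The hypercalls are replaced by hypothetical stubs operating on an in-memory array, and the special case of the calling CPU (which may register its own vcpu_info while running) is left out for brevity; this is a sketch of the ordering, not the kernel's xen_vcpu_restore().

```c
#include <stdbool.h>

#define NR_CPUS_DEMO 4

/* Simulated hypervisor-side VCPU state for the sketch. */
static bool vcpu_up[NR_CPUS_DEMO];
static bool vcpu_registered[NR_CPUS_DEMO];

/* Hypothetical stand-ins for VCPUOP_is_up, VCPUOP_down, VCPUOP_up and
 * VCPUOP_register_vcpu_info. */
static bool hc_is_up(int cpu) { return vcpu_up[cpu]; }
static void hc_down(int cpu) { vcpu_up[cpu] = false; }
static void hc_up(int cpu) { vcpu_up[cpu] = true; }

static int hc_register_vcpu_info(int cpu)
{
    /* Mirrors the Xen check quoted above: refuse for a running VCPU. */
    if (vcpu_up[cpu])
        return -1;
    vcpu_registered[cpu] = true;
    return 0;
}

static int vcpu_restore_demo(void)
{
    int cpu, rc = 0;

    for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
        bool was_up = hc_is_up(cpu);      /* 1) is the target CPU up? */

        if (was_up)
            hc_down(cpu);                 /* 2) pause it */
        if (hc_register_vcpu_info(cpu))   /* 3) register vcpu_info */
            rc = -1;
        if (was_up)
            hc_up(cpu);                   /* 4) bring it back up */
    }
    return rc;
}
```

Skipping step 2 for a running VCPU would make step 3 fail, which is exactly the condition the message describes.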
      
      With Xen commit 192df6f9122d ("xen/x86: allow HVM guests to use
      hypercalls to bring up vCPUs") this is available for non-PV guests.
      As such, first check whether VCPUOP_is_up is actually possible before
      trying this dance.
      
      As most of this dance code is done already in xen_vcpu_restore()
      let's make it callable on PV, PVH and PVHVM.
      Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen/vcpu: Simplify xen_vcpu related code · ad73fd59
      Ankur Arora committed
      Largely mechanical changes to aid unification of xen_vcpu_restore()
      logic for PV, PVH and PVHVM.
      
      xen_vcpu_setup(): the only change in logic is that clamp_max_cpus()
      is now handled inside the "if (!xen_have_vcpu_info_placement)" block.
      
      xen_vcpu_restore(): code movement from enlighten_pv.c to enlighten.c.
      
      xen_vcpu_info_reset(): pulls together all the code where xen_vcpu
      is set to default.
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/boot/64: Rename init_level4_pgt and early_level4_pgt · 65ade2f8
      Kirill A. Shutemov committed
      With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables.
      
      Let's give these variables more generic names: init_top_pgt and
      early_top_pgt.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3() · 6c690ee1
      Andy Lutomirski committed
      The kernel has several code paths that read CR3.  Most of them assume that
      CR3 contains the PGD's physical address, whereas some of them awkwardly
      use PHYSICAL_PAGE_MASK to mask off low bits.
      
      Add explicit mask macros for CR3 and convert all of the CR3 readers.
      This will keep them from breaking when PCID is enabled.
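A sketch of the masking, assuming the x86-64 CR3 layout (PGD physical address in bits 12-62, PCID in bits 0-11 when CR4.PCIDE is set). The helpers take a CR3 value as an argument rather than reading the register, so they are hypothetical stand-ins for read_cr3_pa(), not the kernel implementation.

```c
#include <stdint.h>

/* Assumed x86-64 layout; bit 63 (no-flush on write) is excluded. */
#define CR3_ADDR_MASK_DEMO 0x7FFFFFFFFFFFF000ULL
#define CR3_PCID_MASK_DEMO 0xFFFULL

/* Physical address of the top-level page table, whatever the low bits hold. */
static uint64_t cr3_to_pa_demo(uint64_t cr3) { return cr3 & CR3_ADDR_MASK_DEMO; }

/* PCID carried in the low 12 bits (only meaningful with CR4.PCIDE set). */
static uint64_t cr3_to_pcid_demo(uint64_t cr3) { return cr3 & CR3_PCID_MASK_DEMO; }
```

A reader that treated the whole CR3 value as an address would be off by the PCID once PCID is enabled; masking explicitly avoids that.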
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 05 Jun, 2017 · 2 commits
    • x86/mm: Rework lazy TLB to track the actual loaded mm · 3d28ebce
      Andy Lutomirski committed
      Lazy TLB state is currently managed in a rather baroque manner.
      AFAICT, there are three possible states:
      
       - Non-lazy.  This means that we're running a user thread or a
         kernel thread that has called use_mm().  current->mm ==
         current->active_mm == cpu_tlbstate.active_mm and
         cpu_tlbstate.state == TLBSTATE_OK.
      
       - Lazy with user mm.  We're running a kernel thread without an mm
         and we're borrowing an mm_struct.  We have current->mm == NULL,
         current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
         != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
         in mm_cpumask(current->active_mm).  CR3 points to
         current->active_mm->pgd.  The TLB is up to date.
      
       - Lazy with init_mm.  This happens when we call leave_mm().  We
         have current->mm == NULL, current->active_mm ==
         cpu_tlbstate.active_mm, but that mm is only relevant insofar as
         the scheduler is tracking it for refcounting.  cpu_tlbstate.state
         != TLBSTATE_OK.  The current cpu is clear in
         mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
         i.e. init_mm->pgd.
      
      This patch simplifies the situation.  Other than perf, x86 stops
      caring about current->active_mm at all.  We have
      cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
      TLB is always up to date for that mm.  leave_mm() just switches us
      to init_mm.  There are no longer any special cases for mm_cpumask,
      and switch_mm() switches mms without worrying about laziness.
      
      After this patch, cpu_tlbstate.state serves only to tell the TLB
      flush code whether it may switch to init_mm instead of doing a
      normal flush.
      
      This makes fairly extensive changes to xen_exit_mmap(), which used
      to look a bit like black magic.
      
      Perf is unchanged.  With or without this change, perf may behave a bit
      erratically if it tries to read user memory in kernel thread context.
      We should build on this patch to teach perf to never look at user
      memory when cpu_tlbstate.loaded_mm != current->mm.
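A simplified model of the new bookkeeping, using hypothetical struct and function names: each CPU tracks only the mm that CR3 references, leave_mm() switches to init_mm, and the perf rule proposed above becomes a single comparison. Refcounting, mm_cpumask and the actual CR3 write are deliberately omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct mm_demo { const char *name; };   /* stand-in for struct mm_struct */

static struct mm_demo init_mm_demo = { "init_mm" };

/* Simplified per-CPU TLB state: only the mm whose page tables CR3 points at. */
struct tlbstate_demo {
    struct mm_demo *loaded_mm;
};

static void switch_mm_demo(struct tlbstate_demo *ts, struct mm_demo *next)
{
    /* No laziness special cases: just record what CR3 now references. */
    ts->loaded_mm = next;
}

static void leave_mm_demo(struct tlbstate_demo *ts)
{
    if (ts->loaded_mm != &init_mm_demo)
        switch_mm_demo(ts, &init_mm_demo);
}

/* The rule suggested for perf: only touch user memory when the loaded mm
 * is the current task's mm (current_mm is NULL for kernel threads). */
static bool may_read_user_memory(const struct tlbstate_demo *ts,
                                 const struct mm_demo *current_mm)
{
    return current_mm != NULL && ts->loaded_mm == current_mm;
}
```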
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Pass flush_tlb_info to flush_tlb_others() etc · a2055abe
      Andy Lutomirski committed
      Rather than passing all the contents of flush_tlb_info to
      flush_tlb_others(), pass a pointer to the structure directly. For
      consistency, this also removes the unnecessary cpu parameter from
      uv_flush_tlb_others() to make its signature match the other
      *flush_tlb_others() functions.
      
      This serves two purposes:
      
       - It will dramatically simplify future patches that change struct
         flush_tlb_info, which I'm planning to do.
      
       - struct flush_tlb_info is an adequate description of what to do
         for a local flush, too, so by reusing it we can remove duplicated
         code between local and remote flushes in a future patch.
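A sketch of the pointer-passing style, assuming a flush_tlb_info that carries just an mm and a virtual address range. The _demo names are hypothetical, and the "remote flush" here only computes how many 4kB pages it would cover instead of sending IPIs.

```c
#include <stddef.h>

struct mm_demo;   /* opaque stand-in for struct mm_struct */

/* Simplified description of one flush request. */
struct flush_tlb_info_demo {
    struct mm_demo *mm;
    unsigned long start;
    unsigned long end;
};

#define PAGE_SIZE_DEMO 4096UL

/* Local and remote paths take the same descriptor by pointer, so adding a
 * field later touches the struct, not every function signature. */
static unsigned long pages_to_flush(const struct flush_tlb_info_demo *info)
{
    return (info->end - info->start + PAGE_SIZE_DEMO - 1) / PAGE_SIZE_DEMO;
}

static unsigned long flush_tlb_others_demo(const struct flush_tlb_info_demo *info)
{
    /* A real implementation would IPI other CPUs; here we report the work. */
    return pages_to_flush(info);
}
```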
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      [ Fix build warning. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 19 May, 2017 · 2 commits
  10. 11 May, 2017 · 2 commits
  11. 05 May, 2017 · 2 commits
  12. 03 May, 2017 · 1 commit
  13. 02 May, 2017 · 17 commits
    • xen: Implement EFI reset_system callback · e371fd76
      Julien Grall committed
      When rebooting DOM0 with ACPI on ARM64, the kernel crashes with the stack
      trace [1].
      
      This happens because when EFI runtime services are enabled, the reset
      code (see machine_restart) will first try to use the EFI restart method.
      
      However, the EFI restart code expects the reset_system callback to
      always be set. This is not the case for Xen and leads to a crash.
      
      The EFI restart helper is used in multiple places and some of them don't
      have a fallback (see machine_power_off). So implement the reset_system
      callback as a call to xen_reboot when using EFI under Xen.
      
      [   36.999270] reboot: Restarting system
      [   37.002921] Internal error: Attempting to execute userspace memory: 86000004 [#1] PREEMPT SMP
      [   37.011460] Modules linked in:
      [   37.014598] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.11.0-rc1-00003-g1e248b60a39b-dirty #506
      [   37.023903] Hardware name: (null) (DT)
      [   37.027734] task: ffff800902068000 task.stack: ffff800902064000
      [   37.033739] PC is at 0x0
      [   37.036359] LR is at efi_reboot+0x94/0xd0
      [   37.040438] pc : [<0000000000000000>] lr : [<ffff00000880f2c4>] pstate: 404001c5
      [   37.047920] sp : ffff800902067cf0
      [   37.051314] x29: ffff800902067cf0 x28: ffff800902068000
      [   37.056709] x27: ffff000008992000 x26: 000000000000008e
      [   37.062104] x25: 0000000000000123 x24: 0000000000000015
      [   37.067499] x23: 0000000000000000 x22: ffff000008e6e250
      [   37.072894] x21: ffff000008e6e000 x20: 0000000000000000
      [   37.078289] x19: ffff000008e5d4c8 x18: 0000000000000010
      [   37.083684] x17: 0000ffffa7c27470 x16: 00000000deadbeef
      [   37.089079] x15: 0000000000000006 x14: ffff000088f42bef
      [   37.094474] x13: ffff000008f42bfd x12: ffff000008e706c0
      [   37.099870] x11: ffff000008e70000 x10: 0000000005f5e0ff
      [   37.105265] x9 : ffff800902067a50 x8 : 6974726174736552
      [   37.110660] x7 : ffff000008cc6fb8 x6 : ffff000008cc6fb0
      [   37.116055] x5 : ffff000008c97dd8 x4 : 0000000000000000
      [   37.121453] x3 : 0000000000000000 x2 : 0000000000000000
      [   37.126845] x1 : 0000000000000000 x0 : 0000000000000000
      [   37.132239]
      [   37.133808] Process systemd-shutdow (pid: 1, stack limit = 0xffff800902064000)
      [   37.141118] Stack: (0xffff800902067cf0 to 0xffff800902068000)
      [   37.146949] 7ce0:                                   ffff800902067d40 ffff000008085334
      [   37.154869] 7d00: 0000000000000000 ffff000008f3b000 ffff800902067d40 ffff0000080852e0
      [   37.162787] 7d20: ffff000008cc6fb0 ffff000008cc6fb8 ffff000008c7f580 ffff000008c97dd8
      [   37.170706] 7d40: ffff800902067d60 ffff0000080e2c2c 0000000000000000 0000000001234567
      [   37.178624] 7d60: ffff800902067d80 ffff0000080e2ee8 0000000000000000 ffff0000080e2df4
      [   37.186544] 7d80: 0000000000000000 ffff0000080830f0 0000000000000000 00008008ff1c1000
      [   37.194462] 7da0: ffffffffffffffff 0000ffffa7c4b1cc 0000000000000000 0000000000000024
      [   37.202380] 7dc0: ffff800902067dd0 0000000000000005 0000fffff24743c8 0000000000000004
      [   37.210299] 7de0: 0000fffff2475f03 0000000000000010 0000fffff2474418 0000000000000005
      [   37.218218] 7e00: 0000fffff2474578 000000000000000a 0000aaaad6b722c0 0000000000000001
      [   37.226136] 7e20: 0000000000000123 0000000000000038 ffff800902067e50 ffff0000081e7294
      [   37.234055] 7e40: ffff800902067e60 ffff0000081e935c ffff800902067e60 ffff0000081e9388
      [   37.241973] 7e60: ffff800902067eb0 ffff0000081ea388 0000000000000000 00008008ff1c1000
      [   37.249892] 7e80: ffffffffffffffff 0000ffffa7c4a79c 0000000000000000 ffff000000020000
      [   37.257810] 7ea0: 0000010000000004 0000000000000000 0000000000000000 ffff0000080830f0
      [   37.265729] 7ec0: fffffffffee1dead 0000000028121969 0000000001234567 0000000000000000
      [   37.273651] 7ee0: ffffffffffffffff 8080000000800000 0000800000008080 feffa9a9d4ff2d66
      [   37.281567] 7f00: 000000000000008e feffa9a9d5b60e0f 7f7fffffffff7f7f 0101010101010101
      [   37.289485] 7f20: 0000000000000010 0000000000000008 000000000000003a 0000ffffa7ccf588
      [   37.297404] 7f40: 0000aaaad6b87d00 0000ffffa7c4b1b0 0000fffff2474be0 0000aaaad6b88000
      [   37.305326] 7f60: 0000fffff2474fb0 0000000001234567 0000000000000000 0000000000000000
      [   37.313240] 7f80: 0000000000000000 0000000000000001 0000aaaad6b70d4d 0000000000000000
      [   37.321159] 7fa0: 0000000000000001 0000fffff2474ea0 0000aaaad6b5e2e0 0000fffff2474e80
      [   37.329078] 7fc0: 0000ffffa7c4b1cc 0000000000000000 fffffffffee1dead 000000000000008e
      [   37.336997] 7fe0: 0000000000000000 0000000000000000 9ce839cffee77eab fafdbf9f7ed57f2f
      [   37.344911] Call trace:
      [   37.347437] Exception stack(0xffff800902067b20 to 0xffff800902067c50)
      [   37.353970] 7b20: ffff000008e5d4c8 0001000000000000 0000000080f82000 0000000000000000
      [   37.361883] 7b40: ffff800902067b60 ffff000008e17000 ffff000008f44c68 00000001081081b4
      [   37.369802] 7b60: ffff800902067bf0 ffff000008108478 0000000000000000 ffff000008c235b0
      [   37.377721] 7b80: ffff800902067ce0 0000000000000000 0000000000000000 0000000000000015
      [   37.385643] 7ba0: 0000000000000123 000000000000008e ffff000008992000 ffff800902068000
      [   37.393557] 7bc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [   37.401477] 7be0: 0000000000000000 ffff000008c97dd8 ffff000008cc6fb0 ffff000008cc6fb8
      [   37.409396] 7c00: 6974726174736552 ffff800902067a50 0000000005f5e0ff ffff000008e70000
      [   37.417318] 7c20: ffff000008e706c0 ffff000008f42bfd ffff000088f42bef 0000000000000006
      [   37.425234] 7c40: 00000000deadbeef 0000ffffa7c27470
      [   37.430190] [<          (null)>]           (null)
      [   37.434982] [<ffff000008085334>] machine_restart+0x6c/0x70
      [   37.440550] [<ffff0000080e2c2c>] kernel_restart+0x6c/0x78
      [   37.446030] [<ffff0000080e2ee8>] SyS_reboot+0x130/0x228
      [   37.451337] [<ffff0000080830f0>] el0_svc_naked+0x24/0x28
      [   37.456737] Code: bad PC value
      [   37.459891] ---[ end trace 76e2fc17e050aecd ]---
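A minimal sketch of the fix, assuming UEFI's ResetSystem() numbering (cold 0, warm 1, shutdown 2) and Xen's shutdown reasons poweroff = 0 and reboot = 1. All _demo names are hypothetical stand-ins; the point is that the callback is populated before generic code calls through it, avoiding the jump to PC 0x0 seen in the trace.

```c
#include <stddef.h>

/* Reset types following the UEFI ResetSystem() enumeration (assumed). */
enum efi_reset_type_demo { RESET_COLD, RESET_WARM, RESET_SHUTDOWN };

/* Hypothetical stand-ins for Xen's SHUTDOWN_poweroff / SHUTDOWN_reboot. */
enum { XEN_POWEROFF_DEMO = 0, XEN_REBOOT_DEMO = 1 };

static int last_xen_reason = -1;

/* Stand-in for xen_reboot(): records the reason instead of trapping to Xen. */
static void xen_reboot_demo(int reason) { last_xen_reason = reason; }

static void xen_efi_reset_system_demo(enum efi_reset_type_demo type)
{
    switch (type) {
    case RESET_COLD:
    case RESET_WARM:
        xen_reboot_demo(XEN_REBOOT_DEMO);
        break;
    case RESET_SHUTDOWN:
        xen_reboot_demo(XEN_POWEROFF_DEMO);
        break;
    }
}

/* Simplified runtime-services table: generic reboot code calls through
 * reset_system without a NULL check, so it must be filled in for Xen. */
struct efi_rt_demo {
    void (*reset_system)(enum efi_reset_type_demo type);
};

static struct efi_rt_demo efi_rt = { NULL };

static void xen_efi_runtime_setup_demo(void)
{
    efi_rt.reset_system = xen_efi_reset_system_demo;
}
```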
      Signed-off-by: Julien Grall <julien.grall@arm.com>
      
      --
      
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      
      The x86 code theoretically has a similar issue, although EFI does not
      seem to be the preferred method there. I have only build-tested it on x86.
      
      This should also probably be fixed in stable tree.
      
          Changes in v2:
              - Implement xen_efi_reset_system using xen_reboot
              - Move xen_efi_reset_system in drivers/xen/efi.c
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen: Export xen_reboot · 5d9404e1
      Julien Grall committed
      The helper xen_reboot will be called by the EFI code in a later patch.
      
      Note that the ARM version does not yet exist and will be added in a
      later patch too.
      Signed-off-by: Julien Grall <julien.grall@arm.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen/x86: Call xen_smp_intr_init_pv() on BSP · f31b9692
      Boris Ostrovsky committed
      Recent code rework that split handling of PV, HVM and PVH guests into
      separate files missed calling xen_smp_intr_init_pv() on CPU0.
      
      Add this call.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen: Revert commits da72ff5b and 72a9b186 · 84d582d2
      Boris Ostrovsky committed
      Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741)
      established that commit 72a9b186 ("xen: Remove event channel
      notification through Xen PCI platform device") (and thus commit
      da72ff5b ("partially revert "xen: Remove event channel
      notification through Xen PCI platform device"")) are unnecessary and,
      in fact, prevent HVM guests from booting on Xen releases prior to 4.0.
      
      Therefore we revert both of those commits.
      
      The summary of that discussion is below:
      
        Here is the brief summary of the current situation:
      
        Before the offending commit (72a9b186):
      
        1) INTx does not work because of the reset_watches path.
        2) The reset_watches path is only taken if you have Xen > 4.0
        3) The Linux kernel by default will use vector injection if the
           hypervisor supports it. So even though INTx does not work, nobody
           running the kernel with Xen > 4.0 would notice, unless they
           explicitly disabled this feature either in the kernel or in Xen
           (and this can only be disabled by modifying the code; there is no
           user-supported way to do it).
      
        After the offending commit (+ partial revert):
      
        1) INTx is no longer supported for HVM (only for PV guests).
        2) The kernel will not boot as an HVM guest on Xen < 4.0, which does
           not have vector injection support, since the only other supported
           mode is INTx, which is no longer supported for HVM.
      
        So based on this summary, I think before commit (72a9b186) we were
        in much better position from a user point of view.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams() · 5f6a1614
      Boris Ostrovsky committed
      The e820 map is updated with information from the zeropage (i.e.
      pvh_bootparams) by default_machine_specific_memory_setup(). With the
      way things are done now, we end up with a duplicated e820 map.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: use capabilities instead of fake cpuid values for xsave · 6807cf65
      Juergen Gross committed
      When running as a PV domain, xen_cpuid() is used instead of
      native_cpuid(). In xen_cpuid() the xsave feature availability is
      indicated by special-casing the related cpuid leaf.

      Instead of delivering fake cpuid values, set or clear the CPU
      capability bits for xsave.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: use capabilities instead of fake cpuid values for x2apic · e657fccb
      Juergen Gross committed
      When running as a PV domain, xen_cpuid() is used instead of
      native_cpuid(). In xen_cpuid() the x2apic feature is indicated as not
      being present by special-casing the related cpuid leaf.

      Instead of delivering fake cpuid values, clear the CPU capability bit
      for x2apic.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: use capabilities instead of fake cpuid values for mwait · ea01598b
      Juergen Gross committed
      When running as a PV domain, xen_cpuid() is used instead of
      native_cpuid(). In xen_cpuid() the mwait feature is indicated as
      present or not by special-casing the related cpuid leaf.

      Instead of delivering fake cpuid values, use the CPU capability bit
      for mwait.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: use capabilities instead of fake cpuid values for acpi · b778d6bf
      Juergen Gross committed
      When running as a PV domain, xen_cpuid() is used instead of
      native_cpuid(). In xen_cpuid() the acpi feature is indicated as not
      being present by special-casing the related cpuid leaf when we
      are not the initial domain.

      Instead of delivering fake cpuid values, clear the CPU capability bit
      for acpi.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: use capabilities instead of fake cpuid values for acc · aa107156
      Juergen Gross committed
      When running as a PV domain, xen_cpuid() is used instead of
      native_cpuid(). In xen_cpuid() the acc feature (thermal monitoring)
      is indicated as not being present by special-casing the related
      cpuid leaf.

      Instead of delivering fake cpuid values, clear the CPU capability bit
      for acc.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: use capabilities instead of fake cpuid values for mtrr · 88f3256f
      Juergen Gross committed
      When running as a PV domain, xen_cpuid() is used instead of
      native_cpuid(). In xen_cpuid() the mtrr feature is indicated as not
      being present by special-casing the related cpuid leaf.

      Instead of delivering fake cpuid values, clear the CPU capability bit
      for mtrr.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: use capabilities instead of fake cpuid values for aperf · fd9145fd
      Juergen Gross committed
      When running as a PV domain, xen_cpuid() is used instead of
      native_cpuid(). In xen_cpuid() the aperf/mperf feature is indicated
      as not being present by special-casing the related cpuid leaf.

      Instead of delivering fake cpuid values, clear the CPU capability bit
      for aperf/mperf.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: don't indicate DCA support in pv domains · 3ee99df3
      Juergen Gross committed
      Xen doesn't support DCA (direct cache access) for pv domains. Clear
      the corresponding capability indicator.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen: set cpu capabilities from xen_start_kernel() · 0808e80c
      Juergen Gross committed
      There is no need to set the same capabilities for each cpu
      individually. This can easily be done for all cpus when starting the
      kernel.
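The capability commits above share one pattern, sketched here with a hypothetical 32-bit mask in place of the kernel's per-feature capability bitmap: decide each feature once at start-up for all CPUs, instead of rewriting cpuid results each time xen_cpuid() runs. The feature list and conditions follow the commit messages; the names and the mask representation are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature indices for the sketch. */
enum feature_demo { FEAT_XSAVE, FEAT_X2APIC, FEAT_MWAIT, FEAT_ACPI,
                    FEAT_ACC, FEAT_MTRR, FEAT_APERFMPERF, FEAT_DCA };

static uint32_t caps_demo;   /* shared by all CPUs */

static void force_cap_demo(enum feature_demo f) { caps_demo |= 1u << f; }
static void clear_cap_demo(enum feature_demo f) { caps_demo &= ~(1u << f); }
static bool cpu_has_demo(enum feature_demo f) { return (caps_demo & (1u << f)) != 0; }

/* Called once from a (hypothetical) start-of-kernel hook. */
static void xen_init_capabilities_demo(bool xsave_usable, bool mwait_usable,
                                       bool initial_domain)
{
    clear_cap_demo(FEAT_X2APIC);
    clear_cap_demo(FEAT_ACC);
    clear_cap_demo(FEAT_MTRR);
    clear_cap_demo(FEAT_APERFMPERF);
    clear_cap_demo(FEAT_DCA);
    if (!initial_domain)
        clear_cap_demo(FEAT_ACPI);

    if (xsave_usable)
        force_cap_demo(FEAT_XSAVE);
    else
        clear_cap_demo(FEAT_XSAVE);

    if (mwait_usable)
        force_cap_demo(FEAT_MWAIT);
    else
        clear_cap_demo(FEAT_MWAIT);
}
```

Feature checks elsewhere then consult the capability bits rather than the raw cpuid output, which is why the fake cpuid values become unnecessary.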
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen,kdump: handle pv domain in paddr_vmcoreinfo_note() · 29985b09
      Juergen Gross committed
      For kdump to work correctly it needs the physical address of
      vmcoreinfo_note. When running as dom0 this means the virtual address
      has to be translated to the related machine address.
      
      paddr_vmcoreinfo_note() is meant to do the translation via
      __pa_symbol() only, but since it is marked "weak" it can easily be
      replaced in the Xen case.
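A sketch of the weak-symbol mechanism, using a hypothetical function name and assuming __pa_symbol() reduces to subtracting a kernel mapping base of 0xffffffff80000000 (with phys_base taken as 0). A strong definition of the same symbol in another object file would replace this default at link time; that override is described only in a comment, because a single file cannot show both definitions.

```c
#include <stdint.h>

/* Assumed kernel text mapping base for the illustration. */
#define KERNEL_MAP_BASE_DEMO 0xffffffff80000000ULL

/* Generic default: plain virtual-to-physical translation of a kernel symbol.
 * A Xen PV build could provide a non-weak paddr_vmcoreinfo_note_demo() in
 * another file that additionally translates the result into a machine
 * address; the linker would then pick that one instead of this default. */
__attribute__((weak)) uint64_t paddr_vmcoreinfo_note_demo(uint64_t vaddr)
{
    return vaddr - KERNEL_MAP_BASE_DEMO;
}
```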
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Tested-by: Petr Tesarik <ptesarik@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: remove unused static function from smp_pv.c · ab1570a4
      Juergen Gross committed
      xen_call_function_interrupt() isn't used in smp_pv.c. Remove it.
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • x86/xen: rename some PV-only functions in smp_pv.c · 8cb6de39
      Vitaly Kuznetsov committed
      After the code split between PV and HVM, some functions in xen_smp_ops
      have the xen_pv_ prefix and some only xen_, which makes them look like
      they're common to both PV and HVM while they're not. Rename all the
      rest to have the xen_pv_ prefix.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>