1. 20 8月, 2015 3 次提交
  2. 31 7月, 2015 1 次提交
    • A
      x86/xen: Probe target addresses in set_aliased_prot() before the hypercall · aa1acff3
      Andy Lutomirski 提交于
      The update_va_mapping hypercall can fail if the VA isn't present
      in the guest's page tables.  Under certain loads, this can
      result in an OOPS when the target address is in unpopulated vmap
      space.
      
      While we're at it, add comments to help explain what's going on.
      
      This isn't a great long-term fix.  This code should probably be
      changed to use something like set_memory_ro.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Vrabel <dvrabel@cantab.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org <security@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aa1acff3
  3. 07 6月, 2015 1 次提交
  4. 19 5月, 2015 2 次提交
    • I
      x86/fpu: Simplify fpu__cpu_init() · 21c4cd10
      Ingo Molnar 提交于
      After the latest round of cleanups, fpu__cpu_init() has become
      a simple call to fpu__init_cpu().
      
      Rename fpu__init_cpu() to fpu__cpu_init() and remove the
      extra layer.
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      21c4cd10
    • I
      x86/fpu: Rename fpu_init() to fpu__cpu_init() · 3a9c4b0d
      Ingo Molnar 提交于
      fpu_init() is a bit of a misnomer in that it (falsely) creates the
      impression that it's related to the (old) fpu_finit() function,
      which initializes FPU ctx state.
      
      Rename it to fpu__cpu_init() to make its boot time initialization
      clear, and to move it to the fpu__*() namespace.
      
      Also fix and extend its comment block to point out that it's
      called not only on the boot CPU, but on secondary CPUs as well.
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3a9c4b0d
  5. 06 5月, 2015 1 次提交
  6. 22 4月, 2015 1 次提交
  7. 16 3月, 2015 1 次提交
  8. 06 3月, 2015 1 次提交
    • A
      x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu · 8ef46a67
      Andy Lutomirski 提交于
      We currently store references to the top of the kernel stack in
      multiple places: kernel_stack (with an offset) and
      init_tss.x86_tss.sp0 (no offset).  The latter is defined by
      hardware and is a clean canonical way to find the top of the
      stack.  Add an accessor so we can start using it.
      
      This needs minor paravirt tweaks.  On native, sp0 defines the
      top of the kernel stack and is therefore always correct.  On Xen
      and lguest, the hypervisor tracks the top of the stack, but we
      want to start reading sp0 in the kernel.  Fixing this is simple:
      just update our local copy of sp0 as well as the hypervisor's
      copy on task switches.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8ef46a67
  9. 24 2月, 2015 2 次提交
  10. 04 2月, 2015 1 次提交
  11. 13 1月, 2015 1 次提交
    • J
      x86/xen: properly retrieve NMI reason · f221b04f
      Jan Beulich 提交于
      Using the native code here can't work properly, as the hypervisor would
      normally have cleared the two reason bits by the time Dom0 gets to see
      the NMI (if passed to it at all). There's a shared info field for this,
      and there's an existing hook to use - just fit the two together. This
      is particularly relevant so that NMIs intended to be handled by APEI /
      GHES actually make it to the respective handler.
      
      Note that the hook can (and should) be used irrespective of whether
      being in Dom0, as accessing port 0x61 in a DomU would be even worse,
      while the shared info field would just hold zero all the time. Note
      further that hardware NMI handling for PVH doesn't currently work
      anyway due to missing code in the hypervisor (but it is expected to
      work the native rather than the PV way).
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      f221b04f
  12. 16 11月, 2014 1 次提交
  13. 23 10月, 2014 1 次提交
    • J
      x86/xen: delay construction of mfn_list_list · 2c185687
      Juergen Gross 提交于
      The 3 level p2m tree for the Xen tools is constructed very early at
      boot by calling xen_build_mfn_list_list(). Memory needed for this tree
      is allocated via extend_brk().
      
      As this tree (other than the kernel internal p2m tree) is only needed
      for domain save/restore, live migration and crash dump analysis it
      doesn't matter whether it is constructed very early or just some
      milliseconds later when memory allocation is possible by other means.
      
      This patch moves the call of xen_build_mfn_list_list() just after
      calling xen_pagetable_p2m_copy() simplifying this function, too, as it
      doesn't have to bother with two parallel trees now. The same applies
      for some other internal functions.
      
      While simplifying code, make early_can_reuse_p2m_middle() static and
      drop the unused second parameter. p2m_mid_identity_mfn can be removed
      as well, it isn't used either.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      2c185687
  14. 06 10月, 2014 1 次提交
  15. 03 10月, 2014 1 次提交
  16. 23 9月, 2014 1 次提交
    • D
      x86: remove the Xen-specific _PAGE_IOMAP PTE flag · f955371c
      David Vrabel 提交于
      The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs
      that were used to map I/O regions that are 1:1 in the p2m.  This
      allowed Xen to obtain the correct PFN when converting the MFNs read
      from a PTE back to their PFN.
      
      Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
      returns the correct PFN by using a combination of the m2p and p2m to
      determine if an MFN corresponds to a 1:1 mapping in the the p2m.
      
      Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
      future uses of the PTE flag.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      f955371c
  17. 27 8月, 2014 1 次提交
    • C
      x86: Replace __get_cpu_var uses · 89cbc767
      Christoph Lameter 提交于
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      89cbc767
  18. 19 7月, 2014 2 次提交
    • D
      arch/x86/xen: Silence compiler warnings · c7341d6a
      Daniel Kiper 提交于
      Compiler complains in the following way when x86 32-bit kernel
      with Xen support is build:
      
        CC      arch/x86/xen/enlighten.o
      arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’:
      arch/x86/xen/enlighten.c:1726:3: warning: right shift count >= width of type [enabled by default]
      
      Such line contains following EFI initialization code:
      
      boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32);
      
      There is no issue if x86 64-bit kernel is build. However, 32-bit case
      generate warning (even if that code will not be executed because Xen
      does not work on 32-bit EFI platforms) due to __pa() returning unsigned long
      type which has 32-bits width. So move whole EFI initialization stuff
      to separate function and build it conditionally to avoid above mentioned
      warning on x86 32-bit architecture.
      Signed-off-by: NDaniel Kiper <daniel.kiper@oracle.com>
      Reviewed-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      c7341d6a
    • D
      xen: Put EFI machinery in place · be81c8a1
      Daniel Kiper 提交于
      This patch enables EFI usage under Xen dom0. Standard EFI Linux
      Kernel infrastructure cannot be used because it requires direct
      access to EFI data and code. However, in dom0 case it is not possible
      because above mentioned EFI stuff is fully owned and controlled
      by Xen hypervisor. In this case all calls from dom0 to EFI must
      be requested via special hypercall which in turn executes relevant
      EFI code in behalf of dom0.
      
      When dom0 kernel boots it checks for EFI availability on a machine.
      If it is detected then artificial EFI system table is filled.
      Native EFI callas are replaced by functions which mimics them
      by calling relevant hypercall. Later pointer to EFI system table
      is passed to standard EFI machinery and it continues EFI subsystem
      initialization taking into account that there is no direct access
      to EFI boot services, runtime, tables, structures, etc. After that
      system runs as usual.
      
      This patch is based on Jan Beulich and Tang Liang work.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NTang Liang <liang.tang@oracle.com>
      Signed-off-by: NDaniel Kiper <daniel.kiper@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      be81c8a1
  19. 15 7月, 2014 1 次提交
    • K
      xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests. · 8d693b91
      Konrad Rzeszutek Wilk 提交于
      By default when CONFIG_XEN and CONFIG_XEN_PVHVM kernels are
      run, they will enable the PV extensions (drivers, interrupts, timers,
      etc) - which is the best option for the majority of use cases.
      
      However, in some cases (kexec not fully working, benchmarking)
      we want to disable Xen PV extensions. As such introduce the
      'xen_nopv' parameter that will do it.
      
      This parameter is intended only for HVM guests as the Xen PV
      guests MUST boot with PV extensions. However, even if you use
      'xen_nopv' on Xen PV guests it will be ignored.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      ---
      [v2: s/off/xen_nopv/ per Boris Ostrovsky recommendation.]
      [v3: Add Reviewed-by]
      [v4: Clarify that this is only for HVM guests]
      8d693b91
  20. 05 6月, 2014 1 次提交
  21. 15 5月, 2014 1 次提交
  22. 06 5月, 2014 1 次提交
  23. 04 2月, 2014 1 次提交
  24. 22 1月, 2014 1 次提交
    • R
      xen/pvh: Set X86_CR0_WP and others in CR0 (v2) · c9f6e997
      Roger Pau Monne 提交于
      otherwise we will get for some user-space applications
      that use 'clone' with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
      end up hitting an assert in glibc manifested by:
      
      general protection ip:7f80720d364c sp:7fff98fd8a80 error:0 in
      libc-2.13.so[7f807209e000+180000]
      
      This is due to the nature of said operations which sets and clears
      the PID.  "In the successful one I can see that the page table of
      the parent process has been updated successfully to use a
      different physical page, so the write of the tid on
      that page only affects the child...
      
      On the other hand, in the failed case, the write seems to happen before
      the copy of the original page is done, so both the parent and the child
      end up with the same value (because the parent copies the page after
      the write of the child tid has already happened)."
      (Roger's analysis). The nature of this is due to the Xen's commit
      of 51e2cac257ec8b4080d89f0855c498cbbd76a5e5
      "x86/pvh: set only minimal cr0 and cr4 flags in order to use paging"
      the CR0_WP was removed so COW features of the Linux kernel were not
      operating properly.
      
      While doing that also update the rest of the CR0 flags to be inline
      with what a baremetal Linux kernel would set them to.
      
      In 'secondary_startup_64' (baremetal Linux) sets:
      
      X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP |
      X86_CR0_AM | X86_CR0_PG
      
      The hypervisor for HVM type guests (which PVH is a bit) sets:
      X86_CR0_PE | X86_CR0_ET | X86_CR0_TS
      For PVH it specifically sets:
      X86_CR0_PG
      
      Which means we need to set the rest: X86_CR0_MP | X86_CR0_NE  |
      X86_CR0_WP | X86_CR0_AM to have full parity.
      Signed-off-by: NRoger Pau Monne <roger.pau@citrix.com>
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v1: Took out the cr4 writes to be a seperate patch]
      [v2: 0-DAY kernel found xen_setup_gdt to be missing a static]
      c9f6e997
  25. 07 1月, 2014 1 次提交
  26. 06 1月, 2014 5 次提交
    • M
      xen/pvh: Piggyback on PVHVM for event channels (v2) · 2771374d
      Mukesh Rathor 提交于
      PVH is a PV guest with a twist - there are certain things
      that work in it like HVM and some like PV. There is
      a similar mode - PVHVM where we run in HVM mode with
      PV code enabled - and this patch explores that.
      
      The most notable PV interfaces are the XenBus and event channels.
      
      We will piggyback on how the event channel mechanism is
      used in PVHVM - that is we want the normal native IRQ mechanism
      and we will install a vector (hvm callback) for which we
      will call the event channel mechanism.
      
      This means that from a pvops perspective, we can use
      native_irq_ops instead of the Xen PV specific. Albeit in the
      future we could support pirq_eoi_map. But that is
      a feature request that can be shared with PVHVM.
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      2771374d
    • M
      xen/pvh: Secondary VCPU bringup (non-bootup CPUs) · 5840c84b
      Mukesh Rathor 提交于
      The VCPU bringup protocol follows the PV with certain twists.
      From xen/include/public/arch-x86/xen.h:
      
      Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
      for HVM and PVH guests, not all information in this structure is updated:
      
       - For HVM guests, the structures read include: fpu_ctxt (if
       VGCT_I387_VALID is set), flags, user_regs, debugreg[*]
      
       - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
       set cr3. All other fields not used should be set to 0.
      
      This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
      a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
      can load per-cpu data-structures. It has no effect on the VCPU0.
      
      We also piggyback on the %rdi register to pass in the CPU number - so
      that when we bootup a new CPU, the cpu_bringup_and_idle will have
      passed as the first parameter the CPU number (via %rdi for 64-bit).
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5840c84b
    • M
      xen/pvh: Load GDT/GS in early PV bootup code for BSP. · 8d656bbe
      Mukesh Rathor 提交于
      During early bootup we start life using the Xen provided
      GDT, which means that we are running with %cs segment set
      to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).
      
      But for PVH we want to be use HVM type mechanism for
      segment operations. As such we need to switch to the HVM
      one and also reload ourselves with the __KERNEL_CS:eip
      to run in the proper GDT and segment.
      
      For HVM this is usually done in 'secondary_startup_64' in
      (head_64.S) but since we are not taking that bootup
      path (we start in PV - xen_start_kernel) we need to do
      that in the early PV bootup paths.
      
      For good measure we also zero out the %fs, %ds, and %es
      (not strictly needed as Xen has already cleared them
      for us). The %gs is loaded by 'switch_to_new_gdt'.
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      8d656bbe
    • K
      xen/pvh: Don't setup P2M tree. · 696fd7c5
      Konrad Rzeszutek Wilk 提交于
      P2M is not available for PVH. Fortunatly for us the
      P2M code already has mostly the support for auto-xlat guest thanks to
      commit 3d24bbd7
      "grant-table: call set_phys_to_machine after mapping grant refs"
      which: "
      introduces set_phys_to_machine calls for auto_translated guests
      (even on x86) in gnttab_map_refs and gnttab_unmap_refs.
      translated by swiotlb-xen... " so we don't need to muck much.
      
      with above mentioned "commit you'll get set_phys_to_machine calls
      from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
      anything with them " (Stefano Stabellini) which is OK - we want
      them to be NOPs.
      
      This is because we assume that an "IOMMU is always present on the
      plaform and Xen is going to make the appropriate IOMMU pagetable
      changes in the hypercall implementation of GNTTABOP_map_grant_ref
      and GNTTABOP_unmap_grant_ref, then eveything should be transparent
      from PVH priviligied point of view and DMA transfers involving
      foreign pages keep working with no issues[sp]
      
      Otherwise we would need a P2M (and an M2P) for PVH priviligied to
      track these foreign pages .. (see arch/arm/xen/p2m.c)."
      (Stefano Stabellini).
      
      We still have to inhibit the building of the P2M tree.
      That had been done in the past by not calling
      xen_build_dynamic_phys_to_machine (which setups the P2M tree
      and gives us virtual address to access them). But we are missing
      a check for xen_build_mfn_list_list - which was continuing to setup
      the P2M tree and would blow up at trying to get the virtual
      address of p2m_missing (which would have been setup by
      xen_build_dynamic_phys_to_machine).
      
      Hence a check is needed to not call xen_build_mfn_list_list when
      running in auto-xlat mode.
      
      Instead of replicating the check for auto-xlat in enlighten.c
      do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
      is called also in xen_arch_post_suspend without any checks for
      auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
      allocate space for an P2M tree.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      696fd7c5
    • M
      xen/pvh: Early bootup changes in PV code (v4). · d285d683
      Mukesh Rathor 提交于
      We don't use the filtering that 'xen_cpuid' is doing
      because the hypervisor treats 'XEN_EMULATE_PREFIX' as
      an invalid instruction. This means that all of the filtering
      will have to be done in the hypervisor/toolstack.
      
      Without the filtering we expose to the guest the:
      
       - cpu topology (sockets, cores, etc);
       - the APERF (which the generic scheduler likes to
          use), see  5e626254
          "xen/setup: filter APERFMPERF cpuid feature out"
       - and the inability to figure out whether MWAIT_LEAF
         should be exposed or not. See
         df88b2d9
         "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded."
       - x2apic, see  4ea9b9ac
         "xen: mask x2APIC feature in PV"
      
      We also check for vector callback early on, as it is a required
      feature. PVH also runs at default kernel IOPL.
      
      Finally, pure PV settings are moved to a separate function that are
      only called for pure PV, ie, pv with pvmmu. They are also #ifdef
      with CONFIG_XEN_PVMMU.
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      d285d683
  27. 10 9月, 2013 1 次提交
    • K
      xen/spinlock: Fix locking path engaging too soon under PVHVM. · 1fb3a8b2
      Konrad Rzeszutek Wilk 提交于
      The xen_lock_spinning has a check for the kicker interrupts
      and if it is not initialized it will spin normally (not enter
      the slowpath).
      
      But for PVHVM case we would initialize the kicker interrupt
      before the CPU came online. This meant that if the booting
      CPU used a spinlock and went in the slowpath - it would
      enter the slowpath and block forever. The forever part because
      during bootup: the spinlock would be taken _before_ the CPU
      sets itself to be online (more on this further), and we enter
      to poll on the event channel forever.
      
      The bootup CPU (see commit fc78d343
      "xen/smp: initialize IPI vectors before marking CPU online"
      for details) and the CPU that started the bootup consult
      the cpu_online_mask to determine whether the booting CPU should
      get an IPI. The booting CPU has to set itself in this mask via:
      
        set_cpu_online(smp_processor_id(), true);
      
      However, if the spinlock is taken before this (and it is) and
      it polls on an event channel - it will never be woken up as
      the kernel will never send an IPI to an offline CPU.
      
      Note that the PVHVM logic in sending IPIs is using the HVM
      path which has numerous checks using the cpu_online_mask
      and cpu_active_mask. See above mention git commit for details.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      1fb3a8b2
  28. 21 8月, 2013 1 次提交
    • V
      xen/pvhvm: Initialize xen panic handler for PVHVM guests · 669b0ae9
      Vaughan Cao 提交于
      kernel use callback linked in panic_notifier_list to notice others when panic
      happens.
      NORET_TYPE void panic(const char * fmt, ...){
          ...
          atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
      }
      When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to
      send out an event with reason code - SHUTDOWN_crash.
      
      xen_panic_handler_init() is defined to register on panic_notifier_list but
      we only call it in xen_arch_setup which only be called by PV, this patch is
      necessary for PVHVM.
      
      Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config
      file won't lead a vmcore to be generate when the guest panics. It can be
      reproduced with 'echo c > /proc/sysrq-trigger'.
      Signed-off-by: NVaughan Cao <vaughan.cao@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NJoe Jin <joe.jin@oracle.com>
      669b0ae9
  29. 09 8月, 2013 1 次提交
    • K
      xen: Support 64-bit PV guest receiving NMIs · 6efa20e4
      Konrad Rzeszutek Wilk 提交于
      This is based on a patch that Zhenzhong Duan had sent - which
      was missing some of the remaining pieces. The kernel has the
      logic to handle Xen-type-exceptions using the paravirt interface
      in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME -
      pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN -
      pv_cpu_ops.iret).
      
      That means the nmi handler (and other exception handlers) use
      the hypervisor iret.
      
      The other changes that would be neccessary for this would
      be to translate the NMI_VECTOR to one of the entries on the
      ipi_vector and make xen_send_IPI_mask_allbutself use different
      events.
      
      Fortunately for us commit 1db01b49
      (xen: Clean up apic ipi interface) implemented this and we piggyback
      on the cleanup such that the apic IPI interface will pass the right
      vector value for NMI.
      
      With this patch we can trigger NMIs within a PV guest (only tested
      x86_64).
      
      For this to work with normal PV guests (not initial domain)
      we need the domain to be able to use the APIC ops - they are
      already implemented to use the Xen event channels. For that
      to be turned on in a PV domU we need to remove the masking
      of X86_FEATURE_APIC.
      
      Incidentally that means kgdb will also now work within
      a PV guest without using the 'nokgdbroundup' workaround.
      
      Note that the 32-bit version is different and this patch
      does not enable that.
      
      CC: Lisa Nguyen <lisa@xenapiadmin.com>
      CC: Ben Guthro <benjamin.guthro@citrix.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v1: Fixed up per David Vrabel comments]
      Reviewed-by: NBen Guthro <benjamin.guthro@citrix.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      6efa20e4
  30. 05 8月, 2013 1 次提交
    • J
      x86: Correctly detect hypervisor · 9df56f19
      Jason Wang 提交于
      We try to handle the hypervisor compatibility mode by detecting hypervisor
      through a specific order. This is not robust, since hypervisors may implement
      each others features.
      
      This patch tries to handle this situation by always choosing the last one in the
      CPUID leaves. This is done by letting .detect() return a priority instead of
      true/false and just re-using the CPUID leaf where the signature were found as
      the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
      has the highest priority. Other sophisticated detection method could also be
      implemented on top.
      
      Suggested by H. Peter Anvin and Paolo Bonzini.
      Acked-by: NK. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Doug Covelli <dcovelli@vmware.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      9df56f19
  31. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8