提交 · bf6dfb154d935725c9a2005033ca33017b9df439 · openanolis / cloud-kernel

20 8月, 2015 3 次提交

xen/PMU: Intercept PMU-related MSR and APIC accesses · 6b08cd63

由 Boris Ostrovsky 提交于 8月 10, 2015

Provide interfaces for recognizing accesses to PMU-related MSRs and
LVTPC APIC and process these accesses in Xen PMU code.

(The interrupt handler performs XENPMU_flush right away in the beginning
since no PMU emulation is available. It will be added with a later patch).
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

6b08cd63

xen/PMU: Initialization code for Xen PMU · 65d0cf0b

由 Boris Ostrovsky 提交于 8月 10, 2015

Map shared data structure that will hold CPU registers, VPMU context,
V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills
this information in its handler and passes it to the guest for further
processing.

Set up PMU VIRQ.

Now that perf infrastructure will assume that PMU is available on a PV
guest we need to be careful and make sure that accesses via RDPMC
instruction don't cause fatal traps by the hypervisor. Provide a nop
RDPMC handler.

For the same reason avoid issuing a warning on a write to APIC's LVTPC.

Both of these will be made functional in later patches.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

65d0cf0b

xen: add explicit memblock_reserve() calls for special pages · 6c2681c8

由 Juergen Gross 提交于 7月 17, 2015

Some special pages containing interfaces to xen are being reserved
implicitly only today. The memblock_reserve() call to reserve them is
meant to reserve the p2m list supplied by xen. It is just reserving
not only the p2m list itself, but some more pages up to the start of
the xen built page tables.

To be able to move the p2m list to another pfn range, which is needed
for support of huge RAM, this memblock_reserve() must be split up to
cover all affected reserved pages explicitly.

The affected pages are:
- start_info page
- xenstore ring (might be missing, mfn is 0 in this case)
- console ring (not for initial domain)
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

6c2681c8

31 7月, 2015 1 次提交

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall · aa1acff3

由 Andy Lutomirski 提交于 7月 30, 2015

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

aa1acff3

07 6月, 2015 1 次提交

x86/mm/pat: Emulate PAT when it is disabled · 9cd25aac

由 Borislav Petkov 提交于 6月 04, 2015

In the case when PAT is disabled on the command line with
"nopat" or when virtualization doesn't support PAT (correctly) -
see

  9d34cfdf ("x86: Don't rely on VMWare emulating PAT MSR correctly").

we emulate it using the PWT and PCD cache attribute bits. Get
rid of boot_pat_state while at it.

Based on a conglomerate patch from Toshi Kani.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NToshi Kani <toshi.kani@hp.com>
Acked-by: NJuergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: hch@lst.de
Cc: hmh@hmh.eng.br
Cc: konrad.wilk@oracle.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-nvdimm@lists.01.org
Cc: stefan.bader@canonical.com
Cc: yigal@plexistor.com
Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

9cd25aac

19 5月, 2015 2 次提交

x86/fpu: Simplify fpu__cpu_init() · 21c4cd10

由 Ingo Molnar 提交于 4月 26, 2015

After the latest round of cleanups, fpu__cpu_init() has become
a simple call to fpu__init_cpu().

Rename fpu__init_cpu() to fpu__cpu_init() and remove the
extra layer.
Reviewed-by: NBorislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

21c4cd10

x86/fpu: Rename fpu_init() to fpu__cpu_init() · 3a9c4b0d

由 Ingo Molnar 提交于 4月 03, 2015

fpu_init() is a bit of a misnomer in that it (falsely) creates the
impression that it's related to the (old) fpu_finit() function,
which initializes FPU ctx state.

Rename it to fpu__cpu_init() to make its boot time initialization
clear, and to move it to the fpu__*() namespace.

Also fix and extend its comment block to point out that it's
called not only on the boot CPU, but on secondary CPUs as well.
Reviewed-by: NBorislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

3a9c4b0d

06 5月, 2015 1 次提交

hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests · a71dbdaa

由 Boris Ostrovsky 提交于 5月 04, 2015

Commit 61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor
attribute issue") makes AMD processors set SS to __KERNEL_DS in
__switch_to() to deal with cases when SS is NULL.

This breaks Xen PV guests who do not want to load SS with__KERNEL_DS.

Since the problem that the commit is trying to address would have to be
fixed in the hypervisor (if it in fact exists under Xen) there is no
reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here.

This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features
op which will clear this flag. (And since this structure is no longer
HVM-specific we should do some renaming).
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

a71dbdaa

22 4月, 2015 1 次提交

x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop · aac82d31

由 Andy Lutomirski 提交于 4月 03, 2015

We don't use irq_enable_sysexit on 64-bit kernels any more.
Remove all the paravirt and Xen machinery to support it on
64-bit kernels.
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

aac82d31

16 3月, 2015 1 次提交

x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs · feb44f1f

由 Konrad Rzeszutek Wilk 提交于 3月 02, 2015

Instead of mangling the default APIC driver, provide a Xen PV guest
specific one that explicitly provides appropriate methods.

This allows use to report that all APIC IDs are valid, allowing dom0
to boot with more than 255 VCPUs.

Since the probe order of APIC drivers is link dependent, we add in an
late probe function to change to the Xen PV if it hadn't been done
during bootup.
Suggested-by: NDavid Vrabel <david.vrabel@citrix.com>
Reported-by: NCathy Avery <cathy.avery@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

feb44f1f

06 3月, 2015 1 次提交

x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu · 8ef46a67

由 Andy Lutomirski 提交于 3月 05, 2015

We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset).  The latter is defined by
hardware and is a clean canonical way to find the top of the
stack.  Add an accessor so we can start using it.

This needs minor paravirt tweaks.  On native, sp0 defines the
top of the kernel stack and is therefore always correct.  On Xen
and lguest, the hypervisor tracks the top of the stack, but we
want to start reading sp0 in the kernel.  Fixing this is simple:
just update our local copy of sp0 as well as the hypervisor's
copy on task switches.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

8ef46a67

24 2月, 2015 2 次提交

x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests · 5054daa2

由 Boris Ostrovsky 提交于 2月 23, 2015

Commit 1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
introduced CR4 shadows.

These shadows are initialized in early boot code. The commit missed
initialization for 64-bit PV(H) guests that this patch adds.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

5054daa2

x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set · 31795b47

由 Boris Ostrovsky 提交于 2月 11, 2015

Commit d524165c ("x86/apic: Check x2apic early") tests X2APIC_ENABLE
bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics
the kernel when this bit is set.

Xen's PV guests will pass this MSR read to the hypervisor which will
return its version of the MSR, where this bit might be set. Make sure
we clear it before returning MSR value to the caller.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

31795b47

04 2月, 2015 1 次提交

x86: Clean up cr4 manipulation · 375074cc

由 Andy Lutomirski 提交于 10月 24, 2014

CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

375074cc

13 1月, 2015 1 次提交

x86/xen: properly retrieve NMI reason · f221b04f

由 Jan Beulich 提交于 1月 13, 2015

Using the native code here can't work properly, as the hypervisor would
normally have cleared the two reason bits by the time Dom0 gets to see
the NMI (if passed to it at all). There's a shared info field for this,
and there's an existing hook to use - just fit the two together. This
is particularly relevant so that NMIs intended to be handled by APEI /
GHES actually make it to the respective handler.

Note that the hook can (and should) be used irrespective of whether
being in Dom0, as accessing port 0x61 in a DomU would be even worse,
while the shared info field would just hold zero all the time. Note
further that hardware NMI handling for PVH doesn't currently work
anyway due to missing code in the hypervisor (but it is expected to
work the native rather than the PV way).
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

f221b04f

16 11月, 2014 1 次提交

xen: Support Xen pv-domains using PAT · 47591df5

由 Juergen Gross 提交于 11月 03, 2014

With the dynamical mapping between cache modes and pgprot values it is
now possible to use all cache modes via the Xen hypervisor PAT settings
in a pv domain.

All to be done is to read the PAT configuration MSR and set up the
translation tables accordingly.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: ville.syrjala@linux.intel.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-19-git-send-email-jgross@suse.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

47591df5

23 10月, 2014 1 次提交

x86/xen: delay construction of mfn_list_list · 2c185687

由 Juergen Gross 提交于 10月 14, 2014

The 3 level p2m tree for the Xen tools is constructed very early at
boot by calling xen_build_mfn_list_list(). Memory needed for this tree
is allocated via extend_brk().

As this tree (other than the kernel internal p2m tree) is only needed
for domain save/restore, live migration and crash dump analysis it
doesn't matter whether it is constructed very early or just some
milliseconds later when memory allocation is possible by other means.

This patch moves the call of xen_build_mfn_list_list() just after
calling xen_pagetable_p2m_copy() simplifying this function, too, as it
doesn't have to bother with two parallel trees now. The same applies
for some other internal functions.

While simplifying code, make early_can_reuse_p2m_middle() static and
drop the unused second parameter. p2m_mid_identity_mfn can be removed
as well, it isn't used either.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

2c185687

06 10月, 2014 1 次提交

x86/xen: Set EFER.NX and EFER.SCE in PVH guests · a2ef5dc2

由 Mukesh Rathor 提交于 9月 10, 2014

This fixes two bugs in PVH guests:

  - Not setting EFER.NX means the NX bit in page table entries is
    ignored on Intel processors and causes reserved bit page faults on
    AMD processors.

  - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for
    pvh guest") PVH guests are required to set EFER.SCE to enable the
    SYSCALL instruction.

Secondary VCPUs are started with pagetables with the NX bit set so
EFER.NX must be set before using any stack or data segment.
xen_pvh_cpu_early_init() is the new secondary VCPU entry point that
sets EFER before jumping to cpu_bringup_and_idle().
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

a2ef5dc2

03 10月, 2014 1 次提交

xen: eliminate scalability issues from initrd handling · d1e9abd6

由 Juergen Gross 提交于 9月 17, 2014

Size restrictions native kernels wouldn't have resulted from the initrd
getting mapped into the initial mapping. The kernel doesn't really need
the initrd to be mapped, so use infrastructure available in Xen to avoid
the mapping and hence the restriction.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

d1e9abd6

23 9月, 2014 1 次提交

x86: remove the Xen-specific _PAGE_IOMAP PTE flag · f955371c

由 David Vrabel 提交于 1月 07, 2014

The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs
that were used to map I/O regions that are 1:1 in the p2m.  This
allowed Xen to obtain the correct PFN when converting the MFNs read
from a PTE back to their PFN.

Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
returns the correct PFN by using a combination of the m2p and p2m to
determine if an MFN corresponds to a 1:1 mapping in the the p2m.

Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
future uses of the PTE flag.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>

f955371c

27 8月, 2014 1 次提交

x86: Replace __get_cpu_var uses · 89cbc767

由 Christoph Lameter 提交于 8月 17, 2014

__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

89cbc767

19 7月, 2014 2 次提交

arch/x86/xen: Silence compiler warnings · c7341d6a

由 Daniel Kiper 提交于 7月 12, 2014

Compiler complains in the following way when x86 32-bit kernel
with Xen support is build:

  CC      arch/x86/xen/enlighten.o
arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’:
arch/x86/xen/enlighten.c:1726:3: warning: right shift count >= width of type [enabled by default]

Such line contains following EFI initialization code:

boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32);

There is no issue if x86 64-bit kernel is build. However, 32-bit case
generate warning (even if that code will not be executed because Xen
does not work on 32-bit EFI platforms) due to __pa() returning unsigned long
type which has 32-bits width. So move whole EFI initialization stuff
to separate function and build it conditionally to avoid above mentioned
warning on x86 32-bit architecture.
Signed-off-by: NDaniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

c7341d6a

xen: Put EFI machinery in place · be81c8a1

由 Daniel Kiper 提交于 6月 30, 2014

This patch enables EFI usage under Xen dom0. Standard EFI Linux
Kernel infrastructure cannot be used because it requires direct
access to EFI data and code. However, in dom0 case it is not possible
because above mentioned EFI stuff is fully owned and controlled
by Xen hypervisor. In this case all calls from dom0 to EFI must
be requested via special hypercall which in turn executes relevant
EFI code in behalf of dom0.

When dom0 kernel boots it checks for EFI availability on a machine.
If it is detected then artificial EFI system table is filled.
Native EFI callas are replaced by functions which mimics them
by calling relevant hypercall. Later pointer to EFI system table
is passed to standard EFI machinery and it continues EFI subsystem
initialization taking into account that there is no direct access
to EFI boot services, runtime, tables, structures, etc. After that
system runs as usual.

This patch is based on Jan Beulich and Tang Liang work.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NTang Liang <liang.tang@oracle.com>
Signed-off-by: NDaniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

be81c8a1

15 7月, 2014 1 次提交

xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests. · 8d693b91

由 Konrad Rzeszutek Wilk 提交于 7月 11, 2014

By default when CONFIG_XEN and CONFIG_XEN_PVHVM kernels are
run, they will enable the PV extensions (drivers, interrupts, timers,
etc) - which is the best option for the majority of use cases.

However, in some cases (kexec not fully working, benchmarking)
we want to disable Xen PV extensions. As such introduce the
'xen_nopv' parameter that will do it.

This parameter is intended only for HVM guests as the Xen PV
guests MUST boot with PV extensions. However, even if you use
'xen_nopv' on Xen PV guests it will be ignored.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
---
[v2: s/off/xen_nopv/ per Boris Ostrovsky recommendation.]
[v3: Add Reviewed-by]
[v4: Clarify that this is only for HVM guests]

8d693b91

05 6月, 2014 1 次提交

x86/xen: fix memory setup for PVH dom0 · abacaadc

由 David Vrabel 提交于 6月 02, 2014

Since af06d66ee32b (x86: fix setup of PVH Dom0 memory map) in Xen, PVH
dom0 need only use the memory memory provided by Xen which has already
setup all the correct holes.

xen_memory_setup() then ends up being trivial for a PVH guest so
introduce a new function (xen_auto_xlated_memory_setup()).
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Tested-by: NRoger Pau Monné <roger.pau@citrix.com>

abacaadc

15 5月, 2014 1 次提交

xen/x86: set panic notifier priority to minimum · bc5eb201

由 Radim Krčmář 提交于 5月 13, 2014

Execution is not going to continue after telling Xen about the crash.
Let other panic notifiers run by postponing the final hypercall as much
as possible.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

bc5eb201

06 5月, 2014 1 次提交

asmlinkage, x86: Add explicit __visible to arch/x86/* · 2605fc21

由 Andi Kleen 提交于 5月 02, 2014

As requested by Linus add explicit __visible to the asmlinkage users.
This marks all functions visible to assembler.

Tree sweep for arch/x86/*
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

2605fc21

04 2月, 2014 1 次提交

xen/pvh: set CR4 flags for APs · afca5013

由 Mukesh Rathor 提交于 1月 29, 2014

During bootup in the 'probe_page_size_mask' these CR4 flags are
set in there. But for AP processors they are not set as we do not
use 'secondary_startup_64' which the baremetal kernels uses.
Instead do it in this function which we use in Xen PVH during our
startup for AP processors.

As such fix it up to make sure we have that flag set.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

afca5013

22 1月, 2014 1 次提交

xen/pvh: Set X86_CR0_WP and others in CR0 (v2) · c9f6e997

由 Roger Pau Monne 提交于 1月 20, 2014

otherwise we will get for some user-space applications
that use 'clone' with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
end up hitting an assert in glibc manifested by:

general protection ip:7f80720d364c sp:7fff98fd8a80 error:0 in
libc-2.13.so[7f807209e000+180000]

This is due to the nature of said operations which sets and clears
the PID.  "In the successful one I can see that the page table of
the parent process has been updated successfully to use a
different physical page, so the write of the tid on
that page only affects the child...

On the other hand, in the failed case, the write seems to happen before
the copy of the original page is done, so both the parent and the child
end up with the same value (because the parent copies the page after
the write of the child tid has already happened)."
(Roger's analysis). The nature of this is due to the Xen's commit
of 51e2cac257ec8b4080d89f0855c498cbbd76a5e5
"x86/pvh: set only minimal cr0 and cr4 flags in order to use paging"
the CR0_WP was removed so COW features of the Linux kernel were not
operating properly.

While doing that also update the rest of the CR0 flags to be inline
with what a baremetal Linux kernel would set them to.

In 'secondary_startup_64' (baremetal Linux) sets:

X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP |
X86_CR0_AM | X86_CR0_PG

The hypervisor for HVM type guests (which PVH is a bit) sets:
X86_CR0_PE | X86_CR0_ET | X86_CR0_TS
For PVH it specifically sets:
X86_CR0_PG

Which means we need to set the rest: X86_CR0_MP | X86_CR0_NE  |
X86_CR0_WP | X86_CR0_AM to have full parity.
Signed-off-by: NRoger Pau Monne <roger.pau@citrix.com>
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Took out the cr4 writes to be a seperate patch]
[v2: 0-DAY kernel found xen_setup_gdt to be missing a static]

c9f6e997

07 1月, 2014 1 次提交

xen/pvh: remove duplicated include from enlighten.c · 5602aba8

由 Wei Yongjun 提交于 1月 07, 2014

Remove duplicated include.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

5602aba8

06 1月, 2014 5 次提交

xen/pvh: Piggyback on PVHVM for event channels (v2) · 2771374d

由 Mukesh Rathor 提交于 12月 11, 2013

PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. There is
a similar mode - PVHVM where we run in HVM mode with
PV code enabled - and this patch explores that.

The most notable PV interfaces are the XenBus and event channels.

We will piggyback on how the event channel mechanism is
used in PVHVM - that is we want the normal native IRQ mechanism
and we will install a vector (hvm callback) for which we
will call the event channel mechanism.

This means that from a pvops perspective, we can use
native_irq_ops instead of the Xen PV specific. Albeit in the
future we could support pirq_eoi_map. But that is
a feature request that can be shared with PVHVM.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

2771374d

xen/pvh: Secondary VCPU bringup (non-bootup CPUs) · 5840c84b

由 Mukesh Rathor 提交于 12月 13, 2013

The VCPU bringup protocol follows the PV with certain twists.
From xen/include/public/arch-x86/xen.h:

Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:

 - For HVM guests, the structures read include: fpu_ctxt (if
 VGCT_I387_VALID is set), flags, user_regs, debugreg[*]

 - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
 set cr3. All other fields not used should be set to 0.

This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load per-cpu data-structures. It has no effect on the VCPU0.

We also piggyback on the %rdi register to pass in the CPU number - so
that when we bootup a new CPU, the cpu_bringup_and_idle will have
passed as the first parameter the CPU number (via %rdi for 64-bit).
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

5840c84b

xen/pvh: Load GDT/GS in early PV bootup code for BSP. · 8d656bbe

由 Mukesh Rathor 提交于 12月 13, 2013

During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).

But for PVH we want to be use HVM type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with the __KERNEL_CS:eip
to run in the proper GDT and segment.

For HVM this is usually done in 'secondary_startup_64' in
(head_64.S) but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
that in the early PV bootup paths.

For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

8d656bbe

xen/pvh: Don't setup P2M tree. · 696fd7c5

由 Konrad Rzeszutek Wilk 提交于 12月 15, 2013

P2M is not available for PVH. Fortunatly for us the
P2M code already has mostly the support for auto-xlat guest thanks to
commit 3d24bbd7
"grant-table: call set_phys_to_machine after mapping grant refs"
which: "
introduces set_phys_to_machine calls for auto_translated guests
(even on x86) in gnttab_map_refs and gnttab_unmap_refs.
translated by swiotlb-xen... " so we don't need to muck much.

with above mentioned "commit you'll get set_phys_to_machine calls
from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
anything with them " (Stefano Stabellini) which is OK - we want
them to be NOPs.

This is because we assume that an "IOMMU is always present on the
plaform and Xen is going to make the appropriate IOMMU pagetable
changes in the hypercall implementation of GNTTABOP_map_grant_ref
and GNTTABOP_unmap_grant_ref, then eveything should be transparent
from PVH priviligied point of view and DMA transfers involving
foreign pages keep working with no issues[sp]

Otherwise we would need a P2M (and an M2P) for PVH priviligied to
track these foreign pages .. (see arch/arm/xen/p2m.c)."
(Stefano Stabellini).

We still have to inhibit the building of the P2M tree.
That had been done in the past by not calling
xen_build_dynamic_phys_to_machine (which setups the P2M tree
and gives us virtual address to access them). But we are missing
a check for xen_build_mfn_list_list - which was continuing to setup
the P2M tree and would blow up at trying to get the virtual
address of p2m_missing (which would have been setup by
xen_build_dynamic_phys_to_machine).

Hence a check is needed to not call xen_build_mfn_list_list when
running in auto-xlat mode.

Instead of replicating the check for auto-xlat in enlighten.c
do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
is called also in xen_arch_post_suspend without any checks for
auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
allocate space for an P2M tree.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

696fd7c5

xen/pvh: Early bootup changes in PV code (v4). · d285d683

由 Mukesh Rathor 提交于 12月 13, 2013

We don't use the filtering that 'xen_cpuid' is doing
because the hypervisor treats 'XEN_EMULATE_PREFIX' as
an invalid instruction. This means that all of the filtering
will have to be done in the hypervisor/toolstack.

Without the filtering we expose to the guest the:

 - cpu topology (sockets, cores, etc);
 - the APERF (which the generic scheduler likes to
    use), see  5e626254
    "xen/setup: filter APERFMPERF cpuid feature out"
 - and the inability to figure out whether MWAIT_LEAF
   should be exposed or not. See
   df88b2d9
   "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded."
 - x2apic, see  4ea9b9ac
   "xen: mask x2APIC feature in PV"

We also check for vector callback early on, as it is a required
feature. PVH also runs at default kernel IOPL.

Finally, pure PV settings are moved to a separate function that are
only called for pure PV, ie, pv with pvmmu. They are also #ifdef
with CONFIG_XEN_PVMMU.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

d285d683

10 9月, 2013 1 次提交

xen/spinlock: Fix locking path engaging too soon under PVHVM. · 1fb3a8b2

由 Konrad Rzeszutek Wilk 提交于 8月 13, 2013

The xen_lock_spinning has a check for the kicker interrupts
and if it is not initialized it will spin normally (not enter
the slowpath).

But for PVHVM case we would initialize the kicker interrupt
before the CPU came online. This meant that if the booting
CPU used a spinlock and went in the slowpath - it would
enter the slowpath and block forever. The forever part because
during bootup: the spinlock would be taken _before_ the CPU
sets itself to be online (more on this further), and we enter
to poll on the event channel forever.

The bootup CPU (see commit fc78d343
"xen/smp: initialize IPI vectors before marking CPU online"
for details) and the CPU that started the bootup consult
the cpu_online_mask to determine whether the booting CPU should
get an IPI. The booting CPU has to set itself in this mask via:

  set_cpu_online(smp_processor_id(), true);

However, if the spinlock is taken before this (and it is) and
it polls on an event channel - it will never be woken up as
the kernel will never send an IPI to an offline CPU.

Note that the PVHVM logic in sending IPIs is using the HVM
path which has numerous checks using the cpu_online_mask
and cpu_active_mask. See above mention git commit for details.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

1fb3a8b2

21 8月, 2013 1 次提交

xen/pvhvm: Initialize xen panic handler for PVHVM guests · 669b0ae9

由 Vaughan Cao 提交于 8月 16, 2013

kernel use callback linked in panic_notifier_list to notice others when panic
happens.
NORET_TYPE void panic(const char * fmt, ...){
    ...
    atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
}
When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to
send out an event with reason code - SHUTDOWN_crash.

xen_panic_handler_init() is defined to register on panic_notifier_list but
we only call it in xen_arch_setup which only be called by PV, this patch is
necessary for PVHVM.

Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config
file won't lead a vmcore to be generate when the guest panics. It can be
reproduced with 'echo c > /proc/sysrq-trigger'.
Signed-off-by: NVaughan Cao <vaughan.cao@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NJoe Jin <joe.jin@oracle.com>

669b0ae9

09 8月, 2013 1 次提交

xen: Support 64-bit PV guest receiving NMIs · 6efa20e4

由 Konrad Rzeszutek Wilk 提交于 7月 19, 2013

This is based on a patch that Zhenzhong Duan had sent - which
was missing some of the remaining pieces. The kernel has the
logic to handle Xen-type-exceptions using the paravirt interface
in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME -
pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN -
pv_cpu_ops.iret).

That means the nmi handler (and other exception handlers) use
the hypervisor iret.

The other changes that would be neccessary for this would
be to translate the NMI_VECTOR to one of the entries on the
ipi_vector and make xen_send_IPI_mask_allbutself use different
events.

Fortunately for us commit 1db01b49
(xen: Clean up apic ipi interface) implemented this and we piggyback
on the cleanup such that the apic IPI interface will pass the right
vector value for NMI.

With this patch we can trigger NMIs within a PV guest (only tested
x86_64).

For this to work with normal PV guests (not initial domain)
we need the domain to be able to use the APIC ops - they are
already implemented to use the Xen event channels. For that
to be turned on in a PV domU we need to remove the masking
of X86_FEATURE_APIC.

Incidentally that means kgdb will also now work within
a PV guest without using the 'nokgdbroundup' workaround.

Note that the 32-bit version is different and this patch
does not enable that.

CC: Lisa Nguyen <lisa@xenapiadmin.com>
CC: Ben Guthro <benjamin.guthro@citrix.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up per David Vrabel comments]
Reviewed-by: NBen Guthro <benjamin.guthro@citrix.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

6efa20e4

05 8月, 2013 1 次提交

x86: Correctly detect hypervisor · 9df56f19

由 Jason Wang 提交于 7月 25, 2013

We try to handle the hypervisor compatibility mode by detecting hypervisor
through a specific order. This is not robust, since hypervisors may implement
each others features.

This patch tries to handle this situation by always choosing the last one in the
CPUID leaves. This is done by letting .detect() return a priority instead of
true/false and just re-using the CPUID leaf where the signature were found as
the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
has the highest priority. Other sophisticated detection method could also be
implemented on top.

Suggested by H. Peter Anvin and Paolo Bonzini.
Acked-by: NK. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Doug Covelli <dcovelli@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

9df56f19

15 7月, 2013 1 次提交

x86: delete __cpuinit usage from all x86 files · 148f9bb8

由 Paul Gortmaker 提交于 6月 18, 2013

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit  -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings.  In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from
all C files.  x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

148f9bb8

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功