提交 · fed7c3f0f750f225317828d691e9eb76eec887b3 · openanolis / cloud-kernel

08 5月, 2015 1 次提交

x86/entry: Stop using PER_CPU_VAR(kernel_stack) · 63332a84

由 Denys Vlasenko 提交于 4月 24, 2015

PER_CPU_VAR(kernel_stack) is redundant:

  - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
  - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).

PER_CPU_VAR(kernel_stack) will be deleted by a separate change.
Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

63332a84

06 5月, 2015 1 次提交

hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests · a71dbdaa

由 Boris Ostrovsky 提交于 5月 04, 2015

Commit 61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor
attribute issue") makes AMD processors set SS to __KERNEL_DS in
__switch_to() to deal with cases when SS is NULL.

This breaks Xen PV guests who do not want to load SS with__KERNEL_DS.

Since the problem that the commit is trying to address would have to be
fixed in the hypervisor (if it in fact exists under Xen) there is no
reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here.

This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features
op which will clear this flag. (And since this structure is no longer
HVM-specific we should do some renaming).
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

a71dbdaa

30 4月, 2015 1 次提交

xen: Suspend ticks on all CPUs during suspend · 2b953a5e

由 Boris Ostrovsky 提交于 4月 28, 2015

Commit 77e32c89 ("clockevents: Manage device's state separately for
the core") decouples clockevent device's modes from states. With this
change when a Xen guest tries to resume, it won't be calling its
set_mode op which needs to be done on each VCPU in order to make the
hypervisor aware that we are in oneshot mode.

This happens because clockevents_tick_resume() (which is an intermediate
step of resuming ticks on a processor) doesn't call clockevents_set_state()
anymore and because during suspend clockevent devices on all VCPUs (except
for the one doing the suspend) are left in ONESHOT state. As result, during
resume the clockevents state machine will assume that device is already
where it should be and doesn't need to be updated.

To avoid this problem we should suspend ticks on all VCPUs during
suspend.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

2b953a5e

22 4月, 2015 1 次提交

x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop · aac82d31

由 Andy Lutomirski 提交于 4月 03, 2015

We don't use irq_enable_sysexit on 64-bit kernels any more.
Remove all the paravirt and Xen machinery to support it on
64-bit kernels.
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

aac82d31

15 4月, 2015 1 次提交

x86: expose number of page table levels on Kconfig level · 98233368

由 Kirill A. Shutemov 提交于 4月 14, 2015

We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

98233368

02 4月, 2015 1 次提交

x86/cpu: Factor out common CPU initialization code, fix 32-bit Xen PV guests · 3f85483b

由 Boris Ostrovsky 提交于 4月 01, 2015

Some of x86 bare-metal and Xen CPU initialization code is common
between the two and therefore can be factored out to avoid code
duplication.

As a side effect, doing so will also extend the fix provided by
commit a7fcf28d ("x86/asm/entry: Replace this_cpu_sp0() with
current_top_of_stack() to x86_32") to 32-bit Xen PV guests.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1427897534-5086-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

3f85483b

01 4月, 2015 2 次提交

tick/xen: Provide and use tick_suspend_local() and tick_resume_local() · f46481d0

由 Thomas Gleixner 提交于 3月 25, 2015

Xen calls on every cpu into tick_resume() which is just wrong.
tick_resume() is for the syscore global suspend/resume
invocation. What XEN really wants is a per cpu local resume
function.

Provide a tick_resume_local() function and use it in XEN.

Also provide a complementary tick_suspend_local() and modify
tick_unfreeze() and tick_freeze(), respectively, to use the
new local tick resume/suspend functions.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
[ Combined two patches, rebased, modified subject/changelog. ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1698741.eezk9tnXtG@vostro.rjw.lan
[ Merged to latest timers/core. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

f46481d0

clockevents: Make suspend/resume calls explicit · 4ffee521

由 Thomas Gleixner 提交于 3月 25, 2015

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call.

We are way better off to have explicit calls instead of this
monstrosity. Split out the suspend/resume() calls and invoke
them directly from the call sites.

No locking required at this point because these calls happen
with interrupts disabled and a single cpu online.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5. ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan
[ Rebased on top of latest timers/core. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

4ffee521

25 3月, 2015 1 次提交

x86/asm/entry: Get rid of KERNEL_STACK_OFFSET · ef593260

由 Denys Vlasenko 提交于 3月 19, 2015

PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.

Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.

Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.

This patch eliminates KERNEL_STACK_OFFSET.

PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...

Net result in generated code is that constants in several insns
are changed.

This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.
Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ef593260

23 3月, 2015 1 次提交

x86/xen: prepare p2m list for memory hotplug · 633d6f17

由 Juergen Gross 提交于 3月 20, 2015

Commit 054954eb ("xen: switch to linear
virtual mapped sparse p2m list") introduced a regression regarding to
memory hotplug for a pv-domain: as the virtual space for the p2m list
is allocated for the to be expected memory size of the domain only,
hotplugged memory above that size will not be usable by the domain.

Correct this by using a configurable size for the p2m list in case of
memory hotplug enabled (default supported memory size is 512 GB for
64 bit domains and 4 GB for 32 bit domains).
Signed-off-by: NJuergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> # 3.19+
Reviewed-by: NDaniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

633d6f17

17 3月, 2015 1 次提交

x86/asm/entry/64: Rename 'old_rsp' to 'rsp_scratch' · c38e5038

由 Ingo Molnar 提交于 3月 17, 2015

Make clear that the usage of PER_CPU(old_rsp) is purely temporary,
by renaming it to 'rsp_scratch'.

Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c38e5038

16 3月, 2015 7 次提交

xen/privcmd: improve performance of MMAPBATCH_V2 · 4e8c0c8c

由 David Vrabel 提交于 3月 11, 2015

Make the IOCTL_PRIVCMD_MMAPBATCH_V2 (and older V1 version) map
multiple frames at a time rather than one at a time, despite the pages
being non-consecutive GFNs.

xen_remap_foreign_mfn_array() is added which maps an array of GFNs
(instead of a consecutive range of GFNs).

Since per-frame errors are returned in an array, privcmd must set the
MMAPBATCH_V1 error bits as part of the "report errors" phase, after
all the frames are mapped.

Migrate times are significantly improved (when using a PV toolstack
domain).  For example, for an idle 12 GiB PV guest:

        Before     After
  real  0m38.179s  0m26.868s
  user  0m15.096s  0m13.652s
  sys   0m28.988s  0m18.732s
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

4e8c0c8c

xen: unify foreign GFN map/unmap for auto-xlated physmap guests · 628c28ee

由 David Vrabel 提交于 3月 11, 2015

Auto-translated physmap guests (arm, arm64 and x86 PVHVM/PVH) map and
unmap foreign GFNs using the same method (updating the physmap).
Unify the two arm and x86 implementations into one commont one.

Note that on arm and arm64, the correct error code will be returned
(instead of always -EFAULT) and map/unmap failure warnings are no
longer printed.  These changes are required if the foreign domain is
paging (-ENOENT failures are expected and must be propagated up to the
caller).
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

628c28ee

x86/xen/apic: WARN with details. · b3b06c7e

由 Konrad Rzeszutek Wilk 提交于 3月 03, 2015

We should not be writting to the APIC registers under
Xen PV. But if we do instead of just giving an blanket
warning - include some details to help troubleshoot.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b3b06c7e

x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs · feb44f1f

由 Konrad Rzeszutek Wilk 提交于 3月 02, 2015

Instead of mangling the default APIC driver, provide a Xen PV guest
specific one that explicitly provides appropriate methods.

This allows use to report that all APIC IDs are valid, allowing dom0
to boot with more than 255 VCPUs.

Since the probe order of APIC drivers is link dependent, we add in an
late probe function to change to the Xen PV if it hadn't been done
during bootup.
Suggested-by: NDavid Vrabel <david.vrabel@citrix.com>
Reported-by: NCathy Avery <cathy.avery@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

feb44f1f

xen: use generated hypercall symbols in arch/x86/xen/xen-head.S · 526abeae

由 Juergen Gross 提交于 1月 21, 2015

Instead of manually list each hypercall in arch/x86/xen/xen-head.S
use the auto generated symbol list.

This also corrects the wrong address of xen_hypercall_mca which was
located 32 bytes higher than it should.

Symbol addresses have been verified to match the correct ones via
objdump output.
Based-on-patch-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

526abeae

xen: use generated hypervisor symbols in arch/x86/xen/trace.c · fc903f87

由 Juergen Gross 提交于 1月 21, 2015

Instead of manually list all hypervisor calls in arch/x86/xen/trace.c
use the auto generated list.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

fc903f87

xen: synchronize include/xen/interface/xen.h with xen · 16b12d60

由 Juergen Gross 提交于 1月 21, 2015

The header include/xen/interface/xen.h doesn't contain all definitions
from Xen's version of that header. Update it accordingly.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

16b12d60

12 3月, 2015 1 次提交

x86: Use common outgoing-CPU-notification code · 2a442c9c

由 Paul E. McKenney 提交于 2月 25, 2015

This commit removes the open-coded CPU-offline notification with new
common code.  Among other things, this change avoids calling scheduler
code using RCU from an offline CPU that RCU is ignoring.  It also allows
Xen to notice at online time that the CPU did not go offline correctly.
Note that Xen has the surviving CPU carry out some cleanup operations,
so if the surviving CPU times out, these cleanup operations might have
been carried out while the outgoing CPU was still running.  It might
therefore be unwise to bring this CPU back online, and this commit
avoids doing so.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <x86@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: <xen-devel@lists.xenproject.org>

2a442c9c

06 3月, 2015 1 次提交

x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu · 8ef46a67

由 Andy Lutomirski 提交于 3月 05, 2015

We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset).  The latter is defined by
hardware and is a clean canonical way to find the top of the
stack.  Add an accessor so we can start using it.

This needs minor paravirt tweaks.  On native, sp0 defines the
top of the kernel stack and is therefore always correct.  On Xen
and lguest, the hypervisor tracks the top of the stack, but we
want to start reading sp0 in the kernel.  Fixing this is simple:
just update our local copy of sp0 as well as the hypervisor's
copy on task switches.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

8ef46a67

27 2月, 2015 1 次提交

x86/xen: correct bug in p2m list initialization · b8f05c88

由 Juergen Gross 提交于 2月 27, 2015

Commit 054954eb ("xen: switch to
linear virtual mapped sparse p2m list") introduced an error.

During initialization of the p2m list a p2m identity area mapped by
a complete identity pmd entry has to be split up into smaller chunks
sometimes, if a non-identity pfn is introduced in this area.

If this non-identity pfn is not at index 0 of a p2m page the new
p2m page needed is initialized with wrong identity entries, as the
identity pfns don't start with the value corresponding to index 0,
but with the initial non-identity pfn. This results in weird wrong
mappings.

Correct the wrong initialization by starting with the correct pfn.

Cc: stable@vger.kernel.org # 3.19
Reported-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Tested-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

b8f05c88

24 2月, 2015 2 次提交

x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests · 5054daa2

由 Boris Ostrovsky 提交于 2月 23, 2015

Commit 1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
introduced CR4 shadows.

These shadows are initialized in early boot code. The commit missed
initialization for 64-bit PV(H) guests that this patch adds.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

5054daa2

x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set · 31795b47

由 Boris Ostrovsky 提交于 2月 11, 2015

Commit d524165c ("x86/apic: Check x2apic early") tests X2APIC_ENABLE
bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics
the kernel when this bit is set.

Xen's PV guests will pass this MSR read to the hypervisor which will
return its version of the MSR, where this bit might be set. Make sure
we clear it before returning MSR value to the caller.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

31795b47

18 2月, 2015 1 次提交

x86/spinlocks/paravirt: Fix memory corruption on unlock · d6abfdb2

由 Raghavendra K T 提交于 2月 06, 2015

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:

                prev = *lock;
                add_smp(&lock->tickets.head, TICKET_LOCK_INC);

                /* add_smp() is a full mb() */

                if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
                        __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:

 1. Moving slowpath flag to head (Oleg):
    Unlocked locks don't care about the slowpath flag; therefore we can keep
    it set after the last unlock, and clear it again on the first (try)lock.
    -- this removes the write after unlock. note that keeping slowpath flag would
    result in unnecessary kicks.
    By moving the slowpath flag from the tail to the head ticket we also avoid
    the need to access both the head and tail tickets on unlock.

 2. use xadd to avoid read/write after unlock that checks the need for
    unlock_kick (Linus):
    We further avoid the need for a read-after-release by using xadd;
    the prev head value will include the slowpath flag and indicate if we
    need to do PV kicking of suspended spinners -- on modern chips xadd
    isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark overcommit %improve
 kernbench  1x           -0.13
 kernbench  2x            0.02
 dbench     1x           -1.77
 dbench     2x           -0.63

[Jeremy: Hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moved slowpath flag to head, ticket_equals idea]
[PeterZ: Added detailed changelog]
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: a.ryabinin@samsung.com
Cc: dave@stgolabs.net
Cc: hpa@zytor.com
Cc: jasowang@redhat.com
Cc: jeremy@goop.org
Cc: paul.gortmaker@windriver.com
Cc: riel@redhat.com
Cc: tglx@linutronix.de
Cc: waiman.long@hp.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d6abfdb2

04 2月, 2015 1 次提交

x86: Clean up cr4 manipulation · 375074cc

由 Andy Lutomirski 提交于 10月 24, 2014

CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

375074cc

28 1月, 2015 9 次提交

xen: mark grant mapped pages as foreign · 8da7633f

由 Jennifer Herbert 提交于 12月 24, 2014

Use the "foreign" page flag to mark pages that have a grant map.  Use
page->private to store information of the grant (the granting domain
and the grant reference).
Signed-off-by: NJennifer Herbert <jennifer.herbert@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

8da7633f

x86/xen: require ballooned pages for grant maps · 0ae65f49

由 Jennifer Herbert 提交于 12月 24, 2014

Ballooned pages are always used for grant maps which means the
original frame does not need to be saved in page->index nor restored
after the grant unmap.

This allows the workaround in netback for the conflicting use of the
(unionized) page->index and page->pfmemalloc to be removed.
Signed-off-by: NJennifer Herbert <jennifer.herbert@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

0ae65f49

xen: remove scratch frames for ballooned pages and m2p override · 0bb599fd

由 David Vrabel 提交于 1月 05, 2015

The scratch frame mappings for ballooned pages and the m2p override
are broken.  Remove them in preparation for replacing them with
simpler mechanisms that works.

The scratch pages did not ensure that the page was not in use.  In
particular, the foreign page could still be in use by hardware.  If
the guest reused the frame the hardware could read or write that
frame.

The m2p override did not handle the same frame being granted by two
different grant references.  Trying an M2P override lookup in this
case is impossible.

With the m2p override removed, the grant map/unmap for the kernel
mappings (for x86 PV) can be easily batched in
set_foreign_p2m_mapping() and clear_foreign_p2m_mapping().
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

0bb599fd

xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() · 853d0289

由 David Vrabel 提交于 1月 05, 2015

When unmapping grants, instead of converting the kernel map ops to
unmap ops on the fly, pre-populate the set of unmap ops.

This allows the grant unmap for the kernel mappings to be trivially
batched in the future.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

853d0289

x86/xen: cleanup arch/x86/xen/mmu.c · 270b7933

由 Juergen Gross 提交于 1月 28, 2015

Remove a nested ifdef.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

270b7933

x86/xen: add some __init annotations in arch/x86/xen/mmu.c · bf9d834a

由 Juergen Gross 提交于 1月 28, 2015

The file arch/x86/xen/mmu.c has some functions that can be annotated
with "__init".
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

bf9d834a

x86/xen: add some __init and static annotations in arch/x86/xen/setup.c · a3f52396

由 Juergen Gross 提交于 1月 28, 2015

Some more functions in arch/x86/xen/setup.c can be made "__init".
xen_ignore_unusable() can be made "static".
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

a3f52396

x86/xen: use correct types for addresses in arch/x86/xen/setup.c · 3ba5c867

由 Juergen Gross 提交于 1月 28, 2015

In many places in arch/x86/xen/setup.c wrong types are used for
physical addresses (u64 or unsigned long long). Use phys_addr_t
instead.

Use macros already defined instead of open coding them.

Correct some other type mismatches.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

3ba5c867

x86/xen: cleanup arch/x86/xen/setup.c · f0feed10

由 Juergen Gross 提交于 1月 28, 2015

Remove extern declarations in arch/x86/xen/setup.c which are either
not used or redundant. Move needed other extern declarations to
xen-ops.h
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

f0feed10

26 1月, 2015 1 次提交

x86,xen: use current->state helpers · 57b6b99b

由 Davidlohr Bueso 提交于 1月 26, 2015

Call __set_current_state() instead of assigning the new state directly.
These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments,
keeping track of who changed the state.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

57b6b99b

21 1月, 2015 1 次提交

x86/xen: prefer TSC over xen clocksource for dom0 · 94dd85f6

由 Palik, Imre 提交于 1月 13, 2015

In Dom0's the use of the TSC clocksource (whenever it is stable enough to
be used) instead of the Xen clocksource should not cause any issues, as
Dom0 VMs never live-migrated.  The TSC clocksource is somewhat more
efficient than the Xen paravirtualised clocksource, thus it should have
higher rating.

This patch decreases the rating of the Xen clocksource in Dom0s to 275.
Which is half-way between the rating of the TSC clocksource (300) and the
hpet clocksource (250).

Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: NImre Palik <imrep@amazon.de>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

94dd85f6

19 1月, 2015 1 次提交

x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCE · 1760f1eb

由 Christian Borntraeger 提交于 12月 07, 2014

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the p2m code to replace ACCESS_ONCE with READ_ONCE.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Acked-by: NDavid Vrabel <david.vrabel@citrix.com>

1760f1eb

13 1月, 2015 1 次提交

x86/xen: properly retrieve NMI reason · f221b04f

由 Jan Beulich 提交于 1月 13, 2015

Using the native code here can't work properly, as the hypervisor would
normally have cleared the two reason bits by the time Dom0 gets to see
the NMI (if passed to it at all). There's a shared info field for this,
and there's an existing hook to use - just fit the two together. This
is particularly relevant so that NMIs intended to be handled by APEI /
GHES actually make it to the respective handler.

Note that the hook can (and should) be used irrespective of whether
being in Dom0, as accessing port 0x61 in a DomU would be even worse,
while the shared info field would just hold zero all the time. Note
further that hardware NMI handling for PVH doesn't currently work
anyway due to missing code in the hypervisor (but it is expected to
work the native rather than the PV way).
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

f221b04f

12 1月, 2015 2 次提交

xen: check for zero sized area when invalidating memory · 9a17ad7f

由 Juergen Gross 提交于 1月 12, 2015

With the introduction of the linear mapped p2m list setting memory
areas to "invalid" had to be delayed. When doing the invalidation
make sure no zero sized areas are processed.
Signed-off-by: NJuegren Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

9a17ad7f

xen: use correct type for physical addresses · e86f9496

由 Juergen Gross 提交于 1月 12, 2015

When converting a pfn to a physical address be sure to use 64 bit
wide types or convert the physical address to a pfn if possible.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

e86f9496

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功