Commits · 162e6df47c4f80c87cb617ec473eca015df454ca · OpenHarmony / kernel_linux

23 Sep, 2015 1 commit

x86/paravirt: Replace the paravirt nop with a bona fide empty function · fc57a7c6

Authored by Andy Lutomirski on Sep 20, 2015

PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
example, trimmed for readability):

    ff 15 00 00 00 00       callq  *0x0(%rip)        # 2796 <nmi+0x6>
              2792: R_X86_64_PC32     pv_irq_ops+0x2c

That's a call through a function pointer to a regular C function that
does nothing on native boots, but that function isn't protected
against kprobes, isn't marked notrace, and is certainly not
guaranteed to preserve any registers if the compiler is feeling
perverse.  This is bad news for a CLBR_NONE operation.

Of course, if everything works correctly, once paravirt ops are
patched, it gets nopped out, but what if we hit this code before
paravirt ops are patched in?  This can potentially cause breakage
that is very difficult to debug.

A more subtle failure is possible here, too: if _paravirt_nop uses
the stack at all (even just to push RBP), it will overwrite the "NMI
executing" variable if it's called in the NMI prologue.

The Xen case, perhaps surprisingly, is fine, because it's already
written in asm.

Fix all of the cases that default to paravirt_nop (including
adjust_exception_frame) with a big hammer: replace paravirt_nop with
an asm function that is just a ret instruction.
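
A minimal user-space sketch of this approach, assuming GCC or Clang on x86-64 Linux. The names `demo_nop` and `demo_call_hook` are hypothetical, and this is not the kernel's actual definition. A top-level asm statement defines a function whose whole body is a single `ret`, so it has no prologue that could touch the stack or change a register. On other targets the sketch falls back to an empty C function, which carries no such guarantee.

```c
#include <assert.h>

/* Hypothetical stand-in for a "do nothing" hook; not the kernel's symbol. */
void demo_nop(void);

#if defined(__x86_64__) && defined(__linux__)
/*
 * Define the function in assembly so its body is exactly one instruction.
 * No prologue is emitted, so nothing is pushed and no register changes.
 */
__asm__(".pushsection .text\n"
        ".globl demo_nop\n"
        ".type demo_nop, @function\n"
        "demo_nop:\n\t"
        "ret\n\t"
        ".size demo_nop, . - demo_nop\n"
        ".popsection\n");
#else
/* Fallback for other targets: the compiler decides what this expands to. */
void demo_nop(void) {}
#endif

/* Call the hook through a function pointer, as an ops table would. */
static int demo_call_hook(void (*hook)(void))
{
    assert(hook != 0);
    hook();
    return 1; /* reached only if the hook returned normally */
}
```

Calling `demo_call_hook(demo_nop)` exercises the same indirect-call shape as the `callq *0x0(%rip)` shown above.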

The Xen case may have other problems, so document them.

This is part of a fix for some random crashes that Sasha saw.
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

06 Jul, 2015 1 commit

x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks · 9261e050

Authored by Andy Lutomirski on Jun 25, 2015

We've had ->read_tsc() and ->read_tscp() paravirt hooks since
the very beginning of paravirt, i.e.,

  d3561b7f ("[PATCH] paravirt: header and stubs for paravirtualisation").

AFAICT, the only paravirt guest implementation that ever
replaced these calls was vmware, and it's gone. Arguably even
vmware shouldn't have hooked RDTSC -- we fully support systems
that don't have a TSC at all, so there's no point for a paravirt
implementation to pretend that we have a TSC but to replace it.

I also doubt that these hooks actually worked. Calls to rdtscl()
and rdtscll(), which respected the hooks, were used seemingly
interchangeably with native_read_tsc(), which did not.

Just remove them. If anyone ever needs them again, they can try
to make a case for why they need them.

Before, on a paravirt config:
      text    data     bss      dec    hex filename
  12618257 1816384 1093632 15528273 ecf151 vmlinux

After:
      text    data     bss      dec    hex filename
  12617207 1816384 1093632 15527223 eced37 vmlinux
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

22 Apr, 2015 1 commit

x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop · aac82d31

Authored by Andy Lutomirski on Apr 03, 2015

We don't use irq_enable_sysexit on 64-bit kernels any more.
Remove all the paravirt and Xen machinery to support it on
64-bit kernels.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

15 Apr, 2015 1 commit

x86: expose number of page table levels on Kconfig level · 98233368

Authored by Kirill A. Shutemov on Apr 14, 2015

We want to use the number of page table levels to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.
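
A rough sketch of why a numeric config symbol helps: a structure can include per-level fields only for levels that exist in the build. The structure `demo_mm`, its fields, and the default value of 4 are assumptions for illustration, not the kernel's actual mm_struct layout.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Kconfig-provided value; 4 is just an assumed default. */
#ifndef CONFIG_PGTABLE_LEVELS
#define CONFIG_PGTABLE_LEVELS 4
#endif

/* Hypothetical structure: counters exist only for levels that are present. */
struct demo_mm {
    unsigned long nr_ptes;      /* lowest level is always present */
#if CONFIG_PGTABLE_LEVELS > 2
    unsigned long nr_pmds;      /* only with a third level */
#endif
#if CONFIG_PGTABLE_LEVELS > 3
    unsigned long nr_puds;      /* only with a fourth level */
#endif
};

/* Number of per-level counters the structure carries for this config. */
static size_t demo_mm_counters(void)
{
    return sizeof(struct demo_mm) / sizeof(unsigned long);
}
```

With the assumed default, the structure carries one counter for each level below the top one.
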
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24 Apr, 2014 2 commits

kprobes, x86: Prohibit probing on native_set_debugreg()/load_idt() · 80271972

Authored by Masami Hiramatsu on Apr 17, 2014

Since kprobes uses do_debug() for single-stepping,
functions called from do_debug() before notify_die() must not
be probed.

native_load_idt() is also called from paranoid_exit when
returning from int3, so it must not be probed either.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20140417081719.26341.65542.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>

kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklist · 376e2424

Authored by Masami Hiramatsu on Apr 17, 2014

Introduce NOKPROBE_SYMBOL() macro which builds a kprobes
blacklist at kernel build time.

The usage of this macro is similar to EXPORT_SYMBOL(),
placed after the function definition:

  NOKPROBE_SYMBOL(function);

Since this macro will inhibit inlining of static/inline
functions, this patch also introduces a nokprobe_inline macro
for static/inline functions. In this case, we must use
NOKPROBE_SYMBOL() for the inline function caller.

When CONFIG_KPROBES=y, the macro stores the given function
address in the "_kprobe_blacklist" section.

Since the data structures are not fully initialized by the
macro (because there is no "size" information),  those
are re-initialized at boot time by using kallsyms.
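
The build-time list can be approximated in user space as below. This is a sketch that assumes an ELF target linked with GNU ld (or a compatible linker), which provides `__start_`/`__stop_` symbols for sections whose names are valid C identifiers. The macro `DEMO_NOKPROBE_SYMBOL`, the section name `demo_blacklist`, and the helper functions are hypothetical. Unlike the kernel, the sketch records only addresses and does no later size lookup.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_fn_t)(void);

/*
 * Hypothetical user-space analogue: record the function's address in a
 * dedicated ELF section. "used" keeps the entry even though nothing
 * references it by name.
 */
#define DEMO_NOKPROBE_SYMBOL(fn)                                     \
    static const demo_fn_t demo_blacklist_##fn                      \
        __attribute__((used, section("demo_blacklist"))) = (fn)

/* GNU ld provides these bounds for sections whose names are C identifiers. */
extern const demo_fn_t __start_demo_blacklist[];
extern const demo_fn_t __stop_demo_blacklist[];

void demo_debug_handler(void) {}
DEMO_NOKPROBE_SYMBOL(demo_debug_handler);   /* placed after the definition */

void demo_ordinary(void) {}

/* Walk the section and report whether fn was listed at build time. */
static int demo_is_blacklisted(demo_fn_t fn)
{
    const demo_fn_t *p;

    assert(fn != NULL);
    for (p = __start_demo_blacklist; p < __stop_demo_blacklist; p++)
        if (*p == fn)
            return 1;
    return 0;
}
```

Here `demo_is_blacklisted(demo_debug_handler)` finds the entry, while `demo_ordinary` was never listed and is not found.
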
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20140417081705.26341.96719.stgit@ltc230.yrl.intra.hitachi.co.jp
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jan-Simon Möller <dl9pf@gmx.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-sparse@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

05 Sep, 2013 1 commit

x86, paravirt: Remove duplicate definition for DEF_NATIVE · f2a7b303

Authored by H. Peter Anvin on Sep 04, 2013

DEF_NATIVE() is defined in paravirt_types.h; remove the duplicate
definition in paravirt.c.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andi Kleen <ak@linux.kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/CA%2B55aFxVv==DC0JdS87V%2BcPr-twN%2BTujYg5XmgHOjJOAkZ4xwQ@mail.gmail.com

07 Aug, 2013 1 commit

x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops · 9a55fdbe

Authored by Andi Kleen on Aug 05, 2013

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-13-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

12 Apr, 2013 1 commit

x86, xen, gdt: Remove the pvops variant of store_gdt. · 357d1226

Authored by Konrad Rzeszutek Wilk on Apr 05, 2013

The two use-cases where we needed to store the GDT were during ACPI S3 suspend
and resume. As the patches:
  x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
  x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.

have demonstrated, there are other mechanisms by which the GDT is
saved and reloaded during the early resume path.

Hence we do not need to worry about the pvops call-chain for saving the
GDT and can eliminate it. The other areas where store_gdt is
used are never going to be hit when running under the pvops platforms.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

11 Apr, 2013 1 commit

x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal · 511ba86e

Authored by Boris Ostrovsky on Mar 23, 2013

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable(), which may have a performance impact.

Since lazy MMU is not used on bare metal, we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
an environment.

[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
  updates" may cause a minor performance regression on
  bare metal.  This patch resolves that performance regression.  It is
  somewhat unclear to me if this is a good -stable candidate. ]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> SEE NOTE ABOVE

08 Jun, 2012 1 commit

x86, pvops: Remove hooks for {rd,wr}msr_safe_regs · 1f975f78

Authored by Andre Przywara on Jun 01, 2012

There were paravirt_ops hooks for the full register set variant of
{rd,wr}msr_safe which are actually not used by anyone anymore. Remove
them to make the code cleaner and avoid silent breakages when the pvops
members were uninitialized. This has been boot-tested natively and under
Xen with PVOPS enabled and disabled on one machine.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-2-git-send-email-bp@amd64.org
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

15 May, 2012 1 commit

x86: replace percpu_xxx funcs with this_cpu_xxx · c6ae41e7

Authored by Alex Shi on May 11, 2012

The percpu_xxx() family of functions duplicates this_cpu_xxx().
Remove the percpu_xxx() definitions and replace them with this_cpu_xxx()
in the code. There is no functional change in this patch; it is just
preparation for the later removal of the percpu_xxx() family.

On x86 the this_cpu_xxx() functions are the same as
__this_cpu_xxx(), without any unnecessary preempt enable/disable.

Thanks to Stephen Rothwell, who found and fixed an i386 build error in
the patch.

Thanks also to Andrew Morton, who kept updating the patchset in Linus'
tree.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

29 Mar, 2012 1 commit

Disintegrate asm/system.h for X86 · f05e798a

Authored by David Howells on Mar 28, 2012

Disintegrate asm/system.h for X86.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
cc: x86@kernel.org

01 Mar, 2012 1 commit

bug.h: add include of it to various implicit C users · 50af5ead

Authored by Paul Gortmaker on Jan 20, 2012

With bug.h currently living right in linux/kernel.h there
are files that use BUG_ON and friends but are not including
the header explicitly.  Fix them up so we can remove the
presence in kernel.h file.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

24 Feb, 2012 1 commit

static keys: Introduce 'struct static_key', static_key_true()/false() and... · c5905afb

Authored by Ingo Molnar on Feb 24, 2012

static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()

So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.

Typical usage scenarios:

        #include <linux/static_key.h>

        struct static_key key = STATIC_KEY_INIT_TRUE;

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

Or:

        if (static_key_true(&key))
                do likely code
        else
                do unlikely code

The static key is modified via:

        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an
expensive operation.

I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.

On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.
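
The fallback behaviour described in the last paragraph can be pictured with a plain counter and a branch hint. The sketch below is a hypothetical user-space stand-in with `demo_`-prefixed names, not the kernel's implementation. It assumes GCC or Clang for `__builtin_expect` and ignores concurrency, which a real implementation would have to handle.

```c
#include <assert.h>

/* Hypothetical user-space stand-in: a plain counter instead of patched code. */
struct demo_static_key {
    int enabled;    /* number of outstanding "inc" calls */
};

#define DEMO_STATIC_KEY_INIT_FALSE { 0 }

/* Branch hint; a GCC and Clang builtin. */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Default-off test: tells the compiler the disabled path is the hot one. */
static int demo_static_key_false(const struct demo_static_key *key)
{
    return demo_unlikely(key->enabled > 0);
}

static void demo_static_key_slow_inc(struct demo_static_key *key)
{
    key->enabled++;
}

static void demo_static_key_slow_dec(struct demo_static_key *key)
{
    assert(key->enabled > 0);   /* an unbalanced dec is a caller bug */
    key->enabled--;
}
```

The key counts enables rather than storing a boolean, so nested inc/dec pairs from independent users compose.
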
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>

05 Aug, 2011 1 commit

x86-64: Add user_64bit_mode paravirt op · 318f5a2a

Authored by Andy Lutomirski on Aug 03, 2011

Three places in the kernel assume that the only long mode CPL 3
selector is __USER_CS.  This is not true on Xen -- Xen's sysretq
changes cs to the magic value 0xe033.

Two of the places are corner cases, but as of "x86-64: Improve
vsyscall emulation CS and RIP handling"
(c9712944), vsyscalls will segfault
if called with Xen's extra CS selector.  This causes a panic when
older init builds die.

It seems impossible to make Xen use __USER_CS reliably without
taking a performance hit on every system call, so this fixes the
tests instead with a new paravirt op.  It's a little ugly because
ptrace.h can't include paravirt.h.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

14 Jul, 2011 1 commit

KVM guest: Add a pv_ops stub for steal time · 3c404b57

Authored by Glauber Costa on Jul 11, 2011

This patch adds a function pointer in one of the many paravirt_ops
structs, to allow guests to register a steal time function. Besides
a steal time function, we also declare two jump_labels. They will be
used to allow the steal time code to be easily bypassed when not
in use.
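
A hedged sketch of the shape this describes: an ops structure holds a function pointer with a native default, a guest installs its own reader, and a gate lets the accounting path skip the hook when nothing is registered. All names are hypothetical. A plain integer flag stands in for the jump labels, and the guest's return value is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ops table with a single hook; names are illustrative. */
struct demo_time_ops {
    uint64_t (*steal_clock)(int cpu);
};

/* Native default: bare metal never has time stolen by a hypervisor. */
static uint64_t demo_native_steal_clock(int cpu)
{
    (void)cpu;
    return 0;
}

static struct demo_time_ops demo_ops = {
    .steal_clock = demo_native_steal_clock,
};

/* Plain flag standing in for a jump label that gates the accounting path. */
static int demo_steal_enabled;

/* A guest calls this once to install its own steal-time reader. */
static void demo_register_steal_clock(uint64_t (*fn)(int cpu))
{
    assert(fn != 0);
    demo_ops.steal_clock = fn;
    demo_steal_enabled = 1;
}

/* Accounting path: skipped entirely unless a guest registered a hook. */
static uint64_t demo_account_steal(int cpu)
{
    if (!demo_steal_enabled)
        return 0;
    return demo_ops.steal_clock(cpu);
}

/* Example guest implementation returning a fixed, made-up value. */
static uint64_t demo_guest_steal_clock(int cpu)
{
    return 1000u * (uint64_t)(cpu + 1);
}
```
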
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

14 Jan, 2011 1 commit

thp: add pmd paravirt ops · 331127f7

Authored by Andrea Arcangeli on Jan 13, 2011

Paravirt ops pmd_update/pmd_update_defer/pmd_set_at.  Not all might be
necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs
pmd_update_defer), but this is to keep full symmetry with pte paravirt
ops, which looks cleaner and simpler from a common code POV.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24 Aug, 2010 1 commit

x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI · b0f4c062

Authored by Alok Kataria on Aug 23, 2010

VMI was the only user of the alloc_pmd_clone hook. Given that VMI
is now removed, we can also remove this hook.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

28 Feb, 2010 1 commit

x86, paravirt: Remove kmap_atomic_pte paravirt op. · dad52fc0

Authored by Ian Campbell on Feb 26, 2010

Now that both Xen and VMI disable allocations of PTE pages from high
memory this paravirt op serves no further purpose.

This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping
highpte pages".
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-3-git-send-email-ian.campbell@citrix.com>
Acked-by: Alok Kataria <akataria@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

16 Sep, 2009 1 commit

x86: Move get/set_wallclock to x86_platform_ops · 7bd867df

Authored by Feng Tang on Sep 10, 2009

get/set_wallclock() already have a set of platform dependent
implementations (default, EFI, paravirt). MRST will add another
variant.

Moving them to platform ops simplifies the existing code and minimizes
the effort to integrate new variants.
Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

01 Sep, 2009 2 commits

x86, msr: Rewrite AMD rd/wrmsr variants · 177fed1e

Authored by Borislav Petkov on Aug 31, 2009

Switch them to native_{rd,wr}msr_safe_regs and remove
pv_cpu_ops.read_msr_amd.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86, msr: Add rd/wrmsr interfaces with preset registers · 132ec92f

Authored by Borislav Petkov on Aug 31, 2009

native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow
presetting of a subset of eight x86 GPRs before executing the rd/wrmsr
instructions. This is needed at least on AMD K8 for accessing an erratum
workaround MSR.

Originally based on an idea by H. Peter Anvin.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

31 Aug, 2009 7 commits

x86: Move tsc_calibration to x86_init_ops · 2d826404

Authored by Thomas Gleixner on Aug 20, 2009

TSC calibration is modified by the vmware hypervisor and paravirt by
separate means. Moorestown wants to add its own calibration routine as
well. So make calibrate_tsc a proper x86_init_ops function and
override it by paravirt or by the early setup of the vmware
hypervisor.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: Add timer_init to x86_init_ops · 845b3944

Authored by Thomas Gleixner on Aug 19, 2009

The timer init code is convoluted with several quirks and the paravirt
timer chooser. Figuring out which code path is actually taken is not
for the faint hearted.

Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and
replace the paravirt time chooser and the remaining x86 quirk with a
simple x86_init_ops function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: Move percpu clockevents setup to x86_init_ops · 736decac

Authored by Thomas Gleixner on Aug 19, 2009

paravirt overrides the setup of the default apic timers as per cpu
timers. Moorestown needs to override that as well.

Move it to x86_init_ops setup and create a separate x86_cpuinit struct
which holds the function for the secondary, possibly hotpluggable, CPUs.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: Move paravirt pagetable_setup to x86_init_ops · 030cb6c0

Authored by Thomas Gleixner on Aug 20, 2009

Replace more paravirt hackery by proper x86_init_ops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: Move paravirt banner printout to x86_init_ops · 6f30c1ac

Authored by Thomas Gleixner on Aug 20, 2009

Replace another obscure paravirt magic and move it to
x86_init_ops. Such a hook is also useful for embedded and special
hardware.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: Replace ARCH_SETUP by a proper x86_init_ops · 42bbdb43

Authored by Thomas Gleixner on Aug 20, 2009

ARCH_SETUP is a horrible leftover from the old arch/i386 mach support
code. It still has a lonely user in xen. Move it to x86_init_ops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: Move irq_init to x86_init_ops · 66bcaf0b

Authored by Thomas Gleixner on Aug 20, 2009

irq_init is overridden by x86_quirks and by paravirts. Unify the whole
mess and make it an unconditional x86_init_ops function which defaults
to the standard function and can be overridden by the early platform
code.
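
The pattern shared by this series of x86_init_ops commits can be sketched as follows, under the assumption of a much-reduced table with a single hook. The names and the return values (which only tag which implementation ran) are hypothetical.

```c
#include <assert.h>

/* Hypothetical init-ops table; the real structure has more hooks. */
struct demo_init_ops {
    int (*irq_init)(void);
};

/* Standard implementation used when no platform overrides the hook. */
static int demo_default_irq_init(void)
{
    return 0;
}

/* Platform-specific variant; the return value only tags which one ran. */
static int demo_platform_irq_init(void)
{
    return 42;
}

/* The hook is always populated, so callers never test for NULL. */
static struct demo_init_ops demo_init = {
    .irq_init = demo_default_irq_init,
};

/* Early platform code replaces the default before init runs. */
static void demo_early_platform_setup(void)
{
    demo_init.irq_init = demo_platform_irq_init;
}

/* Generic init path: one unconditional call, no #ifdefs or quirk checks. */
static int demo_init_irqs(void)
{
    assert(demo_init.irq_init != 0);
    return demo_init.irq_init();
}
```

Because the default is always installed, the generic path stays the same whether or not a platform overrides it.
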
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

27 Aug, 2009 1 commit

x86: Move memory_setup to x86_init_ops · 6b18ae3e

Authored by Thomas Gleixner on Aug 20, 2009

memory_setup is overridden by x86_quirks and by paravirts with weak
functions and quirks. Unify the whole mess and make it an
unconditional x86_init_ops function which defaults to the standard
function and can be overridden by the early platform code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

16 May, 2009 1 commit

x86: Fix performance regression caused by paravirt_ops on native kernels · b4ecc126

Authored by Jeremy Fitzhardinge on May 13, 2009

Xiaohui Xin and some other folks at Intel have been looking into what's
behind the performance hit of paravirt_ops when running native.

It appears that the hit is entirely due to the paravirtualized
spinlocks introduced by:

 | commit 8efcbab6
 | Date:   Mon Jul 7 12:07:51 2008 -0700
 |
 |     paravirt: introduce a "lock-byte" spinlock implementation

The extra call/return in the spinlock path is somehow
causing an increase in the cycles/instruction of somewhere around 2-7%
(seems to vary quite a lot from test to test).  The working theory is
that the CPU's pipeline is getting upset about the
call->call->locked-op->return->return, and seems to be failing to
speculate (though I haven't seen anything definitive about the precise
reasons).  This doesn't entirely make sense, because the performance
hit is also visible on unlock and other operations which don't involve
locked instructions.  But spinlock operations clearly swamp all the
other pvops operations, even though I can't imagine that they're
nearly as common (there's only a .05% increase in instructions
executed).

If I disable just the pv-spinlock calls, my tests show that pvops is
identical to non-pvops performance on native (my measurements show that
it is actually about .1% faster, but Xiaohui shows a .05% slowdown).

Summary of results, averaging 10 runs of the "mmperf" test, using a
no-pvops build as baseline:

                  nopv       Pv-nospin   Pv-spin
CPU cycles        100.00%    99.89%      102.18%
instructions      100.00%    100.10%     100.15%
CPI               100.00%    99.79%      102.03%
cache ref         100.00%    100.84%     100.28%
cache miss        100.00%    90.47%      88.56%
cache miss rate   100.00%    89.72%      88.31%
branches          100.00%    99.93%      100.04%
branch miss       100.00%    103.66%     107.72%
branch miss rt    100.00%    103.73%     107.67%
wallclock         100.00%    99.90%      102.20%

The clear effect here is that the 2% increase in CPI is
directly reflected in the final wallclock time.
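
As a consistency check on the table, the CPI row is the ratio of the cycle and instruction rows, since CPI is cycles divided by instructions. Using only the relative figures given above:

```latex
\[
\mathrm{CPI}_{\text{Pv-spin}} = \frac{102.18\%}{100.15\%} \approx 102.03\%,
\qquad
\mathrm{CPI}_{\text{Pv-nospin}} = \frac{99.89\%}{100.10\%} \approx 99.79\%
\]
```

Both results match the CPI row. The instruction count barely moves in either configuration, so the Pv-spin slowdown is almost entirely a per-instruction cost.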

(The other interesting effect is that the more ops are
out of line calls via pvops, the lower the cache access
and miss rates.  Not too surprising, but it suggests that
the non-pvops kernel is over-inlined.  On the flipside,
the branch misses go up correspondingly...)

So, what's the fix?

Paravirt patching turns all the pvops calls into direct calls, so
_spin_lock etc do end up having direct calls.  For example, the compiler
generated code for paravirtualized _spin_lock is:

<_spin_lock+0>:		mov    %gs:0xb4c8,%rax
<_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
<_spin_lock+15>:	callq  *0xffffffff805a5b30
<_spin_lock+22>:	retq

The indirect call will get patched to:
<_spin_lock+0>:		mov    %gs:0xb4c8,%rax
<_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
<_spin_lock+15>:	callq <__ticket_spin_lock>
<_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
<_spin_lock+22>:	retq

One possibility is to inline _spin_lock, etc, when building an
optimised kernel (ie, when there's no spinlock/preempt
instrumentation/debugging enabled).  That will remove the outer
call/return pair, returning the instruction stream to a single
call/return, which will presumably execute the same as the non-pvops
case.  The downsides are: 1) it will replicate the
preempt_disable/enable code at each lock/unlock callsite; this code is
fairly small, but not nothing; and 2) the spinlock definitions are
already a very heavily tangled mass of #ifdefs and other preprocessor
magic, and making any changes will be non-trivial.

The other obvious answer is to disable pv-spinlocks.  Making them a
separate config option is fairly easy, and it would be trivial to
enable them only when Xen is enabled (as the only non-default user).
But it doesn't really address the common case of a distro build which
is going to have Xen support enabled, and leaves the open question of
whether the native performance cost of pv-spinlocks is worth the
performance improvement on a loaded Xen system (10% saving of overall
system CPU when guests block rather than spin).  Still it is a
reasonable short-term workaround.

[ Impact: fix pvops performance regression when running native ]
Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

30 Mar, 2009 6 commits

x86/paravirt: use percpu_ rather than __get_cpu_var · ab2f75f0

Authored by Jeremy Fitzhardinge on Feb 18, 2009

Impact: minor optimisation

percpu_read/write is a slightly more direct way of getting
to percpu data.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

x86/paravirt: allow preemption with lazy mmu mode · 2829b449

Authored by Jeremy Fitzhardinge on Feb 17, 2009

Impact: remove obsolete checks, simplification

Lift restrictions on preemption with lazy mmu mode, as it is now allowed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

x86/paravirt: finish change from lazy cpu to context switch start/end · 224101ed

Authored by Jeremy Fitzhardinge on Feb 18, 2009

Impact: fix lazy context switch API

Pass the previous and next tasks into the context switch start/end
calls, so that the called functions can properly access the
task state (esp in end_context_switch, in which the next task
is not yet completely current).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

x86/paravirt: flush pending mmu updates on context switch · b407fc57

Authored by Jeremy Fitzhardinge on Feb 17, 2009

Impact: allow preemption during lazy mmu updates

If we're in lazy mmu mode when context switching, leave
lazy mmu mode, but remember the task's state in
TIF_LAZY_MMU_UPDATES.  When we resume the task, check this
flag and re-enter lazy mmu mode if it's set.

This sets things up for allowing lazy mmu mode while preemptible,
though that won't actually be active until the next change.
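
The save-and-restore logic above can be modelled with a small state machine. This is a sketch with hypothetical names: a single global models one CPU's lazy state, and an integer field stands in for the TIF_LAZY_MMU_UPDATES flag.

```c
#include <assert.h>

enum demo_lazy_mode { DEMO_LAZY_NONE, DEMO_LAZY_MMU };

/* Hypothetical task with one flag bit, standing in for TIF_LAZY_MMU_UPDATES. */
struct demo_task {
    int lazy_mmu_updates;
};

/* Per-CPU lazy state; a single CPU is modelled here. */
static enum demo_lazy_mode demo_cpu_mode = DEMO_LAZY_NONE;

static void demo_enter_lazy_mmu(void) { demo_cpu_mode = DEMO_LAZY_MMU; }

/* Leaving lazy mode is where batched updates would be flushed. */
static void demo_leave_lazy_mmu(void) { demo_cpu_mode = DEMO_LAZY_NONE; }

/* Switching away: leave lazy mode but remember it on the outgoing task. */
static void demo_start_context_switch(struct demo_task *prev)
{
    assert(prev != 0);
    if (demo_cpu_mode == DEMO_LAZY_MMU) {
        demo_leave_lazy_mmu();
        prev->lazy_mmu_updates = 1;
    }
}

/* Switching in: re-enter lazy mode if the incoming task was in it. */
static void demo_end_context_switch(struct demo_task *next)
{
    assert(next != 0);
    if (next->lazy_mmu_updates) {
        next->lazy_mmu_updates = 0;
        demo_enter_lazy_mmu();
    }
}
```

A task preempted in lazy mode therefore resumes in lazy mode, while whatever runs in between starts from the non-lazy state.
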
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switch · 7fd7d83d

Authored by Jeremy Fitzhardinge on Feb 17, 2009

Impact: simplification, prepare for later changes

Make lazy cpu mode more specific to context switching, so that
it makes sense to do more context-switch specific things in
the callbacks.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

x86/paravirt: remove lazy mode in interrupts · b8bcfe99

Authored by Jeremy Fitzhardinge on Feb 17, 2009

Impact: simplification, robustness

Make paravirt_lazy_mode() always return PARAVIRT_LAZY_NONE
when in an interrupt.  This prevents interrupt code from
accidentally inheriting an outer lazy state, and instead
does everything synchronously.  Outer batched operations
are left deferred.
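
A minimal sketch of that rule, with hypothetical names: a depth counter stands in for the kernel's in-interrupt test, and a single variable models the lazy state set by the interrupted context.

```c
#include <assert.h>

enum demo_mode { DEMO_MODE_NONE, DEMO_MODE_MMU, DEMO_MODE_CPU };

/* Lazy state set by the interrupted (outer) context; one CPU is modelled. */
static enum demo_mode demo_outer_mode = DEMO_MODE_NONE;

/* Nonzero while running an interrupt handler; stands in for in_interrupt(). */
static int demo_irq_depth;

/* Interrupt code always sees "no lazy mode", whatever the outer state is. */
static enum demo_mode demo_current_lazy_mode(void)
{
    assert(demo_irq_depth >= 0);
    if (demo_irq_depth > 0)
        return DEMO_MODE_NONE;
    return demo_outer_mode;
}
```

The outer state is left untouched, so batched operations from the interrupted context remain deferred and become visible again once the handler returns.
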
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>

19 Mar, 2009 1 commit

x86: with the last user gone, remove set_pte_present · 71ff49d7

Authored by Jeremy Fitzhardinge on Mar 18, 2009

Impact: cleanup

set_pte_present() is no longer used, directly or indirectly,
so remove it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
LKML-Reference: <1237406613-2929-2-git-send-email-jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

23 Feb, 2009 1 commit

x86: refactor x86_quirks support · 8e6dafd6

Authored by Ingo Molnar on Feb 23, 2009

Impact: cleanup

Make x86_quirks support more transparent. The high-level
methods are now named:

  extern void x86_quirk_pre_intr_init(void);
  extern void x86_quirk_intr_init(void);

  extern void x86_quirk_trap_init(void);

  extern void x86_quirk_pre_time_init(void);
  extern void x86_quirk_time_init(void);

This makes it clear that if some platform extension has to
do something here that it is considered ... weird, and is
discouraged.

Also remove arch_hooks.h and move it into setup.h (and other
header files where appropriate).
Signed-off-by: Ingo Molnar <mingo@elte.hu>

OpenHarmony / kernel_linux, last synced about 4 years ago
