提交 · 816afe4ff98ee10b1d30fd66361be132a0a5cee6 · openanolis / cloud-kernel

23 8月, 2012 1 次提交

x86/smp: Don't ever patch back to UP if we unplug cpus · 816afe4f

由 Rusty Russell 提交于 8月 06, 2012

We still patch SMP instructions to UP variants if we boot with a
single CPU, but not at any other time.  In particular, not if we
unplug CPUs to return to a single cpu.

Paul McKenney points out:

 mean offline overhead is 6251/48=130.2 milliseconds.

 If I remove the alternatives_smp_switch() from the offline
 path [...] the mean offline overhead is 550/42=13.1 milliseconds

Basically, we're never going to get those 120ms back, and the
code is pretty messy.

We get rid of:

 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
    documentation is wrong. It's now the default.

 2) The skip_smp_alternatives flag used by suspend.

 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
    which were only used to set this one flag.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paul.mckenney@us.ibm.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.auSigned-off-by: NIngo Molnar <mingo@kernel.org>

816afe4f

01 8月, 2012 3 次提交

x86: OLPC: switch over to using new EC driver on x86 · 85f90cf6

由 Andres Salomon 提交于 7月 12, 2012

This uses the new EC driver framework in drivers/platform/olpc.  The
XO-1 and XO-1.5-specific code is still in arch/x86, but the generic stuff
(including a new workqueue; no more running EC commands with IRQs disabled!)
can be shared with other architectures.
Signed-off-by: NAndres Salomon <dilinger@queued.net>
Acked-by: NPaul Fox <pgf@laptop.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

85f90cf6

drivers: OLPC: update various drivers to include olpc-ec.h · 3bf9428f

由 Andres Salomon 提交于 7月 11, 2012

Switch over to using olpc-ec.h in multiple steps, so as not to break builds.
This covers every driver that calls olpc_ec_cmd().
Signed-off-by: NAndres Salomon <dilinger@queued.net>
Acked-by: NPaul Fox <pgf@laptop.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

3bf9428f

Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver · 392a325c

由 Andres Salomon 提交于 7月 10, 2012

The OLPC EC driver has outgrown arch/x86/platform/.  It's time to both
share common code amongst different architectures, as well as move it out
of arch/x86/.  The XO-1.75 is ARM-based, and the EC driver shares a lot of
code with the x86 code.
Signed-off-by: NAndres Salomon <dilinger@queued.net>
Acked-by: NPaul Fox <pgf@laptop.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

392a325c

31 7月, 2012 2 次提交

perf/x86: Fix USER/KERNEL tagging of samples properly · d07bdfd3

由 Peter Zijlstra 提交于 7月 10, 2012

Some PMUs don't provide a full register set for their sample,
specifically 'advanced' PMUs like AMD IBS and Intel PEBS which provide
'better' than regular interrupt accuracy.

In this case we use the interrupt regs as basis and over-write some
fields (typically IP) with different information.

The perf core however uses user_mode() to distinguish user/kernel
samples, user_mode() relies on regs->cs. If the interrupt skid pushed
us over a boundary the new IP might not be in the same domain as the
interrupt.

Commit ce5c1fe9 ("perf/x86: Fix USER/KERNEL tagging of samples")
tried to fix this by making the perf core use kernel_ip(). This
however is wrong (TM), as pointed out by Linus, since it doesn't allow
for VM86 and non-zero based segments in IA32 mode.

Therefore, provide a new helper to set the regs->ip field,
set_linear_ip(), which massages the regs into a suitable state
assuming the provided IP is in fact a linear address.

Also modify perf_instruction_pointer() and perf_callchain_user() to
deal with segments base offsets.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341910954.3462.102.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>

d07bdfd3

ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION · c1d7e01d

由 Will Deacon 提交于 7月 30, 2012

Rather than #define the options manually in the architecture code, add
Kconfig options for them and select them there instead.  This also allows
us to select the compat IPC version parsing automatically for platforms
using the old compat IPC interface.
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c1d7e01d

26 7月, 2012 2 次提交

x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h> · 736edce5

由 Tony Luck 提交于 7月 19, 2012

We will need some of these values in mce.c. Move them to the
appropriate header file so they are available.
Acked-by: NBorislav Petkov <bp@amd64.org>
Signed-off-by: NTony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/0ccfb1af5fe35e537b7cd8e4d448bf7d851dbfb9.1343078495.git.tony.luck@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

736edce5

perf/x86: Fix missing struct before structure name · 35d56ca9

由 Jovi Zhang 提交于 7月 17, 2012

When CONFIG_PERF_EVENTS disabled, there will have a compiliation
error, because missing struct before structure name.
Signed-off-by: NJovi Zhang <bookjovi@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/CACV3sbKF%3DCX%2B2jWEWesfCA6rBoQ3wDM4-5ac9MuBtVbCtMRHdQ@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

35d56ca9

21 7月, 2012 4 次提交

x86, efi: Handover Protocol · 9ca8f72a

由 Matt Fleming 提交于 7月 19, 2012

As things currently stand, traditional EFI boot loaders and the EFI
boot stub are carrying essentially the same initialisation code
required to setup an EFI machine for booting a kernel. There's really
no need to have this code in two places and the hope is that, with
this new protocol, initialisation and booting of the kernel can be
left solely to the kernel's EFI boot stub. The responsibilities of the
boot loader then become,

   o Loading the kernel image from boot media

File system code still needs to be carried by boot loaders for the
scenario where the kernel and initrd files reside on a file system
that the EFI firmware doesn't natively understand, such as ext4, etc.

   o Providing a user interface

Boot loaders still need to display any menus/interfaces, for example
to allow the user to select from a list of kernels.

Bump the boot protocol number because we added the 'handover_offset'
field to indicate the location of the handover protocol entry point.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
Acked-and-Tested-by: NMatthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1342689828-16815-1-git-send-email-matt@console-pimps.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

9ca8f72a

x86/tlb: Fix build warning and crash when building for !SMP · 7efa1c87

由 Alex Shi 提交于 7月 20, 2012

The incompatible parameter of flush_tlb_mm_range cause build warning.
Fix it by correct parameter.

Ingo Molnar found that this could also cause a user space crash.
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1342747103-19765-1-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

7efa1c87

x86, cpufeature: Add the RDSEED and ADX features · 30d5c454

由 H. Peter Anvin 提交于 7月 20, 2012

Add the RDSEED and ADX features documented in section 9.1 of the Intel
Architecture Instruction Set Extensions Programming Reference,
document 319433, version 013b, available from
http://software.intel.com/en-us/avx/

The PREFETCHW bit is already supported in Linux under the name
3DNOWPREFETCH.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-lgr6482ufk1bvxzvc2hr8qbp@git.kernel.org

30d5c454

KVM: fix race with level interrupts · 1a577b72

由 Michael S. Tsirkin 提交于 7月 19, 2012

When more than 1 source id is in use for the same GSI, we have the
following race related to handling irq_states race:

CPU 0 clears bit 0. CPU 0 read irq_state as 0. CPU 1 sets level to 1.
CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0).
Now ioapic thinks the level is 0 but irq_state is not 0.

Fix by performing all irq_states bitmap handling under pic/ioapic lock.
This also removes the need for atomics with irq_states handling.
Reported-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

1a577b72

20 7月, 2012 1 次提交

xen/mce: Add mcelog support for Xen platform · cef12ee5

由 Liu, Jinsong 提交于 6月 07, 2012

When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.

This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.

By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.

To test follow directions outlined in Documentation/acpi/apei/einj.txt
Acked-and-tested-by: NBorislav Petkov <borislav.petkov@amd.com>
Signed-off-by: NKe, Liping <liping.ke@intel.com>
Signed-off-by: NJiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

cef12ee5

16 7月, 2012 2 次提交

Revert "apic: fix kvm build on UP without IOAPIC" · ebf7d2e9

由 Michael S. Tsirkin 提交于 7月 15, 2012

This reverts commit f9808b7f.
After commit 'kvm: switch to apic_set_eoi_write, apic_write'
the stubs are no longer needed as kvm does not look at apicdrivers anymore.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

ebf7d2e9

apic: add apic_set_eoi_write for PV use · 1551df64

由 Michael S. Tsirkin 提交于 7月 15, 2012

KVM PV EOI optimization overrides eoi_write apic op with its own
version. Add an API for this to avoid meddling with core x86 apic driver
data structures directly.

For KVM use, we don't need any guarantees about when the switch to the
new op will take place, so it could in theory use this API after SMP init,
but it currently doesn't, and restricting callers to early init makes it
clear that it's safe as it won't race with actual APIC driver use.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NAvi Kivity <avi@redhat.com>

1551df64

12 7月, 2012 2 次提交

KVM: VMX: Implement PCID/INVPCID for guests with EPT · ad756a16

由 Mao, Junjie 提交于 7月 02, 2012

This patch handles PCID/INVPCID for guests.

Process-context identifiers (PCIDs) are a facility by which a logical processor
may cache information for multiple linear-address spaces so that the processor
may retain cached information when software switches to a different linear
address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
Volume 3A for details.

For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
natively.
For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
Signed-off-by: NJunjie Mao <junjie.mao@intel.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

ad756a16

KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check · fc73373b

由 Prarit Bhargava 提交于 7月 06, 2012

While debugging I noticed that unlike all the other hypervisor code in the
kernel, kvm does not have an entry for x86_hyper which is used in
detect_hypervisor_platform() which results in a nice printk in the
syslog.  This is only really a stub function but it
does make kvm more consistent with the other hypervisors.
Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marcelo Tostatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: NAvi Kivity <avi@redhat.com>

fc73373b

09 7月, 2012 2 次提交

KVM: x86 emulator: initialize memop · cbd27ee7

由 Avi Kivity 提交于 6月 10, 2012

memop is not initialized; this can lead to a two-byte operation
following a 4-byte operation to see garbage values.  Usually
truncation fixes things fot us later on, but at least in one case
(call abs) it doesn't.

Fix by moving memop to the auto-initialized field area.
Signed-off-by: NAvi Kivity <avi@redhat.com>

cbd27ee7

KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semantics · 0017f93a

由 Avi Kivity 提交于 6月 07, 2012

Instead of getting an exact leaf, follow the spec and fall back to the last
main leaf instead.  This lets us easily emulate the cpuid instruction in the
emulator.
Signed-off-by: NAvi Kivity <avi@redhat.com>

0017f93a

06 7月, 2012 5 次提交

x86/apic/x2apic: Limit the vector reservation to the user specified mask · 1ac322d0

由 Suresh Siddha 提交于 6月 25, 2012

For the x2apic cluster mode, vector for an interrupt is
currently reserved on all the cpu's that are part of the x2apic
cluster. But the interrupts will be routed only to the cluster
(derived from the first cpu in the mask) members specified in
the mask. So there is no need to reserve the vector in the
unused cluster members.

Modify __assign_irq_vector() to reserve the vectors based on the
user specified irq destination mask. If the new mask is a proper
subset of the currently used mask, cleanup the vector allocation
on the unused cpu members.

Also, allow the apic driver to tune the vector domain based on
the affinity mask (which in most cases is the user-specified
mask).
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Acked-by: NAlexander Gordeev <agordeev@redhat.com>
Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/1340656709-11423-3-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

1ac322d0

x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership · b39f25a8

由 Suresh Siddha 提交于 6月 25, 2012

Currently __assign_irq_vector() goes through each cpu in the
specified mask until it finds a free vector in all the cpu's
that are part of the same interrupt domain. We visit all the
interrupt domain sibling cpus to reserve the free vector. So,
when we fail to find a free vector in an interrupt domain, it is
safe to continue our search with a cpu belonging to a new
interrupt domain. No need to go through each cpu, if the domain
containing that cpu is already visited.

Use the irq_cfg's old_domain to track the visited domains and
optimize the cpu traversal while finding a free vector in the
given cpumask.

NOTE: We can also optimize the search by using for_each_cpu() and
skip the current cpu, if it is not the first cpu in the mask
returned by the vector_allocation_domain(). But re-using the
cfg->old_domain to track the visited domains will be slightly
faster.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Acked-by: NAlexander Gordeev <agordeev@redhat.com>
Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/1340656709-11423-2-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

b39f25a8

perf/x86: Add a microcode revision check for SNB-PEBS · c93dc84c

由 Peter Zijlstra 提交于 6月 08, 2012

Recent Intel microcode resolved the SNB-PEBS issues, so conditionally
enable PEBS on SNB hardware depending on the microcode revision.

Thanks to Stephane for figuring out the various microcode revisions.
Suggested-by: NStephane Eranian <eranian@google.com>
Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-v3672ziwh9damwqwh1uz3krm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c93dc84c

perf/x86/amd: Unify AMD's generic and family 15h pmus · b1dc3c48

由 Robert Richter 提交于 6月 20, 2012

There is no need for keeping separate pmu structs. We can enable
amd_{get,put}_event_constraints() functions also for family 15h event.

The advantage is that there is only a single pmu struct for all AMD
cpus. This patch introduces functions to setup the pmu to enabe core
performance counters or counter constraints.

Also, cpuid checks are used instead of family checks where
possible. Thus, it enables the code independently of cpu families if
the feature flag is set.
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

b1dc3c48

perf/x86: Rename Intel specific macros · 15c7ad51

由 Robert Richter 提交于 6月 20, 2012

There are macros that are Intel specific and not x86 generic. Rename
them into INTEL_*.

This patch removes X86_PMC_IDX_GENERIC and does:

 $ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g'           \
         arch/x86/include/asm/kvm_host.h                 \
         arch/x86/include/asm/perf_event.h               \
         arch/x86/kernel/cpu/perf_event.c                \
         arch/x86/kernel/cpu/perf_event_p4.c             \
         arch/x86/kvm/pmu.c
 $ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \
         arch/x86/include/asm/perf_event.h               \
         arch/x86/kernel/cpu/perf_event.c                \
         arch/x86/kernel/cpu/perf_event_intel.c          \
         arch/x86/kernel/cpu/perf_event_intel_ds.c       \
         arch/x86/kvm/pmu.c
 $ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g'           \
         arch/x86/include/asm/perf_event.h               \
         arch/x86/kernel/cpu/perf_event.c
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

15c7ad51

04 7月, 2012 1 次提交

apic: fix kvm build on UP without IOAPIC · f9808b7f

由 Michael S. Tsirkin 提交于 7月 01, 2012

On UP i386, when APIC is disabled
# CONFIG_X86_UP_APIC is not set
# CONFIG_PCI_IOAPIC is not set

code looking at apicdrivers never has any effect but it
still gets compiled in. In particular, this causes
build failures with kvm, but it generally bloats the kernel
unnecessarily.

Fix by defining both __apicdrivers and __apicdrivers_end
to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified
that as the result any loop scanning __apicdrivers gets optimized out by
the compiler.

Warning: a .config with apic disabled doesn't seem to boot
for me (even without this patch). Still verifying why,
meanwhile this patch is compile-tested only.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

f9808b7f

30 6月, 2012 1 次提交

x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature · 954e482b

由 Fenghua Yu 提交于 5月 24, 2012

According to Intel 64 and IA-32 SDM and Optimization Reference Manual, beginning
with Ivybridge, REG string operation using MOVSB and STOSB can provide both
flexible and high-performance REG string operations in cases like memory copy.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

If CPU erms feature is detected, patch copy_user_generic with enhanced fast
string version of copy_user_generic.

A few new macros are defined to reduce duplicate code in ALTERNATIVE and
ALTERNATIVE_2.
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1337908785-14015-1-git-send-email-fenghua.yu@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

954e482b

28 6月, 2012 6 次提交

x86/tlb: do flush_tlb_kernel_range by 'invlpg' · effee4b9

由 Alex Shi 提交于 6月 28, 2012

This patch do flush_tlb_kernel_range by 'invlpg'. The performance pay
and gain was analyzed in previous patch
(x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range).

In the testing: http://lkml.org/lkml/2012/6/21/10

The pay is mostly covered by long kernel path, but the gain is still
quite clear, memory access in user APP can increase 30+% when kernel
execute this funtion.
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-10-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

effee4b9

x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR · 52aec330

由 Alex Shi 提交于 6月 28, 2012

There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big
amount of vector in IDT. But it is still not enough, since modern x86
sever has more cpu number. That still causes heavy lock contention
in TLB flushing.

The patch using generic smp call function to replace it. That saved 32
vector number in IDT, and resolved the lock contention in TLB
flushing on large system.

In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread
has 3% performance increase.
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

52aec330

x86/tlb: enable tlb flush range support for x86 · 611ae8e3

由 Alex Shi 提交于 6月 28, 2012

Not every tlb_flush execution moment is really need to evacuate all
TLB entries, like in munmap, just few 'invlpg' is better for whole
process performance, since it leaves most of TLB entries for later
accessing.

This patch also rewrite flush_tlb_range for 2 purposes:
1, split it out to get flush_blt_mm_range function.
2, clean up to reduce line breaking, thanks for Borislav's input.

My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59
show that the random memory access on other CPU has 0~50% speed up
on a 2P * 4cores * HT NHM EP while do 'munmap'.

Thanks Yongjie's testing on this patch:
-------------
I used Linux 3.4-RC6 w/ and w/o his patches as Xen dom0 and guest
kernel.
After running two benchmarks in Xen HVM guest, I found his patches
brought about 1%~3% performance gain in 'kernel build' and 'netperf'
testing, though the performance gain was not very stable in 'kernel
build' testing.

Some detailed testing results are below.

Testing Environment:
	Hardware: Romley-EP platform
	Xen version: latest upstream
	Linux kernel: 3.4-RC6
	Guest vCPU number: 8
	NIC: Intel 82599 (10GB bandwidth)

In 'kernel build' testing in guest:
	Command line  |  performance gain
    make -j 4      |    3.81%
    make -j 8      |    0.37%
    make -j 16     |    -0.52%

In 'netperf' testing, we tested TCP_STREAM with default socket size
16384 byte as large packet and 64 byte as small packet.
I used several clients to add networking pressure, then 'netperf' server
automatically generated several threads to response them.
I also used large-size packet and small-size packet in the testing.
	Packet size  |  Thread number | performance gain
	16384 bytes  |      4       |   0.02%
	16384 bytes  |      8       |   2.21%
	16384 bytes  |      16      |   2.04%
	64 bytes     |      4       |   1.07%
	64 bytes     |      8       |   3.31%
	64 bytes     |      16      |   0.71%
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-8-git-send-email-alex.shi@intel.comTested-by: NRen, Yongjie <yongjie.ren@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

611ae8e3

x86/tlb: add tlb_flushall_shift for specific CPU · c4211f42

由 Alex Shi 提交于 6月 28, 2012

Testing show different CPU type(micro architectures and NUMA mode) has
different balance points between the TLB flush all and multiple invlpg.
And there also has cases the tlb flush change has no any help.

This patch give a interface to let x86 vendor developers have a chance
to set different shift for different CPU type.

like some machine in my hands, balance points is 16 entries on
Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on
IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing
help.

For untested machine, do a conservative optimization, same as NHM CPU.
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

c4211f42

x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range · e7b52ffd

由 Alex Shi 提交于 6月 28, 2012

x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.

But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.

So, on a 512 4KB TLB entries CPU, the balance points is at:
	(512 - X) * 100ns(assumed TLB refill cost) =
		X(TLB flush entries) * 100ns(assumed invlpg cost)

Here, X is 256, that is 1/2 of 512 entries.

But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.

As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.

My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.

Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':

multi thread testing, '-t' paramter is thread number:
	       	        with patch   unpatched 3.4-rc4
./mprotect -t 1           14ns		24ns
./mprotect -t 2           13ns		22ns
./mprotect -t 4           12ns		19ns
./mprotect -t 8           14ns		16ns
./mprotect -t 16          28ns		26ns
./mprotect -t 32          54ns		51ns
./mprotect -t 128         200ns		199ns

Single process with sequencial flushing and memory accessing:

		       	with patch   unpatched 3.4-rc4
./mprotect		    7ns			11ns
./mprotect -p 4096  -l 8 -n 10240
			    21ns		21ns

[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
  has additional performance numbers. ]
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

e7b52ffd

x86/tlb_info: get last level TLB entry number of CPU · e0ba94f1

由 Alex Shi 提交于 6月 28, 2012

For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and
instruction TLB, second level is shared TLB for both data and instructions.

For hupe page TLB, usually there is just one level and seperated by 2MB/4MB
and 1GB.

Although each levels TLB size is important for performance tuning, but for
genernal and rude optimizing, last level TLB entry number is suitable. And
in fact, last level TLB always has the biggest entry number.

This patch will get the biggest TLB entry number and use it in furture TLB
optimizing.

Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other
function and data will be released after system boot up.

For all kinds of x86 vendor friendly, vendor specific code was moved to its
specific files.
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

e0ba94f1

27 6月, 2012 6 次提交

crypto: move arch/x86/include/asm/aes.h to arch/x86/include/asm/crypto/ · 70ef2601

由 Jussi Kivilinna 提交于 6月 18, 2012

Move AES header to the new asm/crypto directory.
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

70ef2601

crypto: move arch/x86/include/asm/serpent-{sse2|avx}.h to arch/x86/include/asm/crypto/ · d4af0e9d

由 Jussi Kivilinna 提交于 6月 18, 2012

Move serpent crypto headers to the new asm/crypto/ directory.
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

d4af0e9d

crypto: twofish-avx - remove duplicated glue code and use shared glue code from glue_helper · a7378d4e

由 Jussi Kivilinna 提交于 6月 18, 2012

Now that shared glue code is available, convert twofish-avx to use it.

Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

a7378d4e

crypto: serpent-sse2 - split generic glue code to new helper module · 596d8750

由 Jussi Kivilinna 提交于 6月 18, 2012

Now that serpent-sse2 glue code has been made generic, it can be split to
separate module.
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

596d8750

crypto: aes_ni - change to use shared ablk_* functions · a9629d71

由 Jussi Kivilinna 提交于 6月 18, 2012

Remove duplicate ablk_* functions and make use of ablk_helper module instead.
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

a9629d71

crypto: ablk_helper - move ablk_* functions from serpent-sse2/avx glue code to shared module · ffaf9156

由 Jussi Kivilinna 提交于 6月 18, 2012

Move ablk-* functions to separate module to share common code between cipher
implementations.
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ffaf9156

26 6月, 2012 1 次提交

x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM · 4ad33411

由 H. Peter Anvin 提交于 6月 22, 2012

It makes sense to label "Digital Thermal Sensor" as "DTS", but
unfortunately the string "dts" was already used for "Debug Store", and
/proc/cpuinfo is a user space ABI.

Therefore, rename this to "dtherm".

This conflict went into mainline via the hwmon tree without any x86
maintainer ack, and without any kind of hint in the subject.

    a4659053 x86/hwmon: fix initialization of coretemp
Reported-by: NJean Delvare <khali@linux-fr.org>
Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com
Cc: Jan Beulich <JBeulich@suse.com>
Cc: <stable@vger.kernel.org> v2.6.36..v3.4
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

4ad33411

25 6月, 2012 1 次提交

x86/uv: Work around UV2 BAU hangs · 8b6e511e

由 Cliff Wickman 提交于 6月 22, 2012

On SGI's UV2 the BAU (Broadcast Assist Unit) driver can hang
under a heavy load. To cure this:

- Disable the UV2 extended status mode (see UV2_EXT_SHFT), as
  this mode changes BAU behavior in more ways then just delivering
  an extra bit of status.  Revert status to just two meaningful bits,
  like UV1.

- Use no IPI-style resets on UV2.  Just give up the request for
  whatever the reason it failed and let it be accomplished with
  the legacy IPI method.

- Use no alternate sending descriptor (the former UV2 workaround
  bcp->using_desc and handle_uv2_busy() stuff).  Just disable the
  use of the BAU for a period of time in favor of the legacy IPI
  method when the h/w bug leaves a descriptor busy.

  -- new tunable: giveup_limit determines the threshold at which a hub is
     so plugged that it should do all requests with the legacy IPI method for a
     period of time
  -- generalize disable_for_congestion() (renamed disable_for_period()) for
     use whenever a hub should avoid using the BAU for a period of time

Also:

 - Fix find_another_by_swack(), which is part of the UV2 bug workaround

 - Correct and clarify the statistics (new stats s_overipilimit, s_giveuplimit,
   s_enters, s_ipifordisabled, s_plugged, s_congested)
Signed-off-by: NCliff Wickman <cpw@sgi.com>
Link: http://lkml.kernel.org/r/20120622131459.GC31884@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8b6e511e

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功