提交 · 72eb6a791459c87a0340318840bb3bd9252b627b · openanolis / cloud-kernel

05 1月, 2011 4 次提交

x86, NMI: Add touch_nmi_watchdog to io_check_error delay · 74d91e3c

由 Huang Ying 提交于 1月 04, 2011

Prevent the long delay in io_check_error making NMI watchdog
timeout.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

74d91e3c

x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time · 554ec063

由 Dongdong Deng 提交于 1月 04, 2011

The spin_lock_debug/rcu_cpu_stall detector uses
trigger_all_cpu_backtrace() to dump cpu backtrace.
Therefore it is possible that trigger_all_cpu_backtrace()
could be called at the same time on different CPUs, which
triggers and 'unknown reason NMI' warning. The following case
illustrates the problem:

      CPU1                    CPU2                     ...   CPU N
                       trigger_all_cpu_backtrace()
                       set "backtrace_mask" to cpu mask
                               |
generate NMI interrupts  generate NMI interrupts       ...
    \                          |                               /
     \                         |                              /

The "backtrace_mask" will be cleaned by the first NMI interrupt
at nmi_watchdog_tick(), then the following NMI interrupts
generated by other cpus's arch_trigger_all_cpu_backtrace() will
be taken as unknown reason NMI interrupts.

This patch uses a test_and_set to avoid the problem, and stop
the arch_trigger_all_cpu_backtrace() from calling to avoid
dumping a double cpu backtrace info when there is already a
trigger_all_cpu_backtrace() in progress.
Signed-off-by: NDongdong Deng <dongdong.deng@windriver.com>
Reviewed-by: NBruce Ashfield <bruce.ashfield@windriver.com>
Cc: fweisbec@gmail.com
LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDon Zickus <dzickus@redhat.com>

554ec063

x86: Only call smp_processor_id in non-preempt cases · 9ab181fa

由 Don Zickus 提交于 1月 04, 2011

There are some paths that walk the die_chain with preemption on.
Make sure we are in an NMI call before we start doing anything.

This was triggered by do_general_protection calling notify_die
with DIE_GPF.
Reported-by: NJan Kiszka <jan.kiszka@web.de>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9ab181fa

x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion · cb2ded37

由 Yinghai Lu 提交于 1月 04, 2011

Found one x2apic pre-enabled system, x2apic_mode suddenly get
corrupted after register some cpus, when compiled
CONFIG_NR_CPUS=255 instead of 512.

It turns out that generic_processor_info() ==> phyid_set(apicid,
phys_cpu_present_map) causes the problem.

phys_cpu_present_map is sized by MAX_APICS bits, and pre-enabled
system some cpus have an apic id > 255.

The variable after phys_cpu_present_map may get corrupted
silently:

 ffffffff828e8420 B phys_cpu_present_map
 ffffffff828e8440 B apic_verbosity
 ffffffff828e8444 B local_apic_timer_c2_ok
 ffffffff828e8448 B disable_apic
 ffffffff828e844c B x2apic_mode
 ffffffff828e8450 B x2apic_disabled
 ffffffff828e8454 B num_processors
 ...

Actually phys_cpu_present_map is referenced via apic id, instead
index. We should use MAX_LOCAL_APIC instead MAX_APICS.

For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes
on 64-bit:

	text		data		bss		dec		filename
	21696943	4193748		12787712	38678403	vmlinux.before
	21696943	4193748		12791808	38682499	vmlinux.after

No change on 32bit.

Finally we can remove MAX_APCIS that was rather confusing.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4D23BD9C.3070102@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cb2ded37

04 1月, 2011 5 次提交

x86, mm: Initialize initial_page_table before paravirt jumps · d50d8fe1

由 Rusty Russell 提交于 1月 04, 2011

v2.6.36-rc8-54-gb40827fa (x86-32, mm: Add an initial page table
for core bootstrapping) made x86 boot using initial_page_table
and broke lguest.

For 2.6.37 we simply cut & paste the initialization code into
lguest (da32dac1 "lguest: populate initial_page_table"), now
we fix it properly by doing that initialization before the
paravirt jump.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NJeremy Fitzhardinge <jeremy@goop.org>
Cc: lguest <lguest@ozlabs.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201101041720.54535.rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d50d8fe1

perf: Clean up power events by introducing new, more generic ones · 25e41933

由 Thomas Renninger 提交于 1月 03, 2011

Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

Have now a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NArjan van de Ven <arjan@linux.intel.com>
Acked-by: NJean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>

25e41933

x86: udelay: Use this_cpu_read to avoid address calculation · 357089fc

由 Christoph Lameter 提交于 12月 16, 2010

The code will use a segment prefix instead of doing the lookup and
calculation.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

357089fc

x86, UV, BAU: Extend for more than 16 cpus per socket · cfa60917

由 Cliff Wickman 提交于 1月 03, 2011

Fix a hard-coded limit of a maximum of 16 cpu's per socket.

The UV Broadcast Assist Unit code initializes by scanning the
cpu topology of the system and assigning a master cpu for each
socket and UV hub. That scan had an assumption of a limit of 16
cpus per socket. With Westmere we are going over that limit.
The UV hub hardware will allow up to 32.

If the scan finds the system has gone over that limit it returns
an error and we print a warning and fall back to doing TLB
shootdowns without the BAU.
Signed-off-by: NCliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org> # .37.x
LKML-Reference: <E1PZol7-0000mM-77@eag09.americas.sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cfa60917

x86, hwmon: Add core threshold notification to therm_throt.c · 9e76a97e

由 R, Durgadoss 提交于 1月 03, 2011

This patch adds code to therm_throt.c to notify core thermal threshold
events. These thresholds are supported by the IA32_THERM_INTERRUPT register.
The status/log for the same is monitored using the IA32_THERM_STATUS register.
The necessary #defines are in msr-index.h. A call back is added to mce.h, to
further notify the thermal stack, about the threshold events.
Signed-off-by: NDurgadoss R <durgadoss.r@intel.com>
LKML-Reference: <D6D887BA8C9DFF48B5233887EF04654105C1251710@bgsmsx502.gar.corp.intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

9e76a97e

03 1月, 2011 1 次提交

arch/x86/oprofile/op_model_amd.c: Perform initialisation on a single CPU · c7c25802

由 Robert Richter 提交于 1月 03, 2011

Disable preemption in init_ibs(). The function only checks the
ibs capabilities and sets up pci devices (if necessary). It runs
only on one cpu but operates with the local APIC and some MSRs,
thus it is better to disable preemption.

[    7.034377] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/483
[    7.034385] caller is setup_APIC_eilvt+0x155/0x180
[    7.034389] Pid: 483, comm: modprobe Not tainted 2.6.37-rc1-20101110+ #1
[    7.034392] Call Trace:
[    7.034400]  [<ffffffff812a2b72>] debug_smp_processor_id+0xd2/0xf0
[    7.034404]  [<ffffffff8101e985>] setup_APIC_eilvt+0x155/0x180
[ ... ]

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22812

Reported-by: <atswartz@gmail.com>
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Cc: oprofile-list@lists.sourceforge.net <oprofile-list@lists.sourceforge.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>         [2.6.37.x]
LKML-Reference: <20110103111514.GM4739@erda.amd.com>
[ small cleanups ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c7c25802

02 1月, 2011 1 次提交

KVM: i8259: initialize isr_ack · d0dfc6b7

由 Avi Kivity 提交于 12月 31, 2010

isr_ack is never initialized.  So, until the first PIC reset, interrupts
may fail to be injected.  This can cause Windows XP to fail to boot, as
reported in the fallout from the fix to
https://bugzilla.kernel.org/show_bug.cgi?id=21962.
Reported-and-tested-by: NNicolas Prochazka <prochazka.nicolas@gmail.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

d0dfc6b7

30 12月, 2010 3 次提交

x86: Use this_cpu_inc_return for nmi counter · c1955b5f

由 Tejun Heo 提交于 12月 18, 2010

this_cpu_inc_return() saves us a memory access there.
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

c1955b5f

x86: Replace uses of current_cpu_data with this_cpu ops · 7b543a53

由 Tejun Heo 提交于 12月 18, 2010

Replace all uses of current_cpu_data with this_cpu operations on the
per cpu structure cpu_info.  The scala accesses are replaced with the
matching this_cpu ops which results in smaller and more efficient
code.

In the long run, it might be a good idea to remove cpu_data() macro
too and use per_cpu macro directly.

tj: updated description

Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

7b543a53

x86: Use this_cpu_ops to optimize code · 0a3aee0d

由 Tejun Heo 提交于 12月 18, 2010

Go through x86 code and replace __get_cpu_var and get_cpu_var
instances that refer to a scalar and are not used for address
determinations.

Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

0a3aee0d

29 12月, 2010 1 次提交

KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow · 649497d1

由 Avi Kivity 提交于 12月 28, 2010

We use the physical address instead of the base gfn for the four
PAE page directories we use in unpaged mode.  When the guest accesses
an address above 1GB that is backed by a large host page, a BUG_ON()
in kvm_mmu_set_gfn() triggers.

Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=21962Reported-and-tested-by: NNicolas Prochazka <prochazka.nicolas@gmail.com>
KVM-Stable-Tag.
Signed-off-by: NAvi Kivity <avi@redhat.com>

649497d1

28 12月, 2010 1 次提交

x86, paravirt: Use native_halt on a halt, not native_safe_halt · c8217b83

由 Cliff Wickman 提交于 12月 13, 2010

halt() should use native_halt()
safe_halt() uses native_safe_halt()

If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as

static inline void halt(void)
{
        PVOP_VCALL0(pv_irq_ops.safe_halt);
}

Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is

static inline void halt(void)
{
        native_halt();
}

So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt()
for a halt() is an oversight.
Am I missing something?

It probably hasn't shown up as a problem because the local apic is disabled
on a shutdown or restart.  But if we disable interrupts and call halt()
we shouldn't expect that the halt() will re-enable interrupts.
Signed-off-by: NCliff Wickman <cpw@sgi.com>
LKML-Reference: <E1PSBcz-0001g1-FM@eag09.americas.sgi.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

c8217b83

27 12月, 2010 1 次提交

x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree() · 5cdd2de0

由 Jesper Juhl 提交于 12月 25, 2010

In arch/x86/kernel/microcode_intel.c::generic_load_microcode()
we have  this:

	while (leftover) {
		...
		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
		    microcode_sanity_check(mc) < 0) {
			vfree(mc);
			break;
		}
		...
	}

	if (mc)
		vfree(mc);

This will cause a double free of 'mc'. This patch fixes that by
just  removing the vfree() call in the loop since 'mc' will be
freed nicely just  after we break out of the loop.

There's also a second change in the patch. I noticed a lot of
checks for  pointers being NULL before passing them to vfree().
That's completely  redundant since vfree() deals gracefully with
being passed a NULL pointer.  Removing the redundant checks
yields a nice size decrease for the object  file.

Size before the patch:
   text    data     bss     dec     hex filename
   4578     240    1032    5850    16da arch/x86/kernel/microcode_intel.o
Size after the patch:
   text    data     bss     dec     hex filename
   4489     240     984    5713    1651 arch/x86/kernel/microcode_intel.o
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Acked-by: NTigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5cdd2de0

24 12月, 2010 2 次提交

x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation · d3bd0588

由 Yinghai Lu 提交于 12月 16, 2010

Recent Intel new system have different order in MADT, aka will list all thread0
at first, then all thread1.
But SRAT table still old order, it will list cpus in one socket all together.

If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
to put some cpus apic id to node mapping into apicid_to_node[].

for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...

[    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[    9.235021] divide error: 0000 [#1] SMP
[    9.235315] last sysfs file:
[    9.235481] CPU 1
[    9.235592] Modules linked in:
[    9.245398]
[    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
[    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
...
[    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[    9.665356]  RSP <ffff88103f8d1c40>
[    9.665568] ---[ end trace 2296156d35fdfc87 ]---

So let just parse all cpu entries in SRAT.

Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
apicid_to_node[].

it fixes following bug too.
https://bugzilla.kernel.org/show_bug.cgi?id=22662

-v2: expand to 32bit according to hpa
   need to add MAX_LOCAL_APIC for 32bit
Reported-and-Tested-by: NWu Fengguang <fengguang.wu@intel.com>
Reported-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: NMyron Stowe <myron.stowe@hp.com>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD486.9020704@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

d3bd0588

x86, acpi: Add MAX_LOCAL_APIC for 32bit · 56d91f13

由 Yinghai Lu 提交于 12月 16, 2010

We should use MAX_LOCAL_APIC for max apic ids and MAX_APICS as number
of local apics.

Also apic_version[] array should use MAX_LOCAL_APICs.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD464.2020408@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

56d91f13

23 12月, 2010 1 次提交

x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR · 4a7863cc

由 Don Zickus 提交于 12月 22, 2010

The x86 arch has shifted its use of the nmi_watchdog from a
local implementation to the global one provide by
kernel/watchdog.c.  This shift has caused a whole bunch of
compile problems under different config options.  I attempt to
simplify things with the patch below.

In order to simplify things, I had to come to terms with the
meaning of two terms ARCH_HAS_NMI_WATCHDOG and
CONFIG_HARDLOCKUP_DETECTOR.  Basically they mean the same thing,
the former on a local level and the latter on a global level.

With the old x86 nmi watchdog gone, there is no need to rely on
defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
make sense any more.  x86 will now use the global
implementation.

The changes below do a few things.  First it changes the few
places that relied on ARCH_HAS_NMI_WATCHDOG to use
CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
anyway, so nothing unusual here).  Those pieces of code were
relying more on local apic functionality the nmi watchdog
functionality, so the change should make sense.

Second, I removed the x86 implementation of
touch_nmi_watchdog().  It isn't need now, instead x86 will rely
on kernel/watchdog.c's implementation.

Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
x86.  And tweaked the include/linux/nmi.h file to tell users to
look for an externally defined touch_nmi_watchdog in the case of
ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
changes removes some of the ugliness in that file.

Finally, I added a Kconfig dependency for
CONFIG_HARDLOCKUP_DETECTOR that said you can't have
ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR.  You can
only have one nmi_watchdog.

Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
(various broken configs)

Hopefully, after this patch I won't get any more compile broken
emails. :-)

v3:
  changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
  prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4a7863cc

22 12月, 2010 2 次提交

x86, UV: Fix the effect of extra bits in the hub nodeid register · d8850ba4

由 Jack Steiner 提交于 11月 30, 2010

UV systems can be partitioned into multiple independent SSIs.
Large partitioned systems may have extra bits in the node_id
register. These bits are used when the total memory on all SSIs
exceeds 16TB.  These extra bits need to be ignored when
calculating x2apic_extra_bits.
Signed-off-by: NJack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.972776133@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d8850ba4

x86, UV: Add common uv_early_read_mmr() function for reading MMRs · e6810413

由 Jack Steiner 提交于 11月 30, 2010

Early in boot, reading MMRs from the UV hub controller require
calls to early_ioremap()/early_iounmap().  Rather than
duplicating code, add a common function to do the
map/read/unmap.
Signed-off-by: NJack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.834804371@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e6810413

19 12月, 2010 2 次提交

oprofile, x86: Add support for 6 counters (AMD family 15h) · da169f5d

由 Robert Richter 提交于 9月 24, 2010

This patch adds support for up to 6 hardware counters for AMD family
15h cpus. There is a new MSR range for hardware counters beginning at
MSRC001_0200 Performance Event Select (PERF_CTL0).
Signed-off-by: NRobert Richter <robert.richter@amd.com>

da169f5d

oprofile, x86: Add support for AMD family 15h · 30570bce

由 Robert Richter 提交于 8月 31, 2010

This patch adds support for AMD family 15h (Interlagos/Valencia/
Zambezi) cpus.
Signed-off-by: NRobert Richter <robert.richter@amd.com>

30570bce

18 12月, 2010 8 次提交

cpuops: Use cmpxchg for xchg to avoid lock semantics · 8270137a

由 Christoph Lameter 提交于 12月 14, 2010

Use cmpxchg instead of xchg to realize this_cpu_xchg.

xchg will cause LOCK overhead since LOCK is always implied but cmpxchg
will not.

Baselines:

xchg()		= 18 cycles (no segment prefix, LOCK semantics)
__this_cpu_xchg = 1 cycle

(simulated using this_cpu_read/write, two prefixes. Looks like the
cpu can use loop optimization to get rid of most of the overhead)

Cycles before:

this_cpu_xchg	 = 37 cycles (segment prefix and LOCK (implied by xchg))

After:

this_cpu_xchg	= 11 cycle (using cmpxchg without lock semantics)
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

8270137a

x86: this_cpu_cmpxchg and this_cpu_xchg operations · 7296e08a

由 Christoph Lameter 提交于 12月 14, 2010

Provide support as far as the hardware capabilities of the x86 cpus
allow.

Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for
fast cpuops implementations.

V1->V2:
	- Take out the definition for this_cpu_cmpxchg_8 and move it into
	  a separate patch.

tj: - Reordered ops to better follow this_cpu_* organization.
    - Renamed macro temp variables similar to their existing
      neighbours.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

7296e08a

x86, kexec: Limit the crashkernel address appropriately · 7f8595bf

由 H. Peter Anvin 提交于 12月 16, 2010

Keep the crash kernel address below 512 MiB for 32 bits and 896 MiB
for 64 bits.  For 32 bits, this retains compatibility with earlier
kernel releases, and makes it work even if the vmalloc= setting is
adjusted.

For 64 bits, we should be able to increase this substantially once a
hard-coded limit in kexec-tools is fixed.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101217195035.GE14502@redhat.com>

7f8595bf

x86: avoid high BIOS area when allocating address space · a2c606d5

由 Bjorn Helgaas 提交于 12月 16, 2010

This prevents allocation of the last 2MB before 4GB.

The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27

This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

a2c606d5

x86: avoid E820 regions when allocating address space · 4dc2287c

由 Bjorn Helgaas 提交于 12月 16, 2010

When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.

On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS. On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,

BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]

If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory. This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.

I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below). That means we're vulnerable to BIOS defects that Windows would not
trip over. For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

4dc2287c

x86: avoid low BIOS area when allocating address space · 30919b0b

由 Bjorn Helgaas 提交于 12月 16, 2010

This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas.  This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.

We previously avoided this area in pcibios_align_resource().  This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

30919b0b

Revert "x86/PCI: allocate space from the end of a region, not the beginning" · d14125ec

由 Bjorn Helgaas 提交于 12月 16, 2010

This reverts commit dc9887dc.
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

d14125ec

Revert "x86: allocate space within a region top-down" · 5e52f1c5

由 Bjorn Helgaas 提交于 12月 16, 2010

This reverts commit 1af3c2e4.
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

5e52f1c5

17 12月, 2010 5 次提交

percpu,x86: relocate this_cpu_add_return() and friends · 40304775

由 Tejun Heo 提交于 12月 17, 2010

- include/linux/percpu.h: this_cpu_add_return() and friends were
  located next to __this_cpu_add_return().  However, the overall
  organization is to first group by preemption safeness.  Relocate
  this_cpu_add_return() and friends to preemption-safe area.

- arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after
  other more basic operations.  Relocate [__]this_cpu_add_return_8()
  so that they're first grouped by preemption safeness.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>

40304775

x86: Support for this_cpu_add, sub, dec, inc_return · 8f1d97c7

由 Christoph Lameter 提交于 12月 06, 2010

Supply an implementation for x86 in order to generate more efficient code.

V2->V3:
	- Cleanup
	- Remove strange type checking from percpu_add_return_op.

tj: - Dropped unused typedef from percpu_add_return_op().
    - Renamed ret__ to paro_ret__ in percpu_add_return_op().
    - Minor indentation adjustments.
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

8f1d97c7

xen: Use this_cpu_ops · 780f36d8

由 Christoph Lameter 提交于 12月 06, 2010

Use this_cpu_ops to reduce code size and simplify things in various places.

V3->V4:
	Move instance of this_cpu_inc_return to a later patchset so that
	this patch can be applied without infrastructure changes.

Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

780f36d8

kprobes: Use this_cpu_ops · b76834bc

由 Christoph Lameter 提交于 12月 06, 2010

Use this_cpu ops in various places to optimize per cpu data access.

Cc: Jason Baron <jbaron@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

b76834bc

x86-32: Make sure we can map all of lowmem if we need to · 147dd561

由 H. Peter Anvin 提交于 12月 16, 2010

A relocatable kernel can be anywhere in lowmem -- and in the case of a
kdump kernel, is likely to be fairly high.  Since the early page
tables map everything from address zero up we need to make sure we
allocate enough brk that we can map all of lowmem if we need to.
Reported-by: NStanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Tested-by: NYinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD3ED.8070607@kernel.org>

147dd561

16 12月, 2010 3 次提交

A
KVM: Fix preemption counter leak in kvm_timer_init() · 3e26f230
由 Avi Kivity 提交于 12月 16, 2010
```
Based on a patch from Thomas Meyer.
Signed-off-by: NAvi Kivity <avi@redhat.com>
```
3e26f230

perf, x86: Provide a PEBS capable cycle event · 7639dae0

由 Peter Zijlstra 提交于 12月 14, 2010

Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7639dae0

perf: Dynamic pmu types · 2e80a82a

由 Peter Zijlstra 提交于 11月 17, 2010

Extend the perf_pmu_register() interface to allow for named and
dynamic pmu types.

Because we need to support the existing static types we cannot use
dynamic types for everything, hence provide a type argument.

If we want to enumerate the PMUs they need a name, provide one.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.259707703@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2e80a82a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功