1. 03 Sep 2013 (1 commit)
    • lockref: implement lockless reference count updates using cmpxchg() · bc08b449
      Authored by Linus Torvalds
      Instead of taking the spinlock, the lockless versions atomically check
      that the lock is not taken, and do the reference count update using a
      cmpxchg() loop.  This is semantically identical to doing the reference
      count update protected by the lock, but avoids the "wait for lock"
      contention that you get when accesses to the reference count are
      contended.
      
      Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
      Even when the lockref reference counts are updated atomically with
      cmpxchg, the fact that they also verify the state of the spinlock means
      that the lockless updates can never happen while somebody else holds the
      spinlock.
      
      So while "lockref_put_or_lock()" looks a lot like just another name for
      "atomic_dec_and_lock()", and both optimize to lockless updates, they are
      fundamentally different: the decrement done by atomic_dec_and_lock() is
      truly independent of any lock (as long as it doesn't decrement to zero),
      so a locked region can still see the count change.
      
      The lockref structure, in contrast, really is a *locked* reference
      count.  If you hold the spinlock, the reference count will be stable and
      you can modify the reference count without using atomics, because even
      the lockless updates will see and respect the state of the lock.
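
      As an illustration of that loop, here is a minimal user-space analogue in
      C11 (a sketch, not the kernel's lib/lockref.c: the packed layout, the
      LOCKED_MASK constant and the function name are assumptions made for this
      example):

      	#include <stdatomic.h>
      	#include <stdbool.h>
      	#include <stdint.h>

      	struct lockref {
      		/* low 32 bits stand in for the spinlock, high 32 bits are the count */
      		_Atomic uint64_t lock_count;
      	};

      	#define LOCKED_MASK	0xffffffffull	/* 0 in the low half == unlocked */

      	/* Lockless get: only succeeds while the lock half reads as unlocked.
      	 * Returns false so the caller can fall back to the locked slow path. */
      	static bool lockref_get_lockless(struct lockref *ref)
      	{
      		uint64_t old = atomic_load(&ref->lock_count);

      		while ((old & LOCKED_MASK) == 0) {
      			uint64_t new = old + (1ull << 32);	/* bump the count half */

      			/* cmpxchg: install 'new' only if the whole word is unchanged */
      			if (atomic_compare_exchange_weak(&ref->lock_count, &old, new))
      				return true;
      			/* failed: 'old' has been reloaded, re-check the lock half */
      		}
      		return false;	/* lock is held: take the spinlock instead */
      	}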
      
      In order to enable the cmpxchg lockless code, the architecture needs to
      do three things:
      
       (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
           in an aligned u64, and have a "cmpxchg()" implementation that works
           on such a u64 data type.
      
       (2) define a helper function to test for a spinlock being unlocked
           ("arch_spin_value_unlocked()")
      
       (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
           Kconfig file.
      
      This enables it for x86-64 (but not 32-bit, we'd need to make sure
      cmpxchg() turns into the proper cmpxchg8b in order to enable it for
      32-bit mode).
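
      Roughly, the per-architecture enablement then looks like the sketch below
      (the Kconfig line and the ticket-lock field names are assumptions based
      on the three requirements above, not a verified diff):

      	# arch/x86/Kconfig (sketch)
      	config X86
      		...
      		select ARCH_USE_CMPXCHG_LOCKREF if X86_64

      	/* arch/x86/include/asm/spinlock.h (sketch): report "unlocked" from a
      	 * spinlock value without touching the lock itself */
      	static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
      	{
      		return lock.tickets.head == lock.tickets.tail;
      	}
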
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 02 Sep 2013 (3 commits)
  3. 30 Aug 2013 (3 commits)
  4. 26 Aug 2013 (1 commit)
  5. 23 Aug 2013 (4 commits)
  6. 22 Aug 2013 (1 commit)
  7. 20 Aug 2013 (4 commits)
    • xen/smp: initialize IPI vectors before marking CPU online · fc78d343
      Authored by Chuck Anderson
      An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:
      
      	kernel BUG at drivers/xen/events.c:1328!
      
      RCU has detected that a CPU has not entered a quiescent state within the
      grace period.  It needs to send the CPU a reschedule IPI if it is not
      offline.  rcu_implicit_offline_qs() does this check:
      
      	/*
      	 * If the CPU is offline, it is in a quiescent state.  We can
      	 * trust its state not to change because interrupts are disabled.
      	 */
      	if (cpu_is_offline(rdp->cpu)) {
      		rdp->offline_fqs++;
      		return 1;
      	}
      
      	Else the CPU is online.  Send it a reschedule IPI.
      
      The CPU is in the middle of being hot-plugged and has been marked online
      (!cpu_is_offline()).  See start_secondary():
      
      	set_cpu_online(smp_processor_id(), true);
      	...
      	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
      
      start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
      mark it as active:
      
      	/*
      	 * Wait until the cpu which brought this one up marked it
      	 * online before enabling interrupts. If we don't do that then
      	 * we can end up waking up the softirq thread before this cpu
      	 * reached the active state, which makes the scheduler unhappy
      	 * and schedule the softirq thread on the wrong cpu. This is
      	 * only observable with forced threaded interrupts, but in
      	 * theory it could also happen w/o them. It's just way harder
      	 * to achieve.
      	 */
      	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
      		cpu_relax();
      
      	/* enable local interrupts */
      	local_irq_enable();
      
      The CPU being hot-plugged will be marked active after it has been fully
      initialized by the CPU managing the hot-plug.  In the Xen PVHVM case
      xen_smp_intr_init() is called to set up the hot-plugged vCPU's
      XEN_RESCHEDULE_VECTOR.
      
      The CPU being hot-plugged is marked online but not yet active, and does
      not have its IPI vectors set up.  rcu_implicit_offline_qs() sees that the
      hot-plugging cpu is !cpu_is_offline() and tries to send it a reschedule
      IPI, which leads to:
      
      	kernel BUG at drivers/xen/events.c:1328!
      
      	xen_send_IPI_one()
      	xen_smp_send_reschedule()
      	rcu_implicit_offline_qs()
      	rcu_implicit_dynticks_qs()
      	force_qs_rnp()
      	force_quiescent_state()
      	__rcu_process_callbacks()
      	rcu_process_callbacks()
      	__do_softirq()
      	call_softirq()
      	do_softirq()
      	irq_exit()
      	xen_evtchn_do_upcall()
      
      because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
      the XEN_RESCHEDULE_VECTOR.
      
      There is at least one other place that has caused the same crash:
      
      	xen_smp_send_reschedule()
      	wake_up_idle_cpu()
      	add_timer_on()
      	clocksource_watchdog()
      	call_timer_fn()
      	run_timer_softirq()
      	__do_softirq()
      	call_softirq()
      	do_softirq()
      	irq_exit()
      	xen_evtchn_do_upcall()
      	xen_hvm_callback_vector()
      
      clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
      a watchdog timer:
      
      	/*
      	 * Cycle through CPUs to check if the CPUs stay synchronized
      	 * to each other.
      	 */
      	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
      	if (next_cpu >= nr_cpu_ids)
      		next_cpu = cpumask_first(cpu_online_mask);
      	watchdog_timer.expires += WATCHDOG_INTERVAL;
      	add_timer_on(&watchdog_timer, next_cpu);
      
      This resulted in an attempt to send an IPI to a hot-plugging CPU that
      had not yet initialized its reschedule vector.  One option would be to
      make the RCU code check for CPU active instead of CPU offline, since (in
      older kernels) a CPU becomes active only after it is marked online.
      
      But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
      completely reworked - in the online path, cpu_active is set *before* cpu_online,
      and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
      notification instead of CPU_DOWN_PREPARE."  Drilling into the bring-up
      path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
      to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
      is better left to the scheduler alone, since it has the intelligence to use it
      judiciously."
      
      The conclusion was that:
      "
      1. At the IPI sender side:
      
         It is incorrect to send an IPI to an offline CPU (cpu not present in
         the cpu_online_mask). There are numerous places where we check this
         and warn/complain.
      
      2. At the IPI receiver side:
      
         It is incorrect to let the world know of our presence (by setting
         ourselves in global bitmasks) until our initialization steps are complete
         to such an extent that we can handle the consequences (such as
         receiving interrupts without crashing the sender etc.)
      " (from Srivatsa)
      
      As the native code enables interrupts at some point, we need to be able
      to service them.  In other words, a CPU must have valid IPI vectors once
      it has been marked online.
      
      It does not need to handle the IPI immediately (interrupts may be
      disabled), but it must have valid IPI vectors because another CPU may
      find it in cpu_online_mask and attempt to send it an IPI.
      
      This patch changes the order of the Xen vCPU bring-up functions so that
      the Xen IPI vectors are set up before start_secondary() is called.  It
      also stops the bring-up of a Xen vCPU if xen_smp_intr_init() fails to
      initialize it.
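
      Schematically, the reordered bring-up has the following shape (a
      condensed sketch of xen_cpu_up(), not the actual diff):

      	static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
      	{
      		int rc;

      		/* ... per-vCPU context setup elided ... */

      		/* Bind the IPI event channels (XEN_RESCHEDULE_VECTOR etc.)
      		 * before the vCPU can run start_secondary() and mark itself
      		 * online. */
      		rc = xen_smp_intr_init(cpu);
      		if (rc)
      			return rc;	/* don't bring up a half-initialized vCPU */

      		rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
      		BUG_ON(rc);

      		/* wait for start_secondary() to mark the vCPU online */
      		while (per_cpu(cpu_state, cpu) != CPU_ONLINE)
      			HYPERVISOR_sched_op(SCHEDOP_yield, NULL);

      		return 0;
      	}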
      
      Orabug 13823853
      Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
      Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86/xen: do not identity map UNUSABLE regions in the machine E820 · 3bc38cbc
      Authored by David Vrabel
      If there are UNUSABLE regions in the machine memory map, dom0 will
      attempt to map them 1:1 which is not permitted by Xen and the kernel
      will crash.
      
      There isn't anything interesting in the UNUSABLE region that the dom0
      kernel needs access to, so we can avoid making the 1:1 mapping and treat
      it as RAM.
      
      We only do this for dom0, as that is where the tboot case shows up.  A PV
      domU could have an UNUSABLE region in its pseudo-physical map, and that
      would need to be handled in another patch.
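
      A minimal sketch of that approach (helper name and wiring assumed from
      the description above, not the exact patch):

      	/* For dom0, flip UNUSABLE e820 entries to RAM before the memory map
      	 * is processed, so they are never identity-mapped. */
      	static void __init xen_ignore_unusable(struct e820entry *list,
      					       size_t map_size)
      	{
      		struct e820entry *entry = list;
      		unsigned int i;

      		if (!xen_initial_domain())	/* domU is left for a later patch */
      			return;

      		for (i = 0; i < map_size; i++, entry++) {
      			if (entry->type == E820_UNUSABLE)
      				entry->type = E820_RAM;
      		}
      	}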
      
      This fixes a boot failure on hosts with tboot.
      
      tboot marks a region in the e820 map as unusable, the dom0 kernel would
      attempt to map this region, and Xen does not permit unusable regions to
      be mapped by guests.
      
        (XEN)  0000000000000000 - 0000000000060000 (usable)
        (XEN)  0000000000060000 - 0000000000068000 (reserved)
        (XEN)  0000000000068000 - 000000000009e000 (usable)
        (XEN)  0000000000100000 - 0000000000800000 (usable)
        (XEN)  0000000000800000 - 0000000000972000 (unusable)
      
      tboot marked this region as unusable.
      
        (XEN)  0000000000972000 - 00000000cf200000 (usable)
        (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
        (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
        (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
        (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
        (XEN)  00000000fe000000 - 0000000100000000 (reserved)
        (XEN)  0000000100000000 - 0000000630000000 (usable)
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      [v1: Altered the patch and description with domU's with UNUSABLE regions]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM · 527bf129
      Authored by Yinghai Lu
      Dave Hansen reported that systems with between 500G and 600G of RAM
      crash early if DEBUG_PAGEALLOC is selected.
      
       > [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
       > [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
       > [    0.000000] BRK [0x02086000, 0x02086fff] PGTABLE
       > [    0.000000] BRK [0x02087000, 0x02087fff] PGTABLE
       > [    0.000000] BRK [0x02088000, 0x02088fff] PGTABLE
       > [    0.000000] init_memory_mapping: [mem 0xe80ee00000-0xe80effffff]
       > [    0.000000]  [mem 0xe80ee00000-0xe80effffff] page 4k
       > [    0.000000] BRK [0x02089000, 0x02089fff] PGTABLE
       > [    0.000000] BRK [0x0208a000, 0x0208afff] PGTABLE
       > [    0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory
      
      It turns out that we missed increasing the number of pages reserved in
      BRK for mapping the initial 2M and [0,1M) ranges when we switched to
      using the #PF handler to set up memory mappings:
      
       > commit 8170e6be
       > Author: H. Peter Anvin <hpa@zytor.com>
       > Date:   Thu Jan 24 12:19:52 2013 -0800
       >
       >     x86, 64bit: Use a #PF handler to materialize early mappings on demand
      
      Before that, we had the mapping of [0,512M) set up in head_64.S and
      could spare two pages for [0,1M).  After that change, those pages can no
      longer be reused.
      
      When we have more than 512G of RAM, we need an extra page for the pgd
      covering [512G, 1024G).
      
      Increase the number of pages reserved in BRK for page tables to fix the
      boot crash.
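
      For reference, the early page-table pages come out of a static BRK
      reservation; the fix bumps that reservation so the worst case (including
      the extra page needed once RAM crosses the 512G boundary) still fits.  A
      sketch of the mechanism, with the macro name and page count assumed for
      illustration only:

      	/* arch/x86/mm/init.c style sketch: carve N pages out of the kernel's
      	 * BRK area for the earliest page tables (name and count illustrative) */
      	#define EARLY_PGT_PAGES	6	/* too small for >512G with DEBUG_PAGEALLOC */
      	RESERVE_BRK(early_pgt_alloc, EARLY_PGT_PAGES * PAGE_SIZE);
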
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Bisected-by: Dave Hansen <dave.hansen@intel.com>
      Tested-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org> # v3.9 and later
      Link: http://lkml.kernel.org/r/1376351004-4015-1-git-send-email-yinghai@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock · 17405453
      Authored by Yoshihiro YUNOMAE
      Prevent crash_kexec() from deadlocking on ioapic_lock.  When
      crash_kexec() is executed on a CPU, the CPU will take ioapic_lock in
      disable_IO_APIC().  So if the CPU receives an NMI while it holds
      ioapic_lock, a deadlock will happen.
      
      In this patch, ioapic_lock is re-initialized (zapped) before
      disable_IO_APIC() is called.
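
      The shape of that change, as a condensed sketch (not the verified diff):

      	/* arch/x86/kernel/apic/io_apic.c */
      	void ioapic_zap_locks(void)
      	{
      		/* Forcibly re-initialize the lock: whoever held it was stopped
      		 * by the NMI/crash path and will never release it. */
      		raw_spin_lock_init(&ioapic_lock);
      	}

      	/* crash path: call it before the IO-APIC is touched */
      	void native_machine_crash_shutdown(struct pt_regs *regs)
      	{
      		/* ... stop other CPUs, disable the local APIC ... */
      		ioapic_zap_locks();
      		disable_IO_APIC();
      		/* ... */
      	}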
      
      You can reproduce this deadlock the following way:
      
      1. Add mdelay(1000) after raw_spin_lock_irqsave() in
         native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c
      
         The deadlock can occur without this modification, but the mdelay()
         increases the likelihood of hitting it.
      
      2. Build and install the kernel
      
      3. Set up the OS which will run panic() and kexec when NMI is injected
          # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
          # vim /etc/default/grub
            add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
          # grub2-mkconfig
      
      4. Reboot the OS
      
      5. Run following command for each vcpu on the guest
          # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinity; done;
         By running this command, CPUs will repeatedly take ioapic_lock to set affinity.
      
      6. Inject an NMI (push a dump button, or execute 'virsh inject-nmi <domain>' if you
         use a VM).  After injecting the NMI, panic() is called in an NMI-handler context.
         Then, kexec will normally run in panic(), but the operation will be stopped
         by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()->
         native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()->
         clear_IO_APIC_pin()->ioapic_read_entry().
      Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 16 Aug 2013 (3 commits)
  9. 15 Aug 2013 (1 commit)
  10. 14 Aug 2013 (3 commits)
  11. 13 Aug 2013 (3 commits)
    • sched: fix the theoretical signal_wake_up() vs schedule() race · e0acd0a6
      Authored by Oleg Nesterov
      This is only theoretical, but after try_to_wake_up(p) was changed
      to check p->state under p->pi_lock the code like
      
      	__set_current_state(TASK_INTERRUPTIBLE);
      	schedule();
      
      can miss a signal.  This is a special case of wait-for-condition: it
      relies on the try_to_wake_up()/schedule() interaction and thus does not
      need an mb() between __set_current_state() and the signal_pending() check.
      
      However, this __set_current_state() can move into the critical section
      protected by rq->lock; now that try_to_wake_up() takes another lock, we
      need to ensure that it can't be reordered with the
      "if (signal_pending(current))" check inside that section.
      
      The patch is actually a one-liner: it simply adds smp_wmb() before
      spin_lock_irq(rq->lock).  This is what try_to_wake_up() already does,
      for the same reason.
      
      We turn this wmb() into the new helper, smp_mb__before_spinlock(),
      for better documentation and to allow the architectures to change
      the default implementation.
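
      The resulting helper and its use, sketched (the default definition
      follows the description above; the scheduler hunk is condensed):

      	/* include/linux/spinlock.h (sketch): architectures may override this */
      	#ifndef smp_mb__before_spinlock
      	#define smp_mb__before_spinlock()	smp_wmb()
      	#endif

      	/* __schedule(), condensed: order the caller's __set_current_state()
      	 * against the signal_pending() check done under rq->lock */
      	smp_mb__before_spinlock();
      	raw_spin_lock_irq(&rq->lock);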
      
      While at it, kill smp_mb__after_lock(), it has no callers.
      
      Perhaps we can also add smp_mb__before/after_spinunlock() for
      prepare_to_wait().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86, microcode, AMD: Fix early microcode loading · 84516098
      Authored by Torsten Kaiser
      load_microcode_amd() (and the helper it is using) should not have a cpu
      parameter.  The microcode loading does not depend on the CPU with regard
      to the patches loaded, since they end up in a global list for all CPUs
      anyway.
      
      The change from cpu to x86family in load_microcode_amd() now allows
      dropping the code that messed with cpu_data(cpu) from
      collect_cpu_info_amd_early(), which was wrong anyway because at that
      point the per-cpu cpu_info is not yet set up (those values would later
      be overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()).
      
      Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
      because it is only used in one place and, without the cpuinfo_x86
      accesses, there was not much left of it.
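
      The interface change, sketched (return type and parameter names assumed):

      	/* before: took a CPU number it never really needed */
      	enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);

      	/* after: the CPU family is all the parser cares about */
      	enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size);
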
      Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      [ Fengguang: build fix ]
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      [ Boris: adapt it to current tree. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86 · 8c6b79bb
      Authored by Torsten Kaiser
      cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
      before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().
      
      If early microcode loading is enabled, its collect_cpu_info_amd_early()
      will fill in ->x86, so the fallback to boot_cpu_data is not used.  But
      ->x86_vendor was not filled in and is still X86_VENDOR_INTEL, resulting
      in no errata fixes getting applied, and my system hangs on boot.
      
      Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only
      caller, init_amd(), already has a struct cpuinfo_x86 as a parameter, and
      the set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only
      uses that struct.
      
      So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
      and the broken fallback can be dropped.
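
      Sketched, the new signature and its call site look roughly like this
      (condensed; the erratum table and bug flag are shown for illustration):

      	static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
      	{
      		/* range-match erratum[] against cpu->x86, cpu->x86_model and
      		 * cpu->x86_mask -- no peeking at the per-cpu cpu_info */
      		return false;	/* matching logic elided in this sketch */
      	}

      	static void init_amd(struct cpuinfo_x86 *c)
      	{
      		/* ... */
      		if (cpu_has_amd_erratum(c, amd_erratum_400))
      			set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
      		/* ... */
      	}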
      
      [ Boris: Drop WARN_ON() since we're called only from init_amd() ]
      Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
  12. 12 Aug 2013 (1 commit)
  13. 10 Aug 2013 (1 commit)
    • x86: Don't clear olpc_ofw_header when sentinel is detected · d55e37bb
      Authored by Daniel Drake
      OpenFirmware wasn't quite following the protocol described in boot.txt,
      and the kernel has detected this through the sentinel value in
      boot_params.  OFW does zero out almost everything it is supposed to, but
      not the sentinel.
      
      This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC
      support.
      
      OpenFirmware has now been fixed. However, it would be nice if we could
      maintain Linux compatibility with old firmware versions. To do that, we just
      have to avoid zeroing out olpc_ofw_header.
      
      OFW does not write to any other parts of the header that are being zapped
      by the sentinel-detection code, and all users of olpc_ofw_header are
      somewhat protected through checking for the OLPC_OFW_SIG magic value
      before using it. So this should not cause any problems for anyone.
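
      The shape of the fix in sanitize_boot_params(), sketched (the exact
      field ranges are assumptions based on the description, not the verified
      diff):

      	static void sanitize_boot_params(struct boot_params *boot_params)
      	{
      		if (boot_params->sentinel) {
      			/* The loader did not clear boot_params: zero the stale
      			 * fields, but start *after* olpc_ofw_header instead of
      			 * at it, so old OFW keeps working. */
      			memset(&boot_params->ext_ramdisk_image, 0,
      			       (char *)&boot_params->efi_info -
      				(char *)&boot_params->ext_ramdisk_image);
      			/* ... remaining ranges cleared as before ... */
      		}
      	}
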
      Signed-off-by: Daniel Drake <dsd@laptop.org>
      Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.9+
  14. 07 Aug 2013 (11 commits)