提交 · 70ea6855d368588a7f1b0242ab83ca6fe2e2ff16 · OpenHarmony / kernel_linux

06 12月, 2011 1 次提交

x86-64: Slightly shorten int_ret_from_sys_call · 70ea6855

由 Jan Beulich 提交于 11月 29, 2011

Testing for a return to ring 0 was necessary here solely because
of the branch out of ret_from_fork. That branch, however, can be
directed to retint_restore_args, and thus the test-and-branch
can be eliminated here.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4C7EE0200007800064028@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

70ea6855

05 12月, 2011 3 次提交

x86: Default to vsyscall=emulate · 2e57ae05

由 Andy Lutomirski 提交于 11月 07, 2011

This essentially reverts:

  2b666859: x86: Default to vsyscall=native for now

The ABI breakage should now be fixed by:

 commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9
 Author: Andy Lutomirski <luto@mit.edu>
 Date:   Thu Oct 20 08:48:19 2011 -0700

    x86-64: Set siginfo and context on vsyscall emulation faults
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/93154af3b2b6d208906ae02d80d92cf60c6fa94f.1320712291.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@elte.hu>

2e57ae05

x86-64: Set siginfo and context on vsyscall emulation faults · 4fc34901

由 Andy Lutomirski 提交于 11月 07, 2011

To make this work, we teach the page fault handler how to send
signals on failed uaccess.  This only works for user addresses
(kernel addresses will never hit the page fault handler in the
first place), so we need to generate signals for those
separately.

This gets the tricky case right: if the user buffer spans
multiple pages and only the second page is invalid, we set
cr2 and si_addr correctly.  UML relies on this behavior to
"fault in" pages as needed.

We steal a bit from thread_info.uaccess_err to enable this.
Before this change, uaccess_err was a 32-bit boolean value.

This fixes issues with UML when vsyscall=emulate.
Reported-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@elte.hu>

4fc34901

x86: Fix boot failures on older AMD CPU's · 8e8da023

由 Linus Torvalds 提交于 12月 04, 2011

People with old AMD chips are getting hung boots, because commit
bcb80e53 ("x86, microcode, AMD: Add microcode revision to
/proc/cpuinfo") moved the microcode detection too early into
"early_init_amd()".

At that point we are *so* early in the booth that the exception tables
haven't even been set up yet, so the whole

	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);

doesn't actually work: if the rdmsr does a GP fault (due to non-existant
MSR register on older CPU's), we can't fix it up yet, and the boot fails.

Fix it by simply moving the code to a slightly later point in the boot
(init_amd() instead of early_init_amd()), since the kernel itself
doesn't even really care about the microcode patchlevel at this point
(or really ever: it's made available to user space in /proc/cpuinfo, and
updated if you do a microcode load).
Reported-tested-and-bisected-by: NLarry Finger <Larry.Finger@lwfinger.net>
Tested-by: NBob Tracy <rct@gherkin.frus.com>
Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e8da023

04 12月, 2011 1 次提交

xen/pm_idle: Make pm_idle be default_idle under Xen. · e5fd47bf

由 Konrad Rzeszutek Wilk 提交于 11月 21, 2011

The idea behind commit d91ee586 ("cpuidle: replace xen access to x86
pm_idle and default_idle") was to have one call - disable_cpuidle()
which would make pm_idle not be molested by other code.  It disallows
cpuidle_idle_call to be set to pm_idle (which is excellent).

But in the select_idle_routine() and idle_setup(), the pm_idle can still
be set to either: amd_e400_idle, mwait_idle or default_idle.  This
depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.

In case of mwait_idle we can hit some instances where the hypervisor
(Amazon EC2 specifically) sets the MWAIT and we get:

  Brought up 2 CPUs
  invalid opcode: 0000 [#1] SMP

  Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
  RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
  ...
  Call Trace:
   [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
   [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
  RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
   RSP <ffff8801d28ddf10>

In the case of amd_e400_idle we don't get so spectacular crashes, but we
do end up making an MSR which is trapped in the hypervisor, and then
follow it up with a yield hypercall.  Meaning we end up going to
hypervisor twice instead of just once.

The previous behavior before v3.0 was that pm_idle was set to
default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen.  This patch
does that.

Fixes RH BZ #739499 and Ubuntu #881076
Reported-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e5fd47bf

20 11月, 2011 1 次提交

KVM guest: prevent tracing recursion with kvmclock · 95ef1e52

由 Avi Kivity 提交于 11月 15, 2011

Prevent tracing of preempt_disable() in get_cpu_var() in
kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled,
preempt_disable/enable() are traced and this causes the function_graph
tracer to go into an infinite recursion. By open coding the
preempt_disable() around the get_cpu_var(), we can use the notrace
version which prevents preempt_disable/enable() from being traced and
prevents the recursion.

Based on a similar patch for Xen from Jeremy Fitzhardinge.
Tested-by: NGleb Natapov <gleb@redhat.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAvi Kivity <avi@redhat.com>

95ef1e52

14 11月, 2011 1 次提交

x86: Call stop_machine_text_poke() on all CPUs · 78345d2e

由 Rabin Vincent 提交于 10月 27, 2011

It appears that stop_machine_text_poke() wants to be called on all CPUs,
like it's done from text_poke_smp().  Fix text_poke_smp_batch() to do
this.
Signed-off-by: NRabin Vincent <rabin@rab.in>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jason Baron <jbaron@redhat.com>
Link: http://lkml.kernel.org/r/1319702072-32676-1-git-send-email-rabin@rab.inSigned-off-by: NIngo Molnar <mingo@elte.hu>

78345d2e

11 11月, 2011 1 次提交

x86, ioapic: Only print ioapic debug information for IRQs belonging to an ioapic chip · 6fd36ba0

由 Mathias Nyman 提交于 11月 10, 2011

with "apic=verbose" the print_IO_APIC() function tries to print
IRQ to pin mappings for every active irq. It assumes chip_data
is of type irq_cfg and may cause an oops if not.

As the print_IO_APIC() is called from a late_initcall other
chained irq chips may already be registered with custom
chip_data information, causing an oops. This is the case with
intel MID SoC devices with gpio demuxers registered as irq_chips.
Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: NAlan Cox <alan@linux.intel.com>
[ -v2: fixed build failure ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6fd36ba0

10 11月, 2011 4 次提交

x86/mrst: Avoid reporting wrong nmi status · 064a59b6

由 Jacob Pan 提交于 11月 10, 2011

Moorestown/Medfield platform does not have port 0x61 to report
NMI status, nor does it have external NMI sources. The only NMI
sources are from lapic, as results of perf counter overflow or
IPI, e.g. NMI watchdog or spin lock debug.

Reading port 0x61 on Moorestown will return 0xff which misled
NMI handlers to false critical errors such memory parity error.
The subsequent ioport access for NMI handling can also cause
undefined behavior on Moorestown.

This patch allows kernel process NMI due to watchdog or backrace
dump without unnecessary hangs.
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
[hand applied]
Signed-off-by: NAlan Cox <alan@linux.intel.com>

064a59b6

x86/apic: Allow use of lapic timer early calibration result · 1ade93ef

由 Jacob Pan 提交于 11月 10, 2011

lapic timer calibration can be combined with tsc in platform
specific calibration functions. if such calibration result is
obtained early, we can skip the redundant calibration loops.
Signed-off-by: NJacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NDirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1ade93ef

x86/apic: Do not clear nr_irqs_gsi if no legacy irqs · bb84ac2d

由 Jacob Pan 提交于 11月 10, 2011

nr_legacy_irqs is set in probe_nr_irqs_gsi, we should not clear
it after that. Otherwise, the result is that MSI irqs will be
allocated from the wrong range for the systems without legacy
PIC.
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: NDirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

bb84ac2d

x86/platform: Add a wallclock_init func to x86_platforms ops · cf8ff6b6

由 Feng Tang 提交于 11月 10, 2011

Some wall clock devices use MMIO based HW register, this new
function will give them a chance to do some initialization work
before their get/set_time service get called.
Signed-off-by: NFeng Tang <feng.tang@intel.com>
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NDirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cf8ff6b6

08 11月, 2011 1 次提交

x86/mce: Make mce_chrdev_ops 'static const' · 66f5ddf3

由 Luck, Tony 提交于 11月 03, 2011

Arjan would like to make struct file_operations const, but
mce-inject directly writes to the mce_chrdev_ops to install its
write handler. In an ideal world mce-inject would have its own
character device, but we have a sizable legacy of test scripts
that hardwire "/dev/mcelog", so it would be painful to switch to
a separate device now. Instead, this patch switches to a stub
function in the mce code, with a registration helper that
mce-inject can call when it is loaded.

Note that this would also allow for a sane process to allow
mce-inject to be unloaded again (with an unregister function,
and appropriate module_{get,put}() calls), but that is left for
potential future patches.
Reported-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4eb2e1971326651a3b@agluck-desktop.sc.intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

66f5ddf3

01 11月, 2011 5 次提交

i7core_edac: Drop the edac_mce facility · 4140c542

由 Borislav Petkov 提交于 7月 18, 2011

Remove edac_mce pieces and use the normal MCE decoder notifier chain by
retaining the same functionality with considerably less code.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>

4140c542

Cross Memory Attach · fcf63409

由 Christopher Yeoh 提交于 10月 31, 2011

The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fcf63409

x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE · 69c60c88

由 Paul Gortmaker 提交于 5月 26, 2011

These files were implicitly getting EXPORT_SYMBOL via device.h
which was including module.h, but that will be fixed up shortly.

By fixing these now, we can avoid seeing things like:

arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’

[ with input from Randy Dunlap <rdunlap@xenotime.net> and also
  from Stephen Rothwell <sfr@canb.auug.org.au> ]
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

69c60c88

x86: fix implicit include of <linux/topology.h> in vsyscall_64 · 29574022

由 Paul Gortmaker 提交于 5月 26, 2011

In removing the presence of <linux/module.h> from some of the
more common <linux/something.h> files, this implict include
of <linux/topology.h> was uncovered.

  CC      arch/x86/kernel/vsyscall_64.o
  arch/x86/kernel/vsyscall_64.c: In function ‘vsyscall_set_cpu’:
  arch/x86/kernel/vsyscall_64.c:259: error: implicit declaration of function ‘cpu_to_node’

Explicitly call it out so the cleanup can take place.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

29574022

x86, MCE: Use notifier chain only for MCE decoding · f0cb5452

由 Borislav Petkov 提交于 7月 18, 2011

Drop the edac_mce custom hook in favor of the generic notifier
mechanism. Also, do not log the error to mcelog if the notified agent
was able to decode it.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>

f0cb5452

26 10月, 2011 2 次提交

x86/jump_label: add arch_jump_label_transform_static() · e71a5be1

由 Jeremy Fitzhardinge 提交于 9月 29, 2011

This allows jump-label entries to be cheaply updated on code which is
not yet live.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: NJason Baron <jbaron@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>

e71a5be1

x86/jump_label: drop arch_jump_label_text_poke_early() · b7e31558

由 Jeremy Fitzhardinge 提交于 9月 29, 2011

It is no longer used.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: NJason Baron <jbaron@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>

b7e31558

25 10月, 2011 1 次提交

x86: Fix compilation bug in kprobes' twobyte_is_boostable · 315eb8a2

由 Josh Stone 提交于 10月 24, 2011

When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I
noticed a warning about the asm operand for test_bit in kprobes'
can_boost.  I discovered that this caused only the first long of
twobyte_is_boostable[] to be output.

Jakub filed and fixed gcc PR50571 to correct the warning and this output
issue.  But to solve it for less current gcc, we can make kprobes'
twobyte_is_boostable[] non-const, and it won't be optimized out.

Before:

    CC      arch/x86/kernel/kprobes.o
  In file included from include/linux/bitops.h:22:0,
                   from include/linux/kernel.h:17,
                   from [...]/arch/x86/include/asm/percpu.h:44,
                   from [...]/arch/x86/include/asm/current.h:5,
                   from [...]/arch/x86/include/asm/processor.h:15,
                   from [...]/arch/x86/include/asm/atomic.h:6,
                   from include/linux/atomic.h:4,
                   from include/linux/mutex.h:18,
                   from include/linux/notifier.h:13,
                   from include/linux/kprobes.h:34,
                   from arch/x86/kernel/kprobes.c:43:
  [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
  [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input
        without lvalue in asm operand 1 is deprecated [enabled by default]

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 00 00 00 00 	bt     %eax,0x0
                          554: R_386_32	.rodata.cst4

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
  Contents of section .rodata.cst4:
   0000 4c030000                             L...

Only a single long of twobyte_is_boostable[] is in the object file.

After, without the const on twobyte_is_boostable:

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 20 00 00 00 	bt     %eax,0x20
                          554: R_386_32	.data

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
   0010 00000000 00000000 00000000 00000000  ................
   0020 4c030000 0f000200 ffff0000 ffcff0c0  L...............
   0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77  ....;.......&..w

Now all 32 bytes are output into .data instead.
Signed-off-by: NJosh Stone <jistone@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

315eb8a2

19 10月, 2011 2 次提交

x86, microcode, AMD: Add microcode revision to /proc/cpuinfo · bcb80e53

由 Borislav Petkov 提交于 10月 17, 2011

Enable microcode revision output for AMD after 506ed6b5 ("x86,
intel: Output microcode revision in /proc/cpuinfo") did it for Intel.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>

bcb80e53

x86, microcode: Correct microcode revision format · 881e23e5

由 Borislav Petkov 提交于 10月 17, 2011

506ed6b5 ("x86, intel: Output microcode revision in /proc/cpuinfo")
added microcode revision format to /proc/cpuinfo and the MCE handler in
decimal format but both AMD and Intel patch levels are handled as hex
numbers. Fix it.
Acked-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>

881e23e5

18 10月, 2011 1 次提交

x86, perf, kprobes: Make kprobes's twobyte_is_boostable volatile · db45bd90

由 Josh Stone 提交于 10月 17, 2011

When compiling an i386_defconfig kernel with
gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand
for test_bit in kprobes' can_boost. I discovered that this
caused only the first long of twobyte_is_boostable[] to be
output.

Jakub filed and fixed gcc PR50571 to correct the warning and
this output issue.  But to solve it for less current gcc, we can
make kprobes' twobyte_is_boostable[] volatile, and it won't be
optimized out.

Before:

    CC      arch/x86/kernel/kprobes.o
  In file included from include/linux/bitops.h:22:0,
                   from include/linux/kernel.h:17,
                   from [...]/arch/x86/include/asm/percpu.h:44,
                   from [...]/arch/x86/include/asm/current.h:5,
                   from [...]/arch/x86/include/asm/processor.h:15,
                   from [...]/arch/x86/include/asm/atomic.h:6,
                   from include/linux/atomic.h:4,
                   from include/linux/mutex.h:18,
                   from include/linux/notifier.h:13,
                   from include/linux/kprobes.h:34,
                   from arch/x86/kernel/kprobes.c:43:
  [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
  [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default]

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 00 00 00 00 	bt     %eax,0x0
                          554: R_386_32	.rodata.cst4

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
  Contents of section .rodata.cst4:
   0000 4c030000                             L...

Only a single long of twobyte_is_boostable[] is in the object
file.

After, with volatile:

  $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
       551:	0f a3 05 20 00 00 00 	bt     %eax,0x20
                          554: R_386_32	.data

  $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o

  arch/x86/kernel/kprobes.o:     file format elf32-i386

  Contents of section .data:
   0000 48000000 00000000 00000000 00000000  H...............
   0010 00000000 00000000 00000000 00000000  ................
   0020 4c030000 0f000200 ffff0000 ffcff0c0  L...............
   0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77  ....;.......&..w

Now all 32 bytes are output into .data instead.
Signed-off-by: NJosh Stone <jistone@redhat.com>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Link: http://lkml.kernel.org/r/1318899645-4068-1-git-send-email-jistone@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

db45bd90

14 10月, 2011 2 次提交

x86, intel: Use c->microcode for Atom errata check · 30963c0a

由 Andi Kleen 提交于 10月 12, 2011

Now that the cpu update level is available the Atom PSE errata
check can use it directly without reading the MSR again.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1318466795-7393-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

30963c0a

x86, intel: Output microcode revision in /proc/cpuinfo · 506ed6b5

由 Andi Kleen 提交于 10月 12, 2011

I got a request to make it easier to determine the microcode
update level on Intel CPUs. This patch adds a new "microcode"
field to /proc/cpuinfo.

The microcode level is also outputed on fatal machine checks
together with the other CPUID model information.

I removed the respective code from the microcode update driver,
it just reads the field from cpu_data. Also when the microcode
is updated it fills in the new values too.

I had to add a memory barrier to native_cpuid to prevent it
being optimized away when the result is not used.

This turns out to clean up further code which already got this
information manually. This is done in followon patches.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1318466795-7393-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

506ed6b5

13 10月, 2011 2 次提交

x86, microcode: Don't request microcode from userspace unnecessarily · 70989449

由 Srivatsa S. Bhat 提交于 10月 09, 2011

Requesting the microcode from userspace *every time* when onlining CPUs
(during a CPU hotplug operation) is unnecessary. Thus, ensure that
once the kernel gets the microcode after booting, it is not freed nor
invalidated when a CPU goes offline, so that it can be reused when that
CPU comes back online, without requesting userspace for it again. As a
result, the CPU hotplug operations become faster as well.
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/4E91F908.5010006@linux.vnet.ibm.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>

70989449

x86/irq: Standardize on CONFIG_SPARSE_IRQ=y · 141d55e6

由 Yinghai Lu 提交于 10月 12, 2011

Sparseirq got introduced in v2.6.28 and Thomas did a huge cleanup
around v2.6.38 that eliminated basically all disadvantages
of it.

So we can remove non-sparseirq support now and simplify
our IRQ degrees of freedom a bit.
Suggested-and-acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4E95E21D.6090200@oracle.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

141d55e6

12 10月, 2011 5 次提交

x86, ioapic: Clean up ioapic/apic_id usage · 6f50d45f

由 Yinghai Lu 提交于 10月 12, 2011

While looking at the code, apic_id sometime is referred to index
of ioapic, but sometime is used for phys apic id. and some even
use apic for real apic id. It is very confusing.

So try to limit apic_id or ioapic_id to be real apic id for
ioapic, and use ioapic_idx for ioapic index in the array.

-v2: Suggested by Ingo, use ioapic_idx consistently, instead of ioapic
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542DC.3090509@oracle.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

6f50d45f

x86, ioapic: Factor out print_IO_APIC() to only print one io apic · cda417dd

由 Yinghai Lu 提交于 10月 12, 2011

It is getting too big after the interrupt remaping entries debug
print out was added.

Original print_IO_APIC() becomes print_IO_APICs().
New print_IO_APIC() will only print one ioapic's registers

As a side-effect this clean-up also made checkpatch.pl happier.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542D3.5000008@oracle.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

cda417dd

x86, ioapic: Print out irte with right ioapic index · 3a61d7fe

由 Yinghai Lu 提交于 10月 12, 2011

While checking irte dump in dmesg, the print out is confusing
ioapic index with real io apic id:

IOAPIC[0]: Set routing entry (1-1 -> 0x31 -> IRQ 1 Mode:0
Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1
Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:31
Dest:00000001 SID:00FF SQ:0 SVT:1) IOAPIC[0]: Set routing entry
(1-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:30 Dest:00000001 SID:00FF SQ:0 SVT:1)

The system's first ioapic id is 1.

This commit:

| commit 3040db92
| Author: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
| Date:   Tue Jul 12 21:17:41 2011 +0000
|
|    x86, ioapic: Print IRTE when IR is enabled

Confused apic_id with the ioapic ID - fix it.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542C8.8040209@oracle.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

3a61d7fe

x86, ioapic: Split up setup_ioapic_entry() · c5b4712c

由 Yinghai Lu 提交于 10月 12, 2011

Ingo pointed out that setup_ioapic_entry() is way too big now.

Split the intr-remap code out into setup_ir_ioapic_entry().

Also pass struct io_apic_irq_attr * instead of 5 parameters
in those two functions.

At last in setup_ir_ioapic_entry() we don't need to panic.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542BB.4070807@oracle.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

c5b4712c

x86, ioapic: Pass struct irq_attr * to setup_ioapic_irq() · e4aff811

由 Yinghai Lu 提交于 10月 12, 2011

Do not expand that struct, and just pass pointer to reduce the
number of parameters in related functions.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542B1.7050800@oracle.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

e4aff811

11 10月, 2011 1 次提交

x86: Default to vsyscall=native for now · 2b666859

由 Adrian Bunk 提交于 10月 06, 2011

This UML breakage:

linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790

Is caused by commit 3ae36655 ("x86-64: Rework vsyscall emulation and add
vsyscall= parameter") - the vsyscall emulation code is not fully cooked
yet as UML relies on some rather fragile SIGSEGV semantics.

Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default
to vsyscall=native for now, this patch implements that.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Acked-by: NAndrew Lutomirski <luto@mit.edu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fiSigned-off-by: NIngo Molnar <mingo@elte.hu>

2b666859

10 10月, 2011 6 次提交

x86, nmi, drivers: Fix nmi splitup build bug · d48b0e17

由 Ingo Molnar 提交于 10月 06, 2011

nmi.c needs an #include <linux/mca.h>:

arch/x86/kernel/nmi.c: In function ‘unknown_nmi_error’:
arch/x86/kernel/nmi.c:286:6: error: ‘MCA_bus’ undeclared (first use in this function)
arch/x86/kernel/nmi.c:286:6: note: each undeclared identifier is reported only once for each function it appears in

Another one is the hpwdt driver:

drivers/watchdog/hpwdt.c:507:9: error: ‘NMI_DONE’ undeclared (first use in this function)
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d48b0e17

perf, x86: Implement IBS initialization · b7169166

由 Robert Richter 提交于 9月 21, 2011

This patch implements IBS feature detection and initialzation. The
code is shared between perf and oprofile. If IBS is available on the
system for perf, a pmu is setup.
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1316597423-25723-3-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

b7169166

perf, x86: Share IBS macros between perf and oprofile · ee5789db

由 Robert Richter 提交于 9月 21, 2011

Moving IBS macros from oprofile to <asm/perf_event.h> to make it
available to perf. No additional changes.
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1316597423-25723-2-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

ee5789db

x86, nmi: Track NMI usage stats · efc3aac5

由 Don Zickus 提交于 9月 30, 2011

Now that the NMI handler are broken into lists, increment the appropriate
stats for each list. This allows us to see what is going on when they
get printed out in the next patch.
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-6-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

efc3aac5

x86, nmi: Add in logic to handle multiple events and unknown NMIs · b227e233

由 Don Zickus 提交于 9月 30, 2011

Previous patches allow the NMI subsystem to process multipe NMI events
in one NMI. As previously discussed this can cause issues when an event
triggered another NMI but is processed in the current NMI. This causes the
next NMI to go unprocessed and become an 'unknown' NMI.

To handle this, we first have to flag whether or not the NMI handler handled
more than one event or not. If it did, then there exists a chance that
the next NMI might be already processed. Once the NMI is flagged as a
candidate to be swallowed, we next look for a back-to-back NMI condition.

This is determined by looking at the %rip from pt_regs. If it is the same
as the previous NMI, it is assumed the cpu did not have a chance to jump
back into a non-NMI context and execute code and instead handled another NMI.

If both of those conditions are true then we will swallow any unknown NMI.

There still exists a chance that we accidentally swallow a real unknown NMI,
but for now things seem better.

An optimization has also been added to the nmi notifier rountine. Because x86
can latch up to one NMI while currently processing an NMI, we don't have to
worry about executing _all_ the handlers in a standalone NMI. The idea is
if multiple NMIs come in, the second NMI will represent them. For those
back-to-back NMI cases, we have the potentail to drop NMIs. Therefore only
execute all the handlers in the second half of a detected back-to-back NMI.
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

b227e233

x86, nmi: Wire up NMI handlers to new routines · 9c48f1c6

由 Don Zickus 提交于 9月 30, 2011

Just convert all the files that have an nmi handler to the new routines.
Most of it is straight forward conversion.  A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.

[Thanks to Ying for finding out the history behind that mce call

https://lkml.org/lkml/2010/5/27/114

And Boris responding that he would like to remove that call because of it

https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registeration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NCorey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

9c48f1c6

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年