提交 · b77e70bf3535e0bd5472e0681f41cce4ae0598bb · openanolis / cloud-kernel

16 6月, 2011 1 次提交

x86, mce: Replace MCE_SELF_VECTOR by irq_work · b77e70bf

由 Hidetoshi Seto 提交于 6月 08, 2011

The MCE handler uses a special vector for self IPI to invoke
post-emergency processing in an interrupt context, e.g. call an
NMI-unsafe function, wakeup loggers, schedule time-consuming work for
recovery, etc.

This mechanism is now generalized by the following commit:

 > e360adbe
 > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
 > Date:   Thu Oct 14 14:01:34 2010 +0800
 >
 >  irq_work: Add generic hardirq context callbacks
 >
 >  Provide a mechanism that allows running code in IRQ context. It is
 >  most useful for NMI code that needs to interact with the rest of the
 >  system -- like wakeup a task to drain buffers.
 :

So change to use provided generic mechanism.
Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: NTony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED6B2.6080005@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>

b77e70bf

18 3月, 2011 1 次提交

x86: Fix common misspellings · 0d2eb44f

由 Lucas De Marchi 提交于 3月 17, 2011

They were generated by 'codespell' and then manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0d2eb44f

12 3月, 2011 1 次提交

x86, binutils, xen: Fix another wrong size directive · 371c394a

由 Alexander van Heukelum 提交于 3月 11, 2011

The latest binutils (2.21.0.20110302/Ubuntu) breaks the build
yet another time, under CONFIG_XEN=y due to a .size directive that
refers to a slightly differently named (hence, to the now very
strict and unforgiving assembler, non-existent) symbol.

[ mingo:

   This unnecessary build breakage caused by new binutils
   version 2.21 gets escallated back several kernel releases spanning
   several years of Linux history, affecting over 130,000 upstream
   kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially
   affecting all major Linux distro kernel configs).

   Git annotate tells us that this slight debug symbol code mismatch
   bug has been introduced in 2008 in commit 3d75e1b8:

     3d75e1b8        (Jeremy Fitzhardinge    2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)

   The 'bug' is just a slight assymetry in ENTRY()/END()
   debug-symbols sequences, with lots of assembly code between the
   ENTRY() and the END():

     ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
       ...
     END(do_hypervisor_callback)

   Human reviewers almost never catch such small mismatches, and binutils
   never even warned about it either.

   This new binutils version thus breaks the Xen build on all upstream kernels
   since v2.6.27, out of the blue.

   This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels
   impossible on such binutils, for a bisection window of over hundred
   thousand historic commits. (!)

   This is a major fail on the side of binutils and binutils needs to turn
   this show-stopper build failure into a warning ASAP. ]
Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

371c394a

09 3月, 2011 1 次提交

x86: Separate out entry text section · ea714547

由 Jiri Olsa 提交于 3月 07, 2011

Put x86 entry code into a separate link section: .entry.text.

Separating the entry text section seems to have performance
benefits - caused by more efficient instruction cache usage.

Running hackbench with perf stat --repeat showed that the change
compresses the icache footprint. The icache load miss rate went
down by about 15%:

 before patch:
         19417627  L1-icache-load-misses      ( +-   0.147% )

 after patch:
         16490788  L1-icache-load-misses      ( +-   0.180% )

The motivation of the patch was to fix a particular kprobes
bug that relates to the entry text section, the performance
advantage was discovered accidentally.

Whole perf output follows:

 - results for current tip tree:

  Performance counter stats for './hackbench/hackbench 10' (500 runs):

         19417627  L1-icache-load-misses      ( +-   0.147% )
       2676914223  instructions             #      0.497 IPC     ( +- 0.079% )
       5389516026  cycles                     ( +-   0.144% )

      0.206267711  seconds time elapsed   ( +-   0.138% )

 - results for current tip tree with the patch applied:

  Performance counter stats for './hackbench/hackbench 10' (500 runs):

         16490788  L1-icache-load-misses      ( +-   0.180% )
       2717734941  instructions             #      0.502 IPC     ( +- 0.079% )
       5414756975  cycles                     ( +-   0.148% )

      0.206747566  seconds time elapsed   ( +-   0.137% )
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ea714547

14 2月, 2011 1 次提交

x86: Allocate 32 tlb_invalidate_interrupt handler stubs · 3a09fb45

由 Shaohua Li 提交于 1月 17, 2011

Add up to 32 invalidate_interrupt handlers. How many handlers are
added depends on NUM_INVALIDATE_TLB_VECTORS. So if
NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code
size.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <1295232725.1949.708.camel@sli10-conroe>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3a09fb45

12 1月, 2011 1 次提交

KVM: Handle async PF in a guest. · 631bc487

由 Gleb Natapov 提交于 10月 14, 2010

When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

631bc487

08 1月, 2011 1 次提交

x86: Save rbp in pt_regs on irq entry · 625dbc3b

由 Frederic Weisbecker 提交于 1月 06, 2011

From the x86_64 low level interrupt handlers, the frame pointer is
saved right after the partial pt_regs frame.

rbp is not supposed to be part of the irq partial saved registers,
but it only requires to extend the pt_regs frame by 8 bytes to
do so, plus a tiny stack offset fixup on irq exit.

This changes a bit the semantics or get_irq_entry() that is supposed
to provide only the value of caller saved registers and the cpu
saved frame. However it's a win for unwinders that can walk through
stack frames on top of get_irq_regs() snapshots.

A noticeable impact is that it makes perf events cpu-clock and
task-clock events based callchains working on x86_64.

Let's then save rbp into the irq pt_regs.

As a result with:

	perf record -e cpu-clock perf bench sched messaging
	perf report --stdio

Before:
    20.94%             perf  [kernel.kallsyms]        [k] lock_acquire
                       |
                       --- lock_acquire
                          |
                          |--44.01%-- __write_nocancel
                          |
                          |--43.18%-- __read
                          |
                          |--6.08%-- fork
                          |          create_worker
                          |
                          |--0.88%-- _dl_fixup
                          |
                          |--0.65%-- do_lookup_x
                          |
                          |--0.53%-- __GI___libc_read
                           --4.67%-- [...]

After:
    19.23%         perf  [kernel.kallsyms]    [k] __lock_acquire
                   |
                   --- __lock_acquire
                      |
                      |--97.74%-- lock_acquire
                      |          |
                      |          |--21.82%-- _raw_spin_lock
                      |          |          |
                      |          |          |--37.26%-- unix_stream_recvmsg
                      |          |          |          sock_aio_read
                      |          |          |          do_sync_read
                      |          |          |          vfs_read
                      |          |          |          sys_read
                      |          |          |          system_call
                      |          |          |          __read
                      |          |          |
                      |          |          |--24.09%-- unix_stream_sendmsg
                      |          |          |          sock_aio_write
                      |          |          |          do_sync_write
                      |          |          |          vfs_write
                      |          |          |          sys_write
                      |          |          |          system_call
                      |          |          |          __write_nocancel

v2: Fix cfi annotations.
Reported-by: NSoeren Sandmann Pedersen <sandmann@redhat.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jan Beulich <JBeulich@novell.com>

625dbc3b

18 11月, 2010 1 次提交

x86/kprobes: Prevent kprobes to probe on save_args() · de31ec8a

由 Masami Hiramatsu 提交于 11月 18, 2010

Prevent kprobes to probe on save_args() since this function
will be called from breakpoint exception handler. That will
cause infinit loop on breakpoint handling.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20101118101655.2779.2816.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

de31ec8a

20 10月, 2010 1 次提交

x86, asm: Fix CFI macro invocations to deal with shortcomings in gas · 3234282f

由 Jan Beulich 提交于 10月 19, 2010

gas prior to (perhaps) 2.16.90 has problems with passing non-
parenthesized expressions containing spaces to macros. Spaces, however,
get inserted by cpp between any macro expanding to a number and a
subsequent + or -. For the +, current x86 gas then removes the space
again (future gas may not do so), but for the - the space gets retained
and is then considered a separator between macro arguments.

Fix the respective definitions for both the - and + cases, so that they
neither contain spaces nor make cpp insert any (the latter by adding
seemingly redundant parentheses).
Signed-off-by: NJan Beulich <jbeulich@novell.com>
LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

3234282f

19 10月, 2010 1 次提交

irq_work: Add generic hardirq context callbacks · e360adbe

由 Peter Zijlstra 提交于 10月 14, 2010

Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NKyle McMartin <kyle@mcmartin.ca>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: NHuang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e360adbe

03 9月, 2010 4 次提交

x86: Use {push,pop}{l,q}_cfi in more places · df5d1874

由 Jan Beulich 提交于 9月 02, 2010

... plus additionally introduce {push,pop}f{l,q}_cfi. All in the
hope that the code becomes better readable this way (it gets
quite a bit smaller in any case).
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Acked-by: NAlexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBDA40200007800013FAF@vpn.id2.novell.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

df5d1874

x86-64: Use symbolics instead of raw numbers in entry_64.S · b1cccb1b

由 Jan Beulich 提交于 9月 02, 2010

... making the code a little less fragile.

Also use pushq_cfi instead of raw CFI annotations in two more
places, and add two missing annotations after stack pointer
adjustments which got modified here anyway.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Acked-by: NAlexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBACF0200007800013F6A@vpn.id2.novell.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b1cccb1b

x86-64: Adjust frame type at paranoid_exit: · 1f130a78

由 Jan Beulich 提交于 9月 02, 2010

As this isn't an exception or interrupt entry point, it doesn't
have any of the hardware provide frame layouts active.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Acked-by: NAlexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1f130a78

x86-64: Fix unwind annotations in syscall stubs · e6b04b6b

由 Jan Beulich 提交于 9月 02, 2010

With the return address removed from the stack, these should
really refer to their caller's register state.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Acked-by: NAlexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBA3D0200007800013F61@vpn.id2.novell.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e6b04b6b

14 8月, 2010 1 次提交

Mark arguments to certain syscalls as being const · c7887325

由 David Howells 提交于 8月 11, 2010

Mark arguments to certain system calls as being const where they should be but
aren't.  The list includes:

 (*) The filename arguments of various stat syscalls, execve(), various utimes
     syscalls and some mount syscalls.

 (*) The filename arguments of some syscall helpers relating to the above.

 (*) The buffer argument of various write syscalls.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c7887325

02 8月, 2010 1 次提交

x86-64, asm: Directly access per-cpu IST · c15a5958

由 Brian Gerst 提交于 7月 31, 2010

Use a direct per-cpu reference for the IST instead of using a scratch
register.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
LKML-Reference: <1280594903-6341-1-git-send-email-brgerst@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

c15a5958

23 7月, 2010 1 次提交

x86/xen: event channels delivery on HVM. · 38e20b07

由 Sheng Yang 提交于 5月 14, 2010

Set the callback to receive evtchns from Xen, using the
callback vector delivery mechanism.

The traditional way for receiving event channel notifications from Xen
is via the interrupts from the platform PCI device.
The callback vector is a newer alternative that allow us to receive
notifications on any vcpu and doesn't need any PCI support: we allocate
a vector exclusively to receive events, in the vector handler we don't
need to interact with the vlapic, therefore we avoid a VMEXIT.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NSheng Yang <sheng@linux.intel.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

38e20b07

22 7月, 2010 1 次提交

x86: auditsyscall: fix fastpath return value after reschedule · 03275591

由 Roland McGrath 提交于 7月 21, 2010

In the CONFIG_AUDITSYSCALL fast-path for x86 64-bit system calls,
we can pass a bad return value and/or error indication for the
system call to audit_syscall_exit().  This happens when
TIF_NEED_RESCHED was set as the system call returned, so we went
out to schedule() and came back to the exit-audit fast-path.  The
fix is to reload the user return value register from the pt_regs
before using it for audit_syscall_exit().

Both the 32-bit kernel's fast path and the 64-bit kernel's 32-bit
system call fast paths work slightly differently, so that they
always leave the fast path entirely to reschedule and don't return
there, so they don't have the analogous bugs.
Reported-by: NAlexander Viro <aviro@redhat.com>
Signed-off-by: NRoland McGrath <roland@redhat.com>

03275591

11 12月, 2009 1 次提交

x86, 64-bit: Move kernel_thread to C · 3bd95dfb

由 Brian Gerst 提交于 12月 09, 2009

Prepare for merging with 32-bit.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

3bd95dfb

06 12月, 2009 1 次提交

x86: Fixup wrong debug exception frame link in stacktraces · b625b3b3

由 Frederic Weisbecker 提交于 12月 06, 2009

While dumping a stacktrace, the end of the exception stack won't link
the frame pointer to the previous stack.

The interrupted stack will then be considered as unreliable and ignored
by perf, as the frame pointer is unreliable itself.

This happens because we overwrite the frame pointer that links to the
interrupted frame with the address of the exception stack. This is
done in order to reserve space inside.
But rbp has been chosen here only because it is not a scratch register,
so that the address of the exception stack remains in rbp after calling
do_debug(), we can then release the exception stack space without the
need to retrieve its address again.

But we can pick another non-scratch register to do that, so that we
preserve the link to the interrupted stack frame in the stacktraces.

Just randomly choose r12. Every registers are saved just before and
restored just after calling do_debug(). And r12 is not used in the
middle, which makes it a perfect candidate.

Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw

Before:
    44.18%  [k] _raw_read_lock
            |
            |
            ---  |--6.31%-- waitid
                 |
                 |--4.26%-- writev
                 |
                 |--3.63%-- __select
                 |
                 |--3.15%-- __waitpid
                 |          |
                 |          |--28.57%-- 0x8b52e00000139f
                 |          |
                 |          |--28.57%-- 0x8b52e0000013c6
                 |          |
                 |          |--14.29%-- 0x7fde786dc000
                 |          |
                 |          |--14.29%-- 0x62696c2f7273752f
                 |          |
                 |           --14.29%-- 0x1ea9df800000000
                 |
                 |--3.00%-- __poll

After:

    43.94%  [k] _raw_read_lock
            |
            --- _read_lock
               |
               |--60.53%-- send_sigio
               |          __kill_fasync
               |          kill_fasync
               |          evdev_pass_event
               |          evdev_event
               |          input_pass_event
               |          input_handle_event
               |          input_event
               |          synaptics_process_byte
               |          psmouse_handle_byte
               |          psmouse_interrupt
               |          serio_interrupt
               |          i8042_interrupt
               |          handle_IRQ_event
               |          handle_edge_irq
               |          handle_irq
               |          __irqentry_text_start
               |          ret_from_intr
               |          |
               |          |--30.43%-- __select
               |          |
               |          |--17.39%-- 0x454f15
               |          |
               |          |--13.04%-- __read
               |          |
               |          |--13.04%-- vread_hpet
               |          |
               |          |--13.04%-- _xcb_lock_io
               |          |
               |           --13.04%-- 0x7f630878ce87

Note: it does not only affect perf events but also other stacktraces in
x86-64. They were considered as unreliable once we quit the debug
stack frame.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>

b625b3b3

04 11月, 2009 1 次提交

x86, 64-bit: Fix bstep_iret jump · 97829de5

由 Brian Gerst 提交于 11月 03, 2009

This jump should be unconditional.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
LKML-Reference: <1257274925-15713-1-git-send-email-brgerst@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

97829de5

15 10月, 2009 1 次提交

x86: UV RTC: Rename generic_interrupt to x86_platform_ipi · 4a4de9c7

由 Dimitri Sivanich 提交于 10月 14, 2009

Signed-off-by: NDimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20091014142257.GE11048@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4a4de9c7

14 10月, 2009 1 次提交

function-graph/x86: Replace unbalanced ret with jmp · 194ec341

由 Steven Rostedt 提交于 10月 13, 2009

The function graph tracer replaces the return address with a hook
to trace the exit of the function call. This hook will finish by
returning to the real location the function should return to.

But the current implementation uses a ret to jump to the real
return location. This causes a imbalance between calls and ret.
That is the original function does a call, the ret goes to the
handler and then the handler does a ret without a matching call.

Although the function graph tracer itself still breaks the branch
predictor by replacing the original ret, by using a second ret and
causing an imbalance, it breaks the predictor even more.

This patch replaces the ret with a jmp to keep the calls and ret
balanced. I tested this on one box and it showed a 1.7% increase in
performance. Another box only showed a small 0.3% increase. But no
box that I tested this on showed a decrease in performance by
making this change.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091013203425.042034383@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

194ec341

13 10月, 2009 1 次提交

x86, 64-bit: Move K8 B step iret fixup to fault entry asm · ae24ffe5

由 Brian Gerst 提交于 10月 12, 2009

Move the handling of truncated %rip from an iret fault to the fault
entry path.

This allows x86-64 to use the standard search_extable() function.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ae24ffe5

23 9月, 2009 1 次提交

x86: ptrace: sysret path should reach syscall_trace_leave · b60e714d

由 Roland McGrath 提交于 9月 22, 2009

If TIF_SYSCALL_TRACE or TIF_SINGLESTEP is set while inside a syscall,
the path back to user mode should get to syscall_trace_leave.

This does happen in most circumstances. The exception to this is on
the 64-bit syscall fastpath, when no such flag was set on syscall
entry and nothing else has punted it off the fastpath for exit. That
one exit fastpath fails to check for _TIF_WORK_SYSCALL_EXIT flags.
This makes the behavior inconsistent with what 32-bit tasks see and
what the native 32-bit kernel always does, and what 64-bit tasks see
in all cases where the iret path is taken anyhow.

Perhaps the only example that is affected is a ptrace stop inside
do_fork (for PTRACE_O_TRACE{CLONE,FORK,VFORK,VFORKDONE}). Other
syscalls with internal ptrace stop points (execve) already take the
iret exit path for unrelated reasons.

Test cases for both PTRACE_SYSCALL and PTRACE_SINGLESTEP variants are at:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/syscall-from-clone.c?cvsroot=systemtap
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/step-from-clone.c?cvsroot=systemtap

There was no special benefit to the sysret path's special path to call
do_notify_resume, because it always takes the iret exit path at the end.
So this change just makes the sysret exit path join the iret exit path
for all the signals and ptrace cases. The fastpath still applies to
the plain syscall-audit and resched cases.
Signed-off-by: NRoland McGrath <roland@redhat.com>
CC: Oleg Nesterov <oleg@redhat.com>

b60e714d

21 9月, 2009 1 次提交

perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482

由 Ingo Molnar 提交于 9月 21, 2009

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )
Suggested-by: NStephane Eranian <eranian@google.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPaul Mackerras <paulus@samba.org>
Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cdd6c482

13 9月, 2009 1 次提交

tracing/function-graph: x86_64 stack allocation cleanup · 4818d809

由 Jiri Olsa 提交于 7月 29, 2009

Only 24 bytes needs to be reserved on the stack for the function graph
tracer on x86_64.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
LKML-Reference: <20090729085837.GB4998@jolsa.lab.eng.brq.redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4818d809

30 8月, 2009 1 次提交

kprobes/x86-64: Fix to move common_interrupt to .kprobes.text · 8222d718

由 Masami Hiramatsu 提交于 8月 27, 2009

Since nmi, debug and int3 returns to irq_return inside common_interrupt,
probing this function will cause int3-loop, so it should be marked
as __kprobes.
Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20090827172325.8246.40000.stgit@localhost.localdomain>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

8222d718

19 6月, 2009 1 次提交

function-graph: add stack frame test · 71e308a2

由 Steven Rostedt 提交于 6月 18, 2009

In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.

An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.

This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.

There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.

This test detected the problem and pointed out where the issue was.

This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.

Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

71e308a2

04 6月, 2009 2 次提交

x86: fix panic with interrupts off (needed for MCE) · 4ef702c1

由 Andi Kleen 提交于 5月 27, 2009

For some time each panic() called with interrupts disabled
triggered the !irqs_disabled() WARN_ON in smp_call_function(),
producing ugly backtraces and confusing users.

This is a common situation with machine checks for example which
tend to call panic with interrupts disabled, but will also hit
in other situations e.g. panic during early boot.  In fact it
means that panic cannot be called in many circumstances, which
would be bad.

This all started with the new fancy queued smp_call_function,
which is then used by the shutdown path to shut down the other
CPUs.

On closer examination it turned out that the fancy RCU
smp_call_function() does lots of things not suitable in a panic
situation anyways, like allocating memory and relying on complex
system state.

I originally tried to patch this over by checking for panic
there, but it was quite complicated and the original patch
was also not very popular.  This also didn't fix some of the
underlying complexity problems.

The new code in post 2.6.29 tries to patch around this by
checking for oops_in_progress, but that is not enough to make
this fully safe and I don't think that's a real solution
because panic has to be reliable.

So instead use an own vector to reboot.  This makes the reboot
code extremly straight forward, which is definitely a big plus
in a panic situation where it is important to avoid relying on
too much kernel state.  The new simple code is also safe to be
called from interupts off region because it is very very simple.

There can be situations where it is important that panic
is reliable.  For example on a fatal machine check the panic
is needed to get the system up again and running as quickly
as possible.  So it's important that panic is reliable and
all function it calls simple.

This is why I came up with this simple vector scheme.
It's very hard to beat in simplicity.  Vectors are not
particularly precious anymore since all big systems are
using per CPU vectors.

Another possibility would have been to use an NMI similar
to kdump, but there is still the problem that NMIs don't
work reliably on some systems due to BIOS issues.  NMIs
would have been able to stop CPUs running with interrupts
off too.  In the sake of universal reliability I opted for
using a non NMI vector for now.

I put the reboot vector into the highest priority bucket of
the APIC vectors and moved the 64bit UV_BAU message down
instead into the next lower priority.

[ Impact: bug fix, fixes an old regression ]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

4ef702c1

x86, mce: implement bootstrapping for machine check wakeups · ccc3c319

由 Andi Kleen 提交于 5月 27, 2009

Machine checks support waking up the mcelog daemon quickly.

The original wake up code for this was pretty ugly, relying on
a idle notifier and a special process flag. The reason it did
it this way is that the machine check handler is not subject
to normal interrupt locking rules so it's not safe
to call wake_up().  Instead it set a process flag
and then either did the wakeup in the syscall return
or in the idle notifier.

This patch adds a new "bootstraping" method as replacement.

The idea is that the handler checks if it's in a state where
it is unsafe to call wake_up(). If it's safe it calls it directly.
When it's not safe -- that is it interrupted in a critical
section with interrupts disables -- it uses a new "self IPI" to trigger
an IPI to its own CPU. This can be done safely because IPI
triggers are atomic with some care. The IPI is raised
once the interrupts are reenabled and can then safely call
wake_up().

When APICs are disabled the event is just queued and will be picked up
eventually by the next polling timer. I think that's a reasonable
compromise, since it should only happen quite rarely.

Contains fixes from Ying Huang.

[ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

ccc3c319

03 6月, 2009 1 次提交

perf_counter/x86: Remove the IRQ (non-NMI) handling bits · a3288106

由 Yong Wang 提交于 6月 03, 2009

Remove the IRQ (non-NMI) handling bits as NMI will be used always.
Signed-off-by: NYong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a3288106

29 5月, 2009 2 次提交

x86, mce: enable MCE_INTEL for 32bit new MCE · 7856f6cc

由 Andi Kleen 提交于 4月 28, 2009

Enable the 64bit MCE_INTEL code (CMCI, thermal interrupts) for 32bit NEW_MCE.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

7856f6cc

x86, mce: use a call vector to call the 64bit mce handler · 5d727926

由 Andi Kleen 提交于 4月 27, 2009

Allows to call different machine check handlers from the low
level machine check entry vector.

This is needed for later when it will be used for 32bit too.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

5d727926

09 5月, 2009 1 次提交

xen/x86-64: fix breakpoints and hardware watchpoints · 6cac5a92

由 Jeremy Fitzhardinge 提交于 3月 29, 2009

Native x86-64 uses the IST mechanism to run int3 and debug traps on
an alternative stack.  Xen does not do this, and so the frames were
being misinterpreted by the ptrace code.  This change special-cases
these two exceptions by using Xen variants which run on the normal
kernel stack properly.

Impact: avoid crash or bad data when IST trap is invoked under Xen
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

6cac5a92

18 4月, 2009 1 次提交

lockdep, x86: account for irqs enabled in paranoid_exit · 0300e7f1

由 Steven Rostedt 提交于 4月 17, 2009

I hit the check_flags error of lockdep:

 WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
 [...]
 hardirqs last  enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
 hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
 softirqs last  enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
 softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0

The check_flags warning of lockdep tells me that lockdep thought interrupts
were disabled, but they were really enabled.

The numbers in the above parenthesis show the order of events:

 12566: softirqs last enabled:  lock_sock_nested
 12567: hardirqs last enabled:  local_bh_enable
 12568: softirqs last disabled: tcp_prequeue_process
 12566: hardirqs last disabled: int3

int3 is a breakpoint!

Examining this further, I have CONFIG_NET_TCPPROBE enabled which adds
break points into the kernel.

The paranoid_exit of the return of int3 does not account for enabling
interrupts on return to kernel. This code is a bit tricky since it
is also used by the nmi handler (when lockdep is off), and we must be
careful about the swapgs. We can not call kernel code after the swapgs
has been performed.

[ Impact: fix lockdep check_flags warning + self-turn-off ]
Acked-by: NPeter Zijlsta <a.p.zijlstra@chello.nl>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0300e7f1

10 4月, 2009 1 次提交

x86, function-graph: only save return values on x86_64 · e71e99c2

由 Steven Rostedt 提交于 3月 25, 2009

Impact: speed up

The return to handler portion of the function graph tracer should only
need to save the return values. The caller already saved off the
registers that the callee can modify. The returning function already
saved the registers it modified. When we call our own trace function
it too will save the registers that the callee must restore.

There's no reason to save off anything more that the registers used
to return the values.

Note, I did a complete kernel build with this modification and the
function graph tracer running on x86_64.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e71e99c2

07 4月, 2009 1 次提交

perf_counter: x86: self-IPI for pending work · b6276f35

由 Peter Zijlstra 提交于 4月 06, 2009

Implement set_perf_counter_pending() with a self-IPI so that it will
run ASAP in a usable context.

For now use a second IRQ vector, because the primary vector pokes
the apic in funny ways that seem to confuse things.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090406094517.724626696@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b6276f35

12 3月, 2009 2 次提交

x86-64: move save_paranoid into .kprobes.text · c2810188

由 Jan Beulich 提交于 3月 12, 2009

Impact: mark save_paranoid as non-kprobe-able code

This appears to be necessary as the function gets called from
kprobes-unsafe exception handling stubs (i.e. which themselves
live in .kprobes.text).
Signed-off-by: NJan Beulich <jbeulich@novell.com>
LKML-Reference: <49B8F44F.76E4.0078.0@novell.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c2810188

x86: remove leftover unwind annotations · 9fa7266c

由 Jan Beulich 提交于 3月 12, 2009

Impact: cleanup

These got left in needlessly when ret_from_fork got simplified.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
LKML-Reference: <49B8F355.76E4.0078.0@novell.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9fa7266c

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功