提交 · 2071b3e34fd33e496ebd7b90331ac5b3b0ac3b81 · openanolis / cloud-kernel

05 5月, 2014 1 次提交

x86, espfix: Make espfix64 a Kconfig option, fix UML · 197725de

由 H. Peter Anvin 提交于 5月 04, 2014

Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.

This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com

197725de

01 5月, 2014 1 次提交

x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 3891a04a

由 H. Peter Anvin 提交于 4月 29, 2014

The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.
Reported-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # consider after upstream merge

3891a04a

19 3月, 2014 1 次提交

x86, vdso: Make vsyscall_gtod_data handling x86 generic · d2312e33

由 Stefani Seibold 提交于 3月 17, 2014

This patch move the vsyscall_gtod_data handling out of vsyscall_64.c
into an additonal file vsyscall_gtod.c to make the functionality
available for x86 32 bit kernel.

It also adds a new vsyscall_32.c which setup the VVAR page.
Reviewed-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NStefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

d2312e33

16 1月, 2014 1 次提交

x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs · 7da7c156

由 Bin Gao 提交于 10月 21, 2013

On SoCs that have the calibration MSRs available, either there is no
PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is
driven from the same clock as the TSC, so calibration is redundant and
just slows down the boot.

TSC rate is caculated by this formula:
<maximum core-clock to bus-clock ratio> * <maximum resolved frequency>
The ratio and the resolved frequency ID can be obtained from MSR.
See Intel 64 and IA-32 System Programming Guid section 16.12 and 30.11.5
for details.
Signed-off-by: NBin Gao <bin.gao@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org

7da7c156

14 1月, 2014 1 次提交

x86, microcode: Move to a proper location · bad5fa63

由 Borislav Petkov 提交于 12月 01, 2013

We've grown a bunch of microcode loader files all prefixed with
"microcode_". They should be under cpu/ because this is strictly
CPU-related functionality so do that and drop the prefix since they're
in their own directory now which gives that prefix. :)

While at it, drop MICROCODE_INTEL_LIB config item and stash the
functionality under CONFIG_MICROCODE_INTEL as it was its only user.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Tested-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

bad5fa63

09 1月, 2014 1 次提交

arch: x86: New MailBox support driver for Intel SOC's · 46184415

由 David E. Box 提交于 1月 08, 2014

Current Intel SOC cores use a MailBox Interface (MBI) to provide access to
configuration registers on devices (called units) connected to the system
fabric. This is a support driver that implements access to this interface on
those platforms that can enumerate the device using PCI. Initial support is for
BayTrail, for which port definitons are provided. This is a requirement for
implementing platform specific features (e.g. RAPL driver requires this to
perform platform specific power management using the registers in PUNIT).
Dependant modules should select IOSF_MBI in their respective Kconfig
configuraiton. Serialized access is handled by all exported routines with
spinlocks.

The API includes 3 functions for access to unit registers:

int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)

port:	indicating the unit being accessed
opcode:	the read or write port specific opcode
offset:	the register offset within the port
mdr:	the register data to be read, written, or modified
mask:	bit locations in mdr to change

Returns nonzero on error

Note: GPU code handles access to the GFX unit. Therefore access to that unit
with this driver is disallowed to avoid conflicts.
Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>

46184415

29 12月, 2013 1 次提交

x86: Export x86 boot_params to sysfs · 5039e316

由 Dave Young 提交于 12月 20, 2013

kexec-tools use boot_params for getting the 1st kernel hardware_subarch,
the kexec kernel EFI runtime support also needs to read the old efi_info
from boot_params. Currently it exists in debugfs which is not a good
place for such infomation. Per HPA, we should avoid "sploit debugfs".

In this patch /sys/kernel/boot_params are exported, also the setup_data is
exported as a subdirectory. kexec-tools is using debugfs for hardware_subarch
for a long time now so we're not removing it yet.

Structure is like below:

/sys/kernel/boot_params
|__ data                /* boot_params in binary*/
|__ setup_data
|   |__ 0               /* the first setup_data node */
|   |   |__ data        /* setup_data node 0 in binary*/
|   |   |__ type        /* setup_data type of setup_data node 0, hex string */
[snip]
|__ version             /* boot protocal version (in hex, "0x" prefixed)*/
Signed-off-by: NDave Young <dyoung@redhat.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Tested-by: NToshi Kani <toshi.kani@hp.com>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

5039e316

25 9月, 2013 1 次提交

sched, x86: Optimize the preempt_schedule() call · 1a338ac3

由 Peter Zijlstra 提交于 8月 14, 2013

Remove the bloat of the C calling convention out of the
preempt_enable() sites by creating an ASM wrapper which allows us to
do an asm("call ___preempt_schedule") instead.

calling.h bits by Andi Kleen
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org
[ Fixed build error. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

1a338ac3

03 8月, 2013 2 次提交

x86: sysfb: move EFI quirks from efifb to sysfb · 2995e506

由 David Herrmann 提交于 8月 02, 2013

The EFI FB quirks from efifb.c are useful for simple-framebuffer devices
as well. Apply them by default so we can convert efifb.c to use
efi-framebuffer platform devices.
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1375445127-15480-5-git-send-email-dh.herrmann@gmail.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

2995e506

x86: provide platform-devices for boot-framebuffers · e3263ab3

由 David Herrmann 提交于 8月 02, 2013

The current situation regarding boot-framebuffers (VGA, VESA/VBE, EFI) on
x86 causes troubles when loading multiple fbdev drivers. The global
"struct screen_info" does not provide any state-tracking about which
drivers use the FBs. request_mem_region() theoretically works, but
unfortunately vesafb/efifb ignore it due to quirks for broken boards.

Avoid this by creating a platform framebuffer devices with a pointer
to the "struct screen_info" as platform-data. Drivers can now create
platform-drivers and the driver-core will refuse multiple drivers being
active simultaneously.

We keep the screen_info available for backwards-compatibility. Drivers
can be converted in follow-up patches.

Different devices are created for VGA/VESA/EFI FBs to allow multiple
drivers to be loaded on distro kernels. We create:
- "vesa-framebuffer" for VBE/VESA graphics FBs
- "efi-framebuffer" for EFI FBs
- "platform-framebuffer" for everything else
This allows to load vesafb, efifb and others simultaneously and each
picks up only the supported FB types.

Apart from platform-framebuffer devices, this also introduces a
compatibility option for "simple-framebuffer" drivers which recently got
introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we
try to match the screen_info against a simple-framebuffer supported
format. If we succeed, we create a "simple-framebuffer" device instead
of a platform-framebuffer.
This allows to reuse the simplefb.c driver across architectures and also
to introduce a SimpleDRM driver. There is no need to have vesafb.c,
efifb.c, simplefb.c and more just to have architecture specific quirks
in their setup-routines.

Instead, we now move the architecture specific quirks into x86-setup and
provide a generic simple-framebuffer. For backwards-compatibility (if
strange formats are used), we still allow vesafb/efifb to be loaded
simultaneously and pick up all remaining devices.
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1375445127-15480-4-git-send-email-dh.herrmann@gmail.comTested-by: NStephen Warren <swarren@nvidia.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

e3263ab3

21 6月, 2013 2 次提交

trace,x86: Move creation of irq tracepoints from apic.c to irq.c · 83ab8514

由 Steven Rostedt (Red Hat) 提交于 6月 21, 2013

Compiling without CONFIG_X86_LOCAL_APIC set, apic.c will not be
compiled, and the irq tracepoints will not be created via the
CREATE_TRACE_POINTS macro. When CONFIG_X86_LOCAL_APIC is not set,
we get the following build error:

LD init/built-in.o
arch/x86/built-in.o: In function `trace_x86_platform_ipi_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o: In function `trace_x86_platform_ipi_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o: In function `trace_irq_work_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o: In function `trace_irq_work_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_exit'
arch/x86/built-in.o:(__jump_table+0x8): undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o:(__jump_table+0x14): undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o:(__jump_table+0x20): undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o:(__jump_table+0x2c): undefined reference to `__tracepoint_irq_work_exit'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

As irq.c is always compiled for x86, it is a more appropriate location
to create the irq tracepoints.

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

83ab8514

x86, trace: Add irq vector tracepoints · cf910e83

由 Seiji Aguchi 提交于 6月 20, 2013

[Purpose of this patch]

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>

On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.

I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.

[Patch Description]

Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.

So, add following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector
   - irq_work_entry_vector
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
makes a zero when tracepoints are disabled. Detailed explanations are as follows.
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding a logic to
   _set_gate(). It is just a copy of original idt table.
 - Register the new handlers for tracpoints to the new IDT by introducing
   macros to alloc_intr_gate() called at registering time of irq_vector handlers.
 - Add checking, whether irq vector tracing is on/off, into load_current_idt().
   This has to be done below debug checking for these reasons.
   - Switching to debug IDT may be kicked while tracing is enabled.
   - On the other hands, switching to trace IDT is kicked only when debugging
     is disabled.

In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
used for other purposes.
Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>

cf910e83

31 5月, 2013 1 次提交

x86, microcode, amd: Early microcode patch loading support for AMD · 757885e9

由 Jacob Shin 提交于 5月 30, 2013

Add early microcode patch loading support for AMD.
Signed-off-by: NJacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>

757885e9

14 5月, 2013 1 次提交

x86: Extend #DF debugging aid to 64-bit · 4d067d8e

由 Borislav Petkov 提交于 5月 09, 2013

It is sometimes very helpful to be able to pinpoint the location which
causes a double fault before it turns into a triple fault and the
machine reboots. We have this for 32-bit already so extend it to 64-bit.
On 64-bit we get the register snapshot at #DF time and not from the
first exception which actually causes the #DF. It should be close
enough, though.

[ hpa: and definitely better than nothing, which is what we have now. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1368093749-31296-1-git-send-email-bp@alien8.deSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

4d067d8e

01 2月, 2013 1 次提交

x86/microcode_intel.h: Define functions and macros for early loading ucode · 9cd4d78e

由 Fenghua Yu 提交于 12月 20, 2012

Define some functions and macros that will be used in early loading ucode. Some
of them are moved from microcode_intel.c driver in order to be called in early
boot phase before module can be called.
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua.yu@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

9cd4d78e

22 1月, 2013 2 次提交

kprobes/x86: Move kprobes stuff under arch/x86/kernel/kprobes/ · f684199f

由 Masami Hiramatsu 提交于 9月 28, 2012

Move arch-dep kprobes stuff under arch/x86/kernel/kprobes.

Link: http://lkml.kernel.org/r/20120928081522.3560.75469.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ fixed whitespace and s/__attribute__((packed))/__packed/ ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f684199f

kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c · e7dbfe34

由 Masami Hiramatsu 提交于 9月 28, 2012

Split ftrace-based kprobes code from kprobes, and introduce
CONFIG_(HAVE_)KPROBES_ON_FTRACE Kconfig flags.
For the cleanup reason, this also moves kprobe_ftrace check
into skip_singlestep.

Link: http://lkml.kernel.org/r/20120928081520.3560.25624.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e7dbfe34

14 11月, 2012 1 次提交

tracing,x86: Add a TSC trace_clock · 8cbd9cc6

由 David Sharp 提交于 11月 13, 2012

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups
v7:
Generic arch bits in Kbuild.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.comAcked-by: NIngo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: NDavid Sharp <dhsharp@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8cbd9cc6

24 10月, 2012 1 次提交

x86: Allow tracing of functions in arch/x86/kernel/rtc.c · ce37f400

由 David Vrabel 提交于 10月 08, 2012

Move native_read_tsc() to tsc.c to allow profiling to be
re-enabled for rtc.c.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1349698050-6560-1-git-send-email-david.vrabel@citrix.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ce37f400

01 10月, 2012 1 次提交

x86, um/x86: switch to generic sys_execve and kernel_execve · 6783eaa2

由 Al Viro 提交于 8月 02, 2012

32bit wrapper is lost on that; 64bit one is *not*, since
we need to arrange for full pt_regs on stack when we call
sys_execve() and we need to load callee-saved ones from
there afterwards.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6783eaa2

23 8月, 2012 1 次提交

x86: KVM guest: merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUEST · 90993cdd

由 Marcelo Tosatti 提交于 8月 16, 2012

The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is
not so clear anymore, as demonstrated by recent bugs caused by poor
handling of on/off combinations of these options.

Merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUEST.
Reported-By: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

90993cdd

10 8月, 2012 1 次提交

perf: Unified API to record selective sets of arch registers · c5e63197

由 Jiri Olsa 提交于 8月 07, 2012

This brings a new API to help the selective dump of registers on event
sampling, and its implementation for x86 arch.

Added HAVE_PERF_REGS config option to determine if the architecture
provides perf registers ABI.

The information about desired registers will be passed in u64 mask.
It's up to the architecture to map the registers into the mask bits.

For the x86 arch implementation, both 32 and 64 bit registers bits are
defined within single enum to ensure 64 bit system can provide register
dump for compat task if needed in the future.
Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
[ Added missing linux/errno.h include ]
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-2-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

c5e63197

18 5月, 2012 1 次提交

MCA: delete all remaining traces of microchannel bus support. · bb8187d3

由 Paul Gortmaker 提交于 5月 17, 2012

Hardware with MCA bus is limited to 386 and 486 class machines
that are now 20+ years old and typically with less than 32MB
of memory.  A quick search on the internet, and you see that
even the MCA hobbyist/enthusiast community has lost interest
in the early 2000 era and never really even moved ahead from
the 2.4 kernels to the 2.6 series.

This deletes anything remaining related to CONFIG_MCA from core
kernel code and from the x86 architecture.  There is no point in
carrying this any further into the future.

One complication to watch for is inadvertently scooping up
stuff relating to machine check, since there is overlap in
the TLA name space (e.g. arch/x86/boot/mca.c).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: x86@kernel.org
Acked-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

bb8187d3

17 5月, 2012 1 次提交

x86, realmode: Move kernel/realmode.c to realmode/init.c · 13712701

由 H. Peter Anvin 提交于 5月 16, 2012

Keep all the realmode code together, including initialization (only
the rm/ subdirectory is actually built as real-mode code, anyway.)
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

13712701

09 5月, 2012 3 次提交

x86, realmode: Move ACPI wakeup to unified realmode code · c9b77ccb

由 Jarkko Sakkinen 提交于 5月 08, 2012

Migrated ACPI wakeup code to the real-mode blob.
Code existing in .x86_trampoline  can be completely
removed. Static descriptor table in wakeup_asm.S is
courtesy of H. Peter Anvin.
Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-7-git-send-email-jarkko.sakkinen@intel.com
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

c9b77ccb

x86, realmode: Move reboot_32.S to unified realmode code · 5a8c9aeb

由 Jarkko Sakkinen 提交于 5月 08, 2012

Migrated reboot_32.S from x86_trampoline to the real-mode
blob.
Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-5-git-send-email-jarkko.sakkinen@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

5a8c9aeb

x86, realmode: Relocator for realmode code · 084ee1c6

由 Jarkko Sakkinen 提交于 5月 08, 2012

Implements relocator for real mode code that is called
as part of setup_arch(). Processes segment relocations
and linear relocations. Real-mode code is relocated to
a free hole below 1 MB.
Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-4-git-send-email-jarkko.sakkinen@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

084ee1c6

05 5月, 2012 1 次提交

x86: Use generic init_task · 45046892

由 Thomas Gleixner 提交于 5月 03, 2012

Same code. Use the generic version. The special Makefile treatment is
pointless anyway as init_task.o contains only data which is handled by
the linker script. So no point on being treated like head text.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120503085035.739963562@linutronix.de
Cc: x86@kernel.org

45046892

06 3月, 2012 1 次提交

x86/kprobes: Split out optprobe related code to kprobes-opt.c · 3f33ab1c

由 Masami Hiramatsu 提交于 3月 05, 2012

Split out optprobe related code to arch/x86/kernel/kprobes-opt.c
for maintenanceability.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Suggested-by: NIngo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: systemtap@sourceware.org
Cc: anderson@redhat.com
Link: http://lkml.kernel.org/r/20120305133222.5982.54794.stgit@localhost.localdomain
[ Tidied up the code a tiny bit ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3f33ab1c

17 2月, 2012 1 次提交

uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints · 2b144498

由 Srikar Dronamraju 提交于 2月 09, 2012

Add uprobes support to the core kernel, with x86 support.

This commit adds the kernel facilities, the actual uprobes
user-space ABI and perf probe support comes in later commits.

General design:

Uprobes are maintained in an rb-tree indexed by inode and offset
(the offset here is from the start of the mapping). For a unique
(inode, offset) tuple, there can be at most one uprobe in the
rb-tree.

Since the (inode, offset) tuple identifies a unique uprobe, more
than one user may be interested in the same uprobe. This provides
the ability to connect multiple 'consumers' to the same uprobe.

Each consumer defines a handler and a filter (optional). The
'handler' is run every time the uprobe is hit, if it matches the
'filter' criteria.

The first consumer of a uprobe causes the breakpoint to be
inserted at the specified address and subsequent consumers are
appended to this list.  On subsequent probes, the consumer gets
appended to the existing list of consumers. The breakpoint is
removed when the last consumer unregisters. For all other
unregisterations, the consumer is removed from the list of
consumers.

Given a inode, we get a list of the mms that have mapped the
inode. Do the actual registration if mm maps the page where a
probe needs to be inserted/removed.

We use a temporary list to walk through the vmas that map the
inode.

- The number of maps that map the inode, is not known before we
  walk the rmap and keeps changing.
- extending vm_area_struct wasn't recommended, it's a
  size-critical data structure.
- There can be more than one maps of the inode in the same mm.

We add callbacks to the mmap methods to keep an eye on text vmas
that are of interest to uprobes.  When a vma of interest is mapped,
we insert the breakpoint at the right address.

Uprobe works by replacing the instruction at the address defined
by (inode, offset) with the arch specific breakpoint
instruction. We save a copy of the original instruction at the
uprobed address.

This is needed for:

 a. executing the instruction out-of-line (xol).
 b. instruction analysis for any subsequent fixups.
 c. restoring the instruction back when the uprobe is unregistered.

We insert or delete a breakpoint instruction, and this
breakpoint instruction is assumed to be the smallest instruction
available on the platform. For fixed size instruction platforms
this is trivially true, for variable size instruction platforms
the breakpoint instruction is typically the smallest (often a
single byte).

Writing the instruction is done by COWing the page and changing
the instruction during the copy, this even though most platforms
allow atomic writes of the breakpoint instruction. This also
mirrors the behaviour of a ptrace() memory write to a PRIVATE
file map.

The core worker is derived from KSM's replace_page() logic.

In essence, similar to KSM:

 a. allocate a new page and copy over contents of the page that
    has the uprobed vaddr
 b. modify the copy and insert the breakpoint at the required
    address
 c. switch the original page with the copy containing the
    breakpoint
 d. flush page tables.

replace_page() is being replicated here because of some minor
changes in the type of pages and also because Hugh Dickins had
plans to improve replace_page() for KSM specific work.

Instruction analysis on x86 is based on instruction decoder and
determines if an instruction can be probed and determines the
necessary fixups after singlestep.  Instruction analysis is done
at probe insertion time so that we avoid having to repeat the
same analysis every time a probe is hit.

A lot of code here is due to the improvement/suggestions/inputs
from Peter Zijlstra.

Changelog:

(v10):
 - Add code to clear REX.B prefix as suggested by Denys Vlasenko
   and Masami Hiramatsu.

(v9):
 - Use insn_offset_modrm as suggested by Masami Hiramatsu.

(v7):

 Handle comments from Peter Zijlstra:

 - Dont take reference to inode. (expect inode to uprobe_register to be sane).
 - Use PTR_ERR to set the return value.
 - No need to take reference to inode.
 - use PTR_ERR to return error value.
 - register and uprobe_unregister share code.

(v5):

 - Modified del_consumer as per comments from Peter.
 - Drop reference to inode before dropping reference to uprobe.
 - Use i_size_read(inode) instead of inode->i_size.
 - Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
 - Includes errno.h as recommended by Stephen Rothwell to fix a build issue
   on sparc defconfig
 - Remove restrictions while unregistering.
 - Earlier code leaked inode references under some conditions while
   registering/unregistering.
 - Continue the vma-rmap walk even if the intermediate vma doesnt
   meet the requirements.
 - Validate the vma found by find_vma before inserting/removing the
   breakpoint
 - Call del_consumer under mutex_lock.
 - Use hash locks.
 - Handle mremap.
 - Introduce find_least_offset_node() instead of close match logic in
   find_uprobe
 - Uprobes no more depends on MM_OWNER; No reference to task_structs
   while inserting/removing a probe.
 - Uses read_mapping_page instead of grab_cache_page so that the pages
   have valid content.
 - pass NULL to get_user_pages for the task parameter.
 - call SetPageUptodate on the new page allocated in write_opcode.
 - fix leaking a reference to the new page under certain conditions.
 - Include Instruction Decoder if Uprobes gets defined.
 - Remove const attributes for instruction prefix arrays.
 - Uses mm_context to know if the application is 32 bit.
Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
Also-written-by: NJim Keniston <jkenisto@us.ibm.com>
Reviewed-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
[ Made various small edits to the commit log ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2b144498

05 12月, 2011 1 次提交

x86, NMI: Add NMI IPI selftest · 99e8b9ca

由 Don Zickus 提交于 10月 13, 2011

The previous patch modified the stop cpus path to use NMI
instead of IRQ as the way to communicate to the other cpus to
shutdown.  There were some concerns that various machines may
have problems with using an NMI IPI.

This patch creates a selftest to check if NMI is working at
boot. The idea is to help catch any issues before the machine
panics and we learn the hard way.

Loosely based on the locking-selftest.c file, this separate file
runs a couple of simple tests and reports the results.  The
output looks like:

...
Brought up 4 CPUs
----------------
| NMI testsuite:
--------------------
  remote IPI:  ok  |
   local IPI:  ok  |
--------------------
Good, all   2 testcases passed! |
---------------------------------
Total of 4 processors activated (21330.61 BogoMIPS).
...
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: seiji.aguchi@hds.com
Cc: vgoyal@redhat.com
Cc: mjg@redhat.com
Cc: tony.luck@intel.com
Cc: gong.chen@intel.com
Cc: satoru.moriya@hds.com
Cc: avi@redhat.com
Cc: Andi Kleen <andi@firstfloor.org>
Link: http://lkml.kernel.org/r/1318533267-18880-3-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

99e8b9ca

18 11月, 2011 1 次提交

x86: Generate system call tables and unistd_*.h from tables · 303395ac

由 H. Peter Anvin 提交于 11月 11, 2011

Generate system call tables and unistd_*.h automatically from the
tables in arch/x86/syscalls.  All other information, like NR_syscalls,
is auto-generated, some of which is in asm-offsets_*.c.

This allows us to keep all the system call information in one place,
and allows for kernel space and user space to see different
information; this is currently used for the ia32 system call numbers
when building the 64-bit kernel, but will be used by the x32 ABI in
the near future.

This also removes some gratuitious differences between i386, x86-64
and ia32; in particular, now all system call tables are generated with
the same mechanism.

Cc: H. J. Lu <hjl.tools@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

303395ac

10 10月, 2011 1 次提交

x86, nmi: Split out nmi from traps.c · 1d48922c

由 Don Zickus 提交于 9月 30, 2011

The nmi stuff is changing a lot and adding more functionality. Split it
out from the traps.c file so it doesn't continue to pollute that file.

This makes it easier to find and expand all the future nmi related work.

No real functional changes here.
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-2-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

1d48922c

11 8月, 2011 1 次提交

x86: Remove unnecessary compile flag tweaks for vsyscall code · f3fb5b7b

由 Andy Lutomirski 提交于 8月 10, 2011

As of commit 98d0ac38
Author: Andy Lutomirski <luto@mit.edu>
Date:   Thu Jul 14 06:47:22 2011 -0400

    x86-64: Move vread_tsc and vread_hpet into the vDSO

user code no longer directly calls into code in arch/x86/kernel/, so
we don't need compile flag hacks to make it safe.  All vdso code is
in the vdso directory now.
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/835cd05a4c7740544d09723d6ba48f4406f9826c.1312988155.git.luto@mit.eduSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

f3fb5b7b

15 7月, 2011 1 次提交

x86-64: Move vread_tsc and vread_hpet into the vDSO · 98d0ac38

由 Andy Lutomirski 提交于 7月 14, 2011

The vsyscall page now consists entirely of trap instructions.

Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.eduSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

98d0ac38

21 6月, 2011 2 次提交

iommu/amd: Move missing parts to drivers/iommu · 403f81d8

由 Joerg Roedel 提交于 6月 14, 2011

A few parts of the driver were missing in drivers/iommu.
Move them there to have the complete driver in that
directory.
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>

403f81d8

x86: amd_iommu: move to drivers/iommu/ · 29b68415

由 Ohad Ben-Cohen 提交于 6月 05, 2011

This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.

Compile-tested on x86_64.
Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>

29b68415

07 6月, 2011 1 次提交

x86-64: Emulate legacy vsyscalls · 5cec93c2

由 Andy Lutomirski 提交于 6月 05, 2011

There's a fair amount of code in the vsyscall page.  It contains
a syscall instruction (in the gettimeofday fallback) and who
knows what will happen if an exploit jumps into the middle of
some other code.

Reduce the risk by replacing the vsyscalls with short magic
incantations that cause the kernel to emulate the real
vsyscalls. These incantations are useless if entered in the
middle.

This causes vsyscalls to be a little more expensive than real
syscalls.  Fortunately sensible programs don't use them.
The only exception is time() which is still called by glibc
through the vsyscall - but calling time() millions of times
per second is not sensible. glibc has this fixed in the
development tree.

This patch is not perfect: the vread_tsc and vread_hpet
functions are still at a fixed address.  Fixing that might
involve making alternative patching work in the vDSO.
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
[ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5cec93c2

28 5月, 2011 1 次提交

x86: Put back -pg to tsc.o and add no GCOV to vread_tsc_64.o · 89e1be50

由 Steven Rostedt 提交于 5月 27, 2011

The commit 44259b1a
    Author: Andy Lutomirski <luto@MIT.EDU>
    x86-64: Move vread_tsc into a new file with sensible options

Removed the -pg from tsc.o which caused the function graph tracer
to go into an infinite function call recursion as it uses the tsc
internally outside its recursion protection, thus tracing the tsc
breaks the function graph tracer.

This commit also added the file vread_tsc_64.c that gets used
by vdso but failed to prevent GCOV from monkeying with it,
causing userspace to try to access kernel data when GCOV was
enabled.

Thanks to Thomas Gleixner for pointing out GCOV as the likely
culprit that added strange kernel accesses into the vread_tsc()
call.

Cc: Author: Andy Lutomirski <luto@MIT.EDU>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

89e1be50

24 5月, 2011 1 次提交

x86-64: Move vread_tsc into a new file with sensible options · 44259b1a

由 Andy Lutomirski 提交于 5月 23, 2011

vread_tsc is short and hot, and it's userspace code so the usual
reasons to enable -pg and turn off sibling calls don't apply.

(OK, turning off sibling calls has no effect.  But it might
someday...)

As an added benefit, tsc.c is profilable now.
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C99c6d7f5efa3ccb65b4ac6eb443e1ab7bad47d7b.1306156808.git.luto%40mit.edu%3ESigned-off-by: NThomas Gleixner <tglx@linutronix.de>

44259b1a

openanolis / cloud-kernel 12 个月 前同步成功

openanolis / cloud-kernel
12 个月前同步成功