提交 · e898c6706869fdcbd68b1e7fb0ac7461d98710fe · openeuler / raspberrypi-kernel

10 3月, 2012 1 次提交

x86: Use enum instead of literals for trap values · c9408265

由 Kees Cook 提交于 3月 09, 2012

The traps are referred to by their numbers and it can be difficult to
understand them while reading the code without context. This patch adds
enumeration of the trap numbers and replaces the numbers with the correct
enum for x86.
Signed-off-by: NKees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20120310000710.GA32667@www.outflux.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

c9408265

06 3月, 2012 1 次提交

x32: Add ptrace for x32 · 55283e25

由 H.J. Lu 提交于 3月 05, 2012

X32 ptrace is a hybrid of 64bit ptrace and compat ptrace with 32bit
address and longs.  It use 64bit ptrace to access the full 64bit
registers.  PTRACE_PEEKUSR and PTRACE_POKEUSR are only allowed to access
segment and debug registers.  PTRACE_PEEKUSR returns the lower 32bits
and PTRACE_POKEUSR zero-extends 32bit value to 64bit.   It works since
the upper 32bits of segment and debug registers of x32 process are always
zero.  GDB only uses PTRACE_PEEKUSR and PTRACE_POKEUSR to access
segment and debug registers.

[ hpa: changed TIF_X32 test to use !is_ia32_task() instead, and moved
  the system call number to the now-unused 521 slot. ]
Signed-off-by: N"H.J. Lu" <hjl.tools@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com

55283e25

02 3月, 2012 2 次提交

perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled · 1018faa6

由 Joerg Roedel 提交于 2月 29, 2012

It turned out that a performance counter on AMD does not
count at all when the GO or HO bit is set in the control
register and SVM is disabled in EFER.

This patch works around this issue by masking out the HO bit
in the performance counter control register when SVM is not
enabled.

The GO bit is not touched because it is only set when the
user wants to count in guest-mode only. So when SVM is
disabled the counter should not run at all and the
not-counting is the intended behaviour.
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Avi Kivity <avi@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: stable@vger.kernel.org # v3.2
Link: http://lkml.kernel.org/r/1330523852-19566-1-git-send-email-joerg.roedel@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

1018faa6

x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls · b263b31e

由 H. Peter Anvin 提交于 2月 27, 2012

Specify the data structures for the 64-bit ioctls with explicit sizing
and padding so that the x32 kernel will correctly use the 64-bit forms
of these ioctls.  Note that these ioctls are bogus in both forms on
both 32 and 64 bits; even on 64 bits the maximum MTRR size is only 44
bits long.

Note that nothing really is supposed to use these ioctls and that the
preferred interface is text strings on /proc/mtrr, or better yet,
nothing at all (use /sys/bus/pci/devices/*/resource*_wc for write
combining; that uses PAT not MTRRs.)
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Tested-by: NNitin A. Kamble <nitin.a.kamble@intel.com>
Link: http://lkml.kernel.org/n/tip-vwvnlu3hjmtkwvij4qxtm90l@git.kernel.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

b263b31e

26 2月, 2012 2 次提交

x32: Only clear TIF_X32 flag once · 00194b2e

由 Bobby Powers 提交于 2月 25, 2012

Commits bb212724 and d1a797f3 both added a call to
clear_thread_flag(TIF_X32) under set_personality_64bit() - only one is
needed.
Signed-off-by: NBobby Powers <bobbypowers@gmail.com>
Link: http://lkml.kernel.org/r/1330228774-24223-1-git-send-email-bobbypowers@gmail.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

00194b2e

x32: Make sure TS_COMPAT is cleared for x32 tasks · ce5f7a99

由 Bobby Powers 提交于 2月 25, 2012

If a process has a non-x32 ia32 personality and changes to x32, the
process would keep its TS_COMPAT flag. x32 uses the presence of the
x32 flag on a syscall to determine compat status, so make sure
TS_COMPAT is cleared.
Signed-off-by: NBobby Powers <bobbypowers@gmail.com>
Link: http://lkml.kernel.org/r/1330230338-25077-1-git-send-email-bobbypowers@gmail.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

ce5f7a99

22 2月, 2012 1 次提交

x86/mce/AMD: Fix UP build error · 3f806e50

由 Borislav Petkov 提交于 2月 03, 2012

141168c3 ("x86: Simplify code by removing a !SMP #ifdefs
from 'struct cpuinfo_x86'") removed a bunch of CONFIG_SMP ifdefs
around code touching struct cpuinfo_x86 members but also caused
the following build error with Randy's randconfigs:

mce_amd.c:(.cpuinit.text+0x4723): undefined reference to `cpu_llc_shared_map'

Restore the #ifdef in threshold_create_bank() which creates
symlinks on the non-BSP CPUs.

There's a better patch series being worked on by Kevin Winchester
which will solve this in a cleaner fashion, but that series is
too ambitious for v3.3 merging - so we first queue up this trivial
fix and then do the rest for v3.4.
Signed-off-by: NBorislav Petkov <bp@alien8.de>
Acked-by: NKevin Winchester <kjwinchester@gmail.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Nick Bowler <nbowler@elliptictech.com>
Link: http://lkml.kernel.org/r/20120203191801.GA2846@x1.osrc.amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

3f806e50

21 2月, 2012 14 次提交

i387: export 'fpu_owner_task' per-cpu variable · 27e74da9

由 Linus Torvalds 提交于 2月 20, 2012

(And define it properly for x86-32, which had its 'current_task'
declaration in separate from x86-64)

Bitten by my dislike for modules on the machines I use, and the fact
that apparently nobody else actually wanted to test the patches I sent
out.

Snif. Nobody else cares.

Anyway, we probably should uninline the 'kernel_fpu_begin()' function
that is what modules actually use and that references this, but this is
the minimal fix for now.
Reported-by: NJosh Boyer <jwboyer@gmail.com>
Reported-and-tested-by: NJongman Heo <jongman.heo@samsung.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

27e74da9

x86: Specify a size for the cmp in the NMI handler · a38449ef

由 Steven Rostedt 提交于 2月 20, 2012

Linus noticed that the cmp used to check if the code segment is
__KERNEL_CS or not did not specify a size. Perhaps it does not matter
as H. Peter Anvin noted that user space can not set the bottom two
bits of the %cs register. But it's best not to let the assembly choose
and change things between different versions of gas, but instead just
pick the size.

Four bytes are used to compare the saved code segment against
__KERNEL_CS. Perhaps this might mess up Xen, but we can fix that when
the time comes.

Also I noticed that there was another non-specified cmp that checks
the special stack variable if it is 1 or 0. This too probably doesn't
matter what cmp is used, but this patch uses cmpl just to make it non
ambiguous.

Link: http://lkml.kernel.org/r/CA+55aFxfAn9MWRgS3O5k2tqN5ys1XrhSFVO5_9ZAoZKDVgNfGA@mail.gmail.comSuggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a38449ef

x32: If configured, add x32 system calls to system call tables · a06c9bc0

由 H. Peter Anvin 提交于 2月 19, 2012

If CONFIG_X86_X32_ABI is defined, add the x32 system calls to the
system call tables.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

a06c9bc0

x32: Handle process creation · d1a797f3

由 H. Peter Anvin 提交于 2月 19, 2012

Allow an x32 process to be started.
Originally-by: NH. J. Lu <hjl.tools@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>

d1a797f3

x32: Signal-related system calls · c5a37394

由 H. Peter Anvin 提交于 2月 19, 2012

x32 uses the 64-bit signal frame format, obviously, but there are some
structures which mixes that with pointers or sizeof(long) types, as
such we have to create a handful of system calls specific to x32.  By
and large these are a mixture of the 64-bit and the compat system
calls.
Originally-by: NH. J. Lu <hjl.tools@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

c5a37394

x32: Handle the x32 system call flag · fca460f9

由 H. Peter Anvin 提交于 2月 19, 2012

x32 shares most system calls with x86-64, but unfortunately some
subsystem (the input subsystem is the chief offender) which require
is_compat() when operating with a 32-bit userspace.  The input system
actually has text files in sysfs whose meaning is dependent on
sizeof(long) in userspace!

We could solve this by having two completely disjoint system call
tables; requiring that each system call be duplicated.  This patch
takes a different approach: we add a flag to the system call number;
this flag doesn't affect the system call dispatch but requests compat
treatment from affected subsystems for the duration of the system call.

The change of cmpq to cmpl is safe since it immediately follows the
and.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

fca460f9

x32: Export setup/restore_sigcontext from signal.c · 85139422

由 H. Peter Anvin 提交于 2月 19, 2012

Export setup_sigcontext() and restore_sigcontext() from signal.c, so
we can use the 64-bit versions verbatim for x32.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

85139422

x86: Move some signal-handling definitions to a common header · f28f0c23

由 H. Peter Anvin 提交于 2月 19, 2012

There are some definitions which are duplicated between
kernel/signal.c and ia32/ia32_signal.c; move them to a common header
file.

Rather than adding stuff to existing header files which contain data
structures, create a new header file; hence the slightly odd name
("all the good ones were taken.")

Note: nothing relied on signal_fault() being defined in
<asm/ptrace.h>.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

f28f0c23

x32: Add x32 system calls to syscall/syscall_64.tbl · 6630f11b

由 H. Peter Anvin 提交于 2月 14, 2012

Split the 64-bit system calls into "64" (64-bit only) and "common"
(64-bit or x32) and add the x32 system call numbers.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

6630f11b

x32: Add a thread flag for x32 processes · bb212724

由 H. Peter Anvin 提交于 2月 14, 2012

An x32 process is *almost* the same thing as a 64-bit process with a
32-bit address limit, but there are a few minor differences -- in
particular core dumps are 32 bits and signal handling is different.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

bb212724

x86: Factor out TIF_IA32 from 32-bit address space · 6bd33008

由 H. Peter Anvin 提交于 2月 06, 2012

Factor out IA32 (compatibility instruction set) from 32-bit address
space in the thread_info flags; this is a precondition patch for x32
support.
Originally-by: NH. J. Lu <hjl.tools@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-4pr1xnnksprt7t0h3w5fw4rv@git.kernel.org

6bd33008

i387: support lazy restore of FPU state · 7e16838d

由 Linus Torvalds 提交于 2月 19, 2012

This makes us recognize when we try to restore FPU state that matches
what we already have in the FPU on this CPU, and avoids the restore
entirely if so.

To do this, we add two new data fields:

 - a percpu 'fpu_owner_task' variable that gets written any time we
   update the "has_fpu" field, and thus acts as a kind of back-pointer
   to the task that owns the CPU.  The exception is when we save the FPU
   state as part of a context switch - if the save can keep the FPU
   state around, we leave the 'fpu_owner_task' variable pointing at the
   task whose FP state still remains on the CPU.

 - a per-thread 'last_cpu' field, that indicates which CPU that thread
   used its FPU on last.  We update this on every context switch
   (writing an invalid CPU number if the last context switch didn't
   leave the FPU in a lazily usable state), so we know that *that*
   thread has done nothing else with the FPU since.

These two fields together can be used when next switching back to the
task to see if the CPU still matches: if 'fpu_owner_task' matches the
task we are switching to, we know that no other task (or kernel FPU
usage) touched the FPU on this CPU in the meantime, and if the current
CPU number matches the 'last_cpu' field, we know that this thread did no
other FP work on any other CPU, so the FPU state on the CPU must match
what was saved on last context switch.

In that case, we can avoid the 'f[x]rstor' entirely, and just clear the
CR0.TS bit.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7e16838d

i387: use 'restore_fpu_checking()' directly in task switching code · 80ab6f1e

由 Linus Torvalds 提交于 2月 19, 2012

This inlines what is usually just a couple of instructions, but more
importantly it also fixes the theoretical error case (can that FPU
restore really ever fail? Maybe we should remove the checking).

We can't start sending signals from within the scheduler, we're much too
deep in the kernel and are holding the runqueue lock etc.  So don't
bother even trying.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

80ab6f1e

i387: fix up some fpu_counter confusion · cea20ca3

由 Linus Torvalds 提交于 2月 20, 2012

This makes sure we clear the FPU usage counter for newly created tasks,
just so that we start off in a known state (for example, don't try to
preload the FPU state on the first task switch etc).

It also fixes a thinko in when we increment the fpu_counter at task
switch time, introduced by commit 34ddc81a ("i387: re-introduce FPU
state preloading at context switch time").  We should increment the
*new* task fpu_counter, not the old task, and only if we decide to use
that state (whether lazily or preloaded).
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cea20ca3

20 2月, 2012 1 次提交

x86/nmi: Test saved %cs in NMI to determine nested NMI case · 45d5a168

由 Steven Rostedt 提交于 2月 19, 2012

Currently, the NMI handler tests if it is nested by checking the
special variable saved on the stack (set during NMI handling)
and whether the saved stack is the NMI stack as well (to prevent
the race when the variable is set to zero).

But userspace may set their %rsp to any value as long as they do
not derefence it, and it may make it point to the NMI stack,
which will prevent NMIs from triggering while the userspace app
is running. (I tested this, and it is indeed the case)

Add another check to determine nested NMIs by looking at the
saved %cs (code segment register) and making sure that it is the
kernel code segment.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1329687817.1561.27.camel@acer.local.homeSigned-off-by: NIngo Molnar <mingo@elte.hu>

45d5a168

19 2月, 2012 2 次提交

i387: re-introduce FPU state preloading at context switch time · 34ddc81a

由 Linus Torvalds 提交于 2月 18, 2012

After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870e ("i387:
do not preload FPU state at task switch time").

However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably

 - properly abstracted as a true FPU state switch, rather than as
   open-coded save and restore with various hacks.

   In particular, implementing it as a proper FPU state switch allows us
   to optimize the CR0.TS flag accesses: there is no reason to set the
   TS bit only to then almost immediately clear it again.  CR0 accesses
   are quite slow and expensive, don't flip the bit back and forth for
   no good reason.

 - Make sure that the same model works for both x86-32 and x86-64, so
   that there are no gratuitous differences between the two due to the
   way they save and restore segment state differently due to
   architectural differences that really don't matter to the FPU state.

 - Avoid exposing the "preload" state to the context switch routines,
   and in particular allow the concept of lazy state restore: if nothing
   else has used the FPU in the meantime, and the process is still on
   the same CPU, we can avoid restoring state from memory entirely, just
   re-expose the state that is still in the FPU unit.

   That optimized lazy restore isn't actually implemented here, but the
   infrastructure is set up for it.  Of course, older CPU's that use
   'fnsave' to save the state cannot take advantage of this, since the
   state saving also trashes the state.

In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage.  Hopefully it's easier to
follow as a result.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

34ddc81a

i387: move TS_USEDFPU flag from thread_info to task_struct · f94edacf

由 Linus Torvalds 提交于 2月 17, 2012

This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.

This fixes two independent bugs at the same time:

 - changing 'thread_info->status' from the scheduler causes nasty
   problems for the other users of that variable, since it is defined to
   be thread-synchronous (that's what the "TS_" part of the naming was
   supposed to indicate).

   So perfectly valid code could (and did) do

	ti->status |= TS_RESTORE_SIGMASK;

   and the compiler was free to do that as separate load, or and store
   instructions.  Which can cause problems with preemption, since a task
   switch could happen in between, and change the TS_USEDFPU bit. The
   change to TS_USEDFPU would be overwritten by the final store.

   In practice, this seldom happened, though, because the 'status' field
   was seldom used more than once, so gcc would generally tend to
   generate code that used a read-modify-write instruction and thus
   happened to avoid this problem - RMW instructions are naturally low
   fat and preemption-safe.

 - On x86-32, the current_thread_info() pointer would, during interrupts
   and softirqs, point to a *copy* of the real thread_info, because
   x86-32 uses %esp to calculate the thread_info address, and thus the
   separate irq (and softirq) stacks would cause these kinds of odd
   thread_info copy aliases.

   This is normally not a problem, since interrupts aren't supposed to
   look at thread information anyway (what thread is running at
   interrupt time really isn't very well-defined), but it confused the
   heck out of irq_fpu_usable() and the code that tried to squirrel
   away the FPU state.

   (It also caused untold confusion for us poor kernel developers).

It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling).  And the FPU data that we are going to save/restore is
found there too.

Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
the %esp issue.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: NRaphael Prevost <raphael@buro.asia>
Acked-and-tested-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Tested-by: NPeter Anvin <hpa@zytor.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f94edacf

17 2月, 2012 4 次提交

i387: move AMD K7/K8 fpu fxsave/fxrstor workaround from save to restore · 4903062b

由 Linus Torvalds 提交于 2月 16, 2012

The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
pending.  In order to not leak FIP state from one process to another, we
need to do a floating point load after the fxsave of the old process,
and before the fxrstor of the new FPU state.  That resets the state to
the (uninteresting) kernel load, rather than some potentially sensitive
user information.

We used to do this directly after the FPU state save, but that is
actually very inconvenient, since it

 (a) corrupts what is potentially perfectly good FPU state that we might
     want to lazy avoid restoring later and

 (b) on x86-64 it resulted in a very annoying ordering constraint, where
     "__unlazy_fpu()" in the task switch needs to be delayed until after
     the DS segment has been reloaded just to get the new DS value.

Coupling it to the fxrstor instead of the fxsave automatically avoids
both of these issues, and also ensures that we only do it when actually
necessary (the FP state after a save may never actually get used).  It's
simply a much more natural place for the leaked state cleanup.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4903062b

i387: do not preload FPU state at task switch time · b3b0870e

由 Linus Torvalds 提交于 2月 16, 2012

Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
is spending several days looking for a bug in the state save/restore
code. And the preload code has some rather subtle interactions with
both paravirtualization support and segment state restore, so it's not
nearly as simple as it should be.

Also, now that we no longer necessarily depend on a single bit (ie
TS_USEDFPU) for keeping track of the state of the FPU, we migth be able
to do better. If we are really switching between two processes that
keep touching the FP state, save/restore is inevitable, but in the case
of having one process that does most of the FPU usage, we may actually
be able to do much better than the preloading.

In particular, we may be able to keep track of which CPU the process ran
on last, and also per CPU keep track of which process' FP state that CPU
has. For modern CPU's that don't destroy the FPU contents on save time,
that would allow us to do a lazy restore by just re-enabling the
existing FPU state - with no restore cost at all!
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b3b0870e

i387: don't ever touch TS_USEDFPU directly, use helper functions · 6d59d7a9

由 Linus Torvalds 提交于 2月 16, 2012

This creates three helper functions that do the TS_USEDFPU accesses, and
makes everybody that used to do it by hand use those helpers instead.

In addition, there's a couple of helper functions for the "change both
CR0.TS and TS_USEDFPU at the same time" case, and the places that do
that together have been changed to use those. That means that we have
fewer random places that open-code this situation.

The intent is partly to clarify the code without actually changing any
semantics yet (since we clearly still have some hard to reproduce bug in
this area), but also to make it much easier to use another approach
entirely to caching the CR0.TS bit for software accesses.

Right now we use a bit in the thread-info 'status' variable (this patch
does not change that), but we might want to make it a full field of its
own or even make it a per-cpu variable.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d59d7a9

i387: fix x86-64 preemption-unsafe user stack save/restore · 15d8791c

由 Linus Torvalds 提交于 2月 16, 2012

Commit 5b1cbac3 ("i387: make irq_fpu_usable() tests more robust")
added a sanity check to the #NM handler to verify that we never cause
the "Device Not Available" exception in kernel mode.

However, that check actually pinpointed a (fundamental) race where we do
cause that exception as part of the signal stack FPU state save/restore
code.

Because we use the floating point instructions themselves to save and
restore state directly from user mode, we cannot do that atomically with
testing the TS_USEDFPU bit: the user mode access itself may cause a page
fault, which causes a task switch, which saves and restores the FP/MMX
state from the kernel buffers.

This kind of "recursive" FP state save is fine per se, but it means that
when the signal stack save/restore gets restarted, it will now take the
'#NM' exception we originally tried to avoid.  With preemption this can
happen even without the page fault - but because of the user access, we
cannot just disable preemption around the save/restore instruction.

There are various ways to solve this, including using the
"enable/disable_page_fault()" helpers to not allow page faults at all
during the sequence, and fall back to copying things by hand without the
use of the native FP state save/restore instructions.

However, the simplest thing to do is to just allow the #NM from kernel
space, but fix the race in setting and clearing CR0.TS that this all
exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
atomic wrt scheduling, so while the actual state save/restore can be
interrupted and restarted, the act of actually clearing/setting CR0.TS
and the TS_USEDFPU bit together must not.

Instead of just adding random "preempt_disable/enable()" calls to what
is already excessively ugly code, this introduces some helper functions
that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
the user state instead.

Those helper functions should probably eventually replace the other
ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
some more: the task switching functionality in particular needs to
expose the difference between the 'prev' and 'next' threads, while the
new helper functions intentionally were written to only work with
'current'.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

15d8791c

14 2月, 2012 2 次提交

i387: make irq_fpu_usable() tests more robust · 5b1cbac3

由 Linus Torvalds 提交于 2月 13, 2012

Some code - especially the crypto layer - wants to use the x86
FP/MMX/AVX register set in what may be interrupt (typically softirq)
context.

That *can* be ok, but the tests for when it was ok were somewhat
suspect.  We cannot touch the thread-specific status bits either, so
we'd better check that we're not going to try to save FP state or
anything like that.

Now, it may be that the TS bit is always cleared *before* we set the
USEDFPU bit (and only set when we had already cleared the USEDFP
before), so the TS bit test may actually have been sufficient, but it
certainly was not obviously so.

So this explicitly verifies that we will not touch the TS_USEDFPU bit,
and adds a few related sanity-checks.  Because it seems that somehow
AES-NI is corrupting user FP state.  The cause is not clear, and this
patch doesn't fix it, but while debugging it I really wanted the code to
be more obviously correct and robust.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b1cbac3

i387: math_state_restore() isn't called from asm · be98c2cd

由 Linus Torvalds 提交于 2月 13, 2012

It was marked asmlinkage for some really old and stale legacy reasons.
Fix that and the equally stale comment.

Noticed when debugging the irq_fpu_usable() bugs.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be98c2cd

09 2月, 2012 1 次提交

x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors · 32c32338

由 Andreas Herrmann 提交于 2月 08, 2012

For L1 instruction cache and L2 cache the shared CPU information
is wrong. On current AMD family 15h CPUs those caches are shared
between both cores of a compute unit.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=42607Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
Cc: Petkov Borislav <Borislav.Petkov@amd.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20120208195229.GA17523@alberich.amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

32c32338

07 2月, 2012 2 次提交

perf: Fix double start/stop in x86_pmu_start() · f39d47ff

由 Stephane Eranian 提交于 2月 07, 2012

The following patch fixes a bug introduced by the following
commit:

        e050e3f0 ("perf: Fix broken interrupt rate throttling")

The patch caused the following warning to pop up depending on
the sampling frequency adjustments:

  ------------[ cut here ]------------
  WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()

It was caused by the following call sequence:

perf_adjust_freq_unthr_context.part() {
     stop()
     if (delta > 0) {
          perf_adjust_period() {
              if (period > 8*...) {
                  stop()
                  ...
                  start()
              }
          }
      }
      start()
}

Which caused a double start and a double stop, thus triggering
the assert in x86_pmu_start().

The patch fixes the problem by avoiding the double calls. We
pass a new argument to perf_adjust_period() to indicate whether
or not the event is already stopped. We can't just remove the
start/stop from that function because it's called from
__perf_event_overflow where the event needs to be reloaded via a
stop/start back-toback call.

The patch reintroduces the assertion in x86_pmu_start() which
was removed by commit:

	84f2b9b2 ("perf: Remove deprecated WARN_ON_ONCE()")

In this second version, we've added calls to disable/enable PMU
during unthrottling or frequency adjustment based on bug report
of spurious NMI interrupts from Eric Dumazet.
Reported-and-tested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NStephane Eranian <eranian@google.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: markus@trippelsdorf.de
Cc: paulus@samba.org
Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad
[ Minor edits to the changelog and to the code ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f39d47ff

x86/microcode: Remove noisy AMD microcode warning · c1d2f1bc

由 Prarit Bhargava 提交于 2月 06, 2012

AMD processors will never support /dev/cpu/microcode updating so
just silently fail instead of printing out a warning for every
cpu.
Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1328552935-965-1-git-send-email-prarit@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

c1d2f1bc

03 2月, 2012 1 次提交

perf: Remove deprecated WARN_ON_ONCE() · 84f2b9b2

由 Stephane Eranian 提交于 2月 02, 2012

With the new throttling/unthrottling code introduced with
commit:

  e050e3f0 ("perf: Fix broken interrupt rate throttling")

we occasionally hit two WARN_ON_ONCE() checks in:

  - intel_pmu_pebs_enable()
  - intel_pmu_lbr_enable()
  - x86_pmu_start()

The assertions are no longer problematic. There is a valid
path where they can trigger but it is harmless.

The assertion can be triggered with:

  $ perf record -e instructions:pp ....

Leading to paths:

  intel_pmu_pebs_enable
  intel_pmu_enable_event
  x86_perf_event_set_period
  x86_pmu_start
  perf_adjust_freq_unthr_context
  perf_event_task_tick
  scheduler_tick

And:

  intel_pmu_lbr_enable
  intel_pmu_enable_event
  x86_perf_event_set_period
  x86_pmu_start
  perf_adjust_freq_unthr_context.
  perf_event_task_tick
  scheduler_tick

cpuc->enabled is always on because when we get to
perf_adjust_freq_unthr_context() the PMU is not totally
disabled. Furthermore when we need to adjust a period,
we only stop the event we need to change and not the
entire PMU. Thus, when we re-enable, cpuc->enabled is
already set. Note that when we stop the event, both
pebs and lbr are stopped if necessary (and possible).
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/20120202110401.GA30911@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>

84f2b9b2

30 1月, 2012 2 次提交

x86/reboot: Remove VersaLogic Menlow reboot quirk · e6d36a65

由 Michael D Labriola 提交于 1月 29, 2012

This commit removes the reboot quirk originally added by commit
e19e074b ("x86: Fix reboot problem on VersaLogic Menlow boards").

Testing with a VersaLogic Ocelot (VL-EPMs-21a rev 1.00 w/ BIOS
6.5.102) revealed the following regarding the reboot hang
problem:

- v2.6.37 reboot=bios was needed.

- v2.6.38-rc1: behavior changed, reboot=acpi is needed,
  reboot=kbd and reboot=bios results in system hang.

- v2.6.38: VersaLogic patch (e19e074b "x86: Fix reboot problem on
  VersaLogic Menlow boards") was applied prior to v2.6.38-rc7.  This
  patch sets a quirk for VersaLogic Menlow boards that forces the use
  of reboot=bios, which doesn't work anymore.

- v3.2: It seems that commit 660e34ce ("x86: Reorder reboot method
  preferences") changed the default reboot method to acpi prior to
  v3.0-rc1, which means the default behavior is appropriate for the
  Ocelot.  No VersaLogic quirk is required.

The Ocelot board used for testing can successfully reboot w/out
having to pass any reboot= arguments for all 3 current versions
of the BIOS.
Signed-off-by: NMichael D Labriola <michael.d.labriola@gmail.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Michael D Labriola <mlabriol@gdeb.com>
Cc: Kushal Koolwal <kushalkoolwal@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/87vcnub9hu.fsf@gmail.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

e6d36a65

x86/reboot: Skip DMI checks if reboot set by user · 5955633e

由 Michael D Labriola 提交于 1月 29, 2012

Skip DMI checks for vendor specific reboot quirks if the user
passed in a reboot= arg on the command line - we should never
override user choices.
Signed-off-by: NMichael D Labriola <michael.d.labriola@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Michael D Labriola <mlabriol@gdeb.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/87wr8ab9od.fsf@gmail.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

5955633e

28 1月, 2012 1 次提交

x86/dumpstack: Remove unneeded check in dump_trace() · d0caf292

由 Dan Carpenter 提交于 1月 28, 2012

Smatch complains that we have some inconsistent NULL checking.

If "task" were NULL then it would lead to a NULL dereference
later. We can remove this test because earlier on in the
function we have:

 if (!task)
	task = current;
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Clemens Ladisch <clemens@ladisch.de>
Link: http://lkml.kernel.org/r/20120128105246.GA25092@elgon.mountainSigned-off-by: NIngo Molnar <mingo@elte.hu>

d0caf292

27 1月, 2012 1 次提交

bugs, x86: Fix printk levels for panic, softlockups and stack dumps · b0f4c4b3

由 Prarit Bhargava 提交于 1月 26, 2012

rsyslog will display KERN_EMERG messages on a connected
terminal.  However, these messages are useless/undecipherable
for a general user.

For example, after a softlockup we get:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Stack:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Call Trace:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89

This happens because the printk levels for these messages are
incorrect. Only an informational message should be displayed on
a terminal.

I modified the printk levels for various messages in the kernel
and tested the output by using the drivers/misc/lkdtm.c kernel
modules (ie, softlockups, panics, hard lockups, etc.) and
confirmed that the console output was still the same and that
the output to the terminals was correct.

For example, in the case of a softlockup we now see the much
more informative:

 Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
 BUG: soft lockup - CPU4 stuck for 60s!

instead of the above confusing messages.

AFAICT, the messages no longer have to be KERN_EMERG.  In the
most important case of a panic we set console_verbose().  As for
the other less severe cases the correct data is output to the
console and /var/log/messages.

Successfully tested by me using the drivers/misc/lkdtm.c module.
Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
Cc: dzickus@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

b0f4c4b3

26 1月, 2012 1 次提交

x86/microcode_amd: Add support for CPU family specific container files · 5b68edc9

由 Andreas Herrmann 提交于 1月 20, 2012

We've decided to provide CPU family specific container files
(starting with CPU family 15h). E.g. for family 15h we have to
load microcode_amd_fam15h.bin instead of microcode_amd.bin

Rationale is that starting with family 15h patch size is larger
than 2KB which was hard coded as maximum patch size in various
microcode loaders (not just Linux).

Container files which include patches larger than 2KB cause
different kinds of trouble with such old patch loaders. Thus we
have to ensure that the default container file provides only
patches with size less than 2KB.
Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20120120164412.GD24508@alberich.amd.com
[ documented the naming convention and tidied the code a bit. ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5b68edc9

18 1月, 2012 1 次提交

x86-32: Fix build failure with AUDIT=y, AUDITSYSCALL=n · 6015ff10

由 Al Viro 提交于 1月 18, 2012

JONGMAN HEO reports:

  With current linus git (commit a25a2b84), I got following build error,

  arch/x86/kernel/vm86_32.c: In function 'do_sys_vm86':
  arch/x86/kernel/vm86_32.c:340: error: implicit declaration of function '__audit_syscall_exit'
  make[3]: *** [arch/x86/kernel/vm86_32.o] Error 1

OK, I can reproduce it (32bit allmodconfig with AUDIT=y, AUDITSYSCALL=n)

It's due to commit d7e7528b: "Audit: push audit success and retcode
into arch ptrace.h".
Reported-by: NJONGMAN HEO <jongman.heo@samsung.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6015ff10