提交 · 3e9b2327b59801e677a7581fe4d2541ca749dcab · gsplhtlxg / clone-Linux

30 8月, 2013 1 次提交

x86, asm: Extend definitions of _ASM_* with a raw format · 3e9b2327

由 Jan-Simon Möller 提交于 8月 29, 2013

The __ASM_* macros (e.g. __ASM_DX) are used to return the proper
register name (e.g. edx for 32bit / rdx for 64bit). We want to use
this also in arch/x86/include/asm/uaccess.h / get_user() . For this
to work, we need a raw form as both gcc and clang choke on the
whitespace in a register asm() statement, and the __ASM_FORM macro
surrounds the argument with blanks. A new macro, __ASM_FORM_RAW was
added and we change __ASM_REG to use the new RAW form.
Signed-off-by: NJan-Simon Möller <dl9pf@gmx.de>
Link: http://lkml.kernel.org/r/1377803585-5913-2-git-send-email-dl9pf@gmx.deSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

3e9b2327

06 8月, 2013 1 次提交

x86, insn: Add new opcodes as of June, 2013 · 3e21bb09

由 Masami Hiramatsu 提交于 8月 06, 2013

Add TSX-NI related instructions and new instructions to
x86-opcode-map.txt according to the Intel(R) 64 and IA-32
Architectures Software Developer's Manual Vol2C (June, 2013).
This also includes below updates.
 - Fix a typo of MWAIT (the lack of (11B)).
 - Change NOP Ev to prefetchw Ev
 - Add CRC32 new prefix style (66&F2)
 - Add ADCX, ADOX, RDSEED, CLAC and STAC instructions
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20130806073750.4049.12365.stgit@udc4-manage.rcp.hitachi.co.jp
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

3e21bb09

23 7月, 2013 1 次提交

x86/ia32/asm: Remove unused argument in macro · d2475b8f

由 Ramkumar Ramachandra 提交于 7月 10, 2013

Commit 3fe26fa3 ("x86: get rid of pt_regs argument in sigreturn variants",
from 2012-11-12) changed the body of PTREGSCALL to drop arg, and
updated the callsites; unfortunately, it forgot to update the
macro argument list, leaving an unused argument. Fix this.
Signed-off-by: NRamkumar Ramachandra <artagnon@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1373479468-7175-1-git-send-email-artagnon@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d2475b8f

17 7月, 2013 1 次提交

x86, bitops: Change bitops to be native operand size · 9b710506

由 H. Peter Anvin 提交于 7月 16, 2013

Change the bitops operation to be naturally "long", i.e. 63 bits on
the 64-bit kernel.  Additional bugs are likely to crop up in the
future.

We already have bugs which machines with > 16 TiB of memory in a
single node, as can happen if memory is interleaved.  The x86 bitop
operations take a signed index, so using an unsigned type is not an
option.

Jim Kukunas measured the effect of this patch on kernel size: it adds
2779 bytes to the allyesconfig kernel.  Some of that probably could be
elided by replacing the inline functions with macros which select the
32-bit type if the index is a 32-bit value, something like:

In that case we could also use "Jr" constraints for the 64-bit
version.

However, this would more than double the amount of code for a
relatively small gain.

Note that we can't use ilog2() for _BITOPS_LONG_SHIFT, as that causes
a recursive header inclusion problem.

The change to constant_test_bit() should both generate better code and
give correct result for negative bit indicies.  As previously written
the compiler had to generate extra code to create the proper wrong
result for negative values.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Jim Kukunas <james.t.kukunas@intel.com>
Link: http://lkml.kernel.org/n/tip-z61ofiwe90xeyb461o72h8ya@git.kernel.org

9b710506

29 6月, 2013 1 次提交

x86: Use asm-goto to implement mutex fast path on x86-64 · 00e55a79

由 Wedson Almeida Filho 提交于 6月 28, 2013

The new implementation allows the compiler to better optimize the code; the
original implementation is still used when the kernel is compiled with older
versions of gcc that don't support asm-goto.

Compiling with gcc 4.7.3, the original mutex_lock() is 60 bytes with the fast
path taking 16 instructions; the new mutex_lock() is 42 bytes, with the fast
path taking 12 instructions.

The original mutex_unlock() is 24 bytes with the fast path taking 7
instructions; the new mutex_unlock() is 25 bytes (because the compiler used
a 2-byte ret) with the fast path taking 4 instructions.

The two versions of the functions are included below for reference.

Old:
ffffffff817742a0 <mutex_lock>:
ffffffff817742a0: 55 push %rbp
ffffffff817742a1: 48 89 e5 mov %rsp,%rbp
ffffffff817742a4: 48 83 ec 10 sub $0x10,%rsp
ffffffff817742a8: 48 89 5d f0 mov %rbx,-0x10(%rbp)
ffffffff817742ac: 48 89 fb mov %rdi,%rbx
ffffffff817742af: 4c 89 65 f8 mov %r12,-0x8(%rbp)
ffffffff817742b3: e8 28 15 00 00 callq ffffffff817757e0 <_cond_resched>
ffffffff817742b8: 48 89 df mov %rbx,%rdi
ffffffff817742bb: f0 ff 0f lock decl (%rdi)
ffffffff817742be: 79 05 jns ffffffff817742c5 <mutex_lock+0x25>
ffffffff817742c0: e8 cb 04 00 00 callq ffffffff81774790 <__mutex_lock_slowpath>
ffffffff817742c5: 65 48 8b 04 25 c0 b7 mov %gs:0xb7c0,%rax
ffffffff817742cc: 00 00
ffffffff817742ce: 4c 8b 65 f8 mov -0x8(%rbp),%r12
ffffffff817742d2: 48 89 43 18 mov %rax,0x18(%rbx)
ffffffff817742d6: 48 8b 5d f0 mov -0x10(%rbp),%rbx
ffffffff817742da: c9 leaveq
ffffffff817742db: c3 retq

ffffffff81774250 <mutex_unlock>:
ffffffff81774250: 55 push %rbp
ffffffff81774251: 48 c7 47 18 00 00 00 movq $0x0,0x18(%rdi)
ffffffff81774258: 00
ffffffff81774259: 48 89 e5 mov %rsp,%rbp
ffffffff8177425c: f0 ff 07 lock incl (%rdi)
ffffffff8177425f: 7f 05 jg ffffffff81774266 <mutex_unlock+0x16>
ffffffff81774261: e8 ea 04 00 00 callq ffffffff81774750 <__mutex_unlock_slowpath>
ffffffff81774266: 5d pop %rbp
ffffffff81774267: c3 retq

New:
ffffffff81774920 <mutex_lock>:
ffffffff81774920: 55 push %rbp
ffffffff81774921: 48 89 e5 mov %rsp,%rbp
ffffffff81774924: 53 push %rbx
ffffffff81774925: 48 89 fb mov %rdi,%rbx
ffffffff81774928: e8 a3 0e 00 00 callq ffffffff817757d0 <_cond_resched>
ffffffff8177492d: f0 ff 0b lock decl (%rbx)
ffffffff81774930: 79 08 jns ffffffff8177493a <mutex_lock+0x1a>
ffffffff81774932: 48 89 df mov %rbx,%rdi
ffffffff81774935: e8 16 fe ff ff callq ffffffff81774750 <__mutex_lock_slowpath>
ffffffff8177493a: 65 48 8b 04 25 c0 b7 mov %gs:0xb7c0,%rax
ffffffff81774941: 00 00
ffffffff81774943: 48 89 43 18 mov %rax,0x18(%rbx)
ffffffff81774947: 5b pop %rbx
ffffffff81774948: 5d pop %rbp
ffffffff81774949: c3 retq

ffffffff81774730 <mutex_unlock>:
ffffffff81774730: 48 c7 47 18 00 00 00 movq $0x0,0x18(%rdi)
ffffffff81774737: 00
ffffffff81774738: f0 ff 07 lock incl (%rdi)
ffffffff8177473b: 7f 0a jg ffffffff81774747 <mutex_unlock+0x17>
ffffffff8177473d: 55 push %rbp
ffffffff8177473e: 48 89 e5 mov %rsp,%rbp
ffffffff81774741: e8 aa ff ff ff callq ffffffff817746f0 <__mutex_unlock_slowpath>
ffffffff81774746: 5d pop %rbp
ffffffff81774747: f3 c3 repz retq
Signed-off-by: NWedson Almeida Filho <wedsonaf@gmail.com>
Link: http://lkml.kernel.org/r/1372420245-60021-1-git-send-email-wedsonaf@gmail.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

00e55a79

26 6月, 2013 4 次提交

x86, asm, cleanup: Replace open-coded control register values with symbolic · a3d7b7dd

由 H. Peter Anvin 提交于 4月 27, 2013

Clean up an unnecessary open-coded control register values.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org

a3d7b7dd

x86, processor-flags: Fix the datatypes and add bit number defines · d1fbefcb

由 H. Peter Anvin 提交于 4月 27, 2013

The control registers are unsigned long (32 bits on i386, 64 bits on
x86-64), and so make that manifest in the data type for the various
constants. Add defines with a _BIT suffix which defines the bit
number, as opposed to the bit mask.

This should resolve some issues with ~bitmask that Linus discovered.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-cwckhbrib2aux1qbteaebij0@git.kernel.org

d1fbefcb

x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE · afcbf13f

由 H. Peter Anvin 提交于 4月 27, 2013

Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE to match the SDM.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Link: http://lkml.kernel.org/n/tip-buq1evi5dpykxx7ak6amaam0@git.kernel.org

afcbf13f

x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED · 1adfa76a

由 H. Peter Anvin 提交于 4月 27, 2013

Bit 1 in the x86 EFLAGS is always set.  Name the macro something that
actually tries to explain what it is all about, rather than being a
tautology.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org

1adfa76a

19 6月, 2013 1 次提交

x86/vdso: Convert use of typedef ctl_table to struct ctl_table · f07d91ed

由 Joe Perches 提交于 6月 13, 2013

This typedef is unnecessary and should just be removed.
Signed-off-by: NJoe Perches <joe@perches.com>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/a756fa0060e8eea25e8c1863c2764e86c2823617.1371177118.git.joe@perches.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f07d91ed

31 5月, 2013 1 次提交

x86: __force_order doesn't need to be an actual variable · 1d10f6ee

由 Jan Beulich 提交于 5月 29, 2013

It being static causes over a dozen instances to be scattered
across the kernel image, with non of them ever being referenced
in any way. Making the variable extern without ever defining it
works as well - all we need is to have the compiler think the
variable is being accessed.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

1d10f6ee

28 5月, 2013 1 次提交

crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations · de614e56

由 Jussi Kivilinna 提交于 5月 21, 2013

The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As these implementations align the stack pointer
to 16 bytes, this corruption did not happen every time.

Patch corrects this issue.
Reported-by: NJulian Wollrath <jwollrath@web.de>
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Tested-by: NJulian Wollrath <jwollrath@web.de>
Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

de614e56

21 5月, 2013 2 次提交

x86: Fix bit corruption at CPU resume time · 5e427ec2

由 Linus Torvalds 提交于 5月 20, 2013

In commit 78d77df7 ("x86-64, init: Do not set NX bits on non-NX
capable hardware") we added the early_pmd_flags that gets the NX bit set
when a CPU supports NX. However, the new variable was marked __initdata,
because the main _use_ of this is in an __init routine.

However, the bit setting happens from secondary_startup_64(), which is
called not only at bootup, but on every secondary CPU start.  Including
resuming from STR and at CPU hotplug time.  So the value cannot be
__initdata.
Reported-bisected-and-tested-by: NMichal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org # v3.9
Acked-by: NPeter Anvin <hpa@linux.intel.com>
Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5e427ec2

Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" · f3f01175

由 Bjorn Helgaas 提交于 5月 20, 2013

This reverts commit dd72be99.

Andy Shevchenko <andy.shevchenko@gmail.com> reported that this commit
broke Intel Medfield devices.

Reference: https://lkml.kernel.org/r/CAHp75Vdf6gFZChS47=grUygHBDWcoOWDYPzw+Zj5bdVCWj85Jw@mail.gmail.comReported-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

f3f01175

15 5月, 2013 1 次提交

time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons · b4f711ee

由 John Stultz 提交于 4月 24, 2013

Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
which enables some minor compile time optimization to avoid
uncessary code in mostly the suspend/resume path could cause
problems for userland.

In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
twice and simplifies suspend/resume, has the side effect
of causing the /sys/class/rtc/rtcN/hctosys flag to always be
zero, and this flag is commonly used by udev to setup the
/dev/rtc symlink to /dev/rtcN, which can cause pain for
older applications.

While the udev rules could use some work to be less fragile,
breaking userland should strongly be avoided. Additionally
the compile time optimizations are fairly minor, and the code
being optimized is likely to be reworked in the future, so
lets revert this change.
Reported-by: NKay Sievers <kay@vrfy.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org> #3.9
Cc: Feng Tang <feng.tang@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

b4f711ee

10 5月, 2013 4 次提交

xen/pci: Used cached MSI-X capability offset · 7c86617d

由 Bjorn Helgaas 提交于 4月 22, 2013

We now cache the MSI-X capability offset in the struct pci_dev, so no
need to find the capability again.
Acked-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

7c86617d

xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK · 4be6bfe2

由 Bjorn Helgaas 提交于 4月 22, 2013

PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the
Table Offset register, not the flags ("Message Control" per spec)
register.
Acked-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

4be6bfe2

x86/mm: Add missing comments for initial kernel direct mapping · cf8b166d

由 Zhang Yanfei 提交于 5月 09, 2013

Two sets of comments were lost during patch-series shuffling:

  - comments for init_range_memory_mapping()

  - comments in init_mem_mapping that is helpful for reminding people
    that the pagetable is setup top-down

The comments were written by Yinghai in his patch in:

   https://lkml.org/lkml/2012/11/28/620

This patch reintroduces them.

Originally-From: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/518BC776.7010506@gmail.com
[ Tidied it all up a bit. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

cf8b166d

A
unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE · 91c2e0bc
由 Al Viro 提交于 3月 05, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
91c2e0bc

09 5月, 2013 5 次提交

KVM: emulator: emulate SALC · 326f578f

由 Paolo Bonzini 提交于 5月 09, 2013

This is an almost-undocumented instruction available in 32-bit mode.
I say "almost" undocumented because AMD documents it in their opcode
maps just to say that it is unavailable in 64-bit mode (sections
"A.2.1 One-Byte Opcodes" and "B.3 Invalid and Reassigned Instructions
in 64-Bit Mode").

It is roughly equivalent to "sbb %al, %al" except it does not
set the flags.  Use fastop to emulate it, but do not use the opcode
directly because it would fail if the host is 64-bit!
Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

326f578f

KVM: emulator: emulate XLAT · 7fa57952

由 Paolo Bonzini 提交于 5月 09, 2013

This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1.
It is just a MOV in disguise, with a funny source address.
Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

7fa57952

KVM: emulator: emulate AAM · a035d5c6

由 Paolo Bonzini 提交于 5月 09, 2013

This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1.

AAM needs the source operand to be unsigned; do the same in AAD as well
for consistency, even though it does not affect the result.
Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

a035d5c6

x86/microcode: Add local mutex to fix physical CPU hot-add deadlock · 074d72ff

由 Konrad Rzeszutek Wilk 提交于 5月 08, 2013

This can easily be triggered if a new CPU is added (via
ACPI hotplug mechanism) and from user-space you do:

   echo 1 > /sys/devices/system/cpu/cpu3/online

(or wait for UDEV to do it) on a newly appeared physical CPU.

The deadlock is that the "store_online" in drivers/base/cpu.c
takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up".
"cpu_up" eventually ends up calling "save_mc_for_early"
which also takes the cpu_hotplug_driver_lock() lock.

And here is that lockdep thinks of it:

 smpboot: Stack at about ffff880075c39f44
 smpboot: CPU3: has booted.
 microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25

 =============================================
 [ INFO: possible recursive locking detected ]
 3.9.0upstream-10129-g167af0e #1 Not tainted
 ---------------------------------------------
 sh/2487 is trying to acquire lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 but task is already holding lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(x86_cpu_hotplug_driver_mutex);
   lock(x86_cpu_hotplug_driver_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by sh/2487:
  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
  #2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60
Suggested-and-Acked-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: fenghua.yu@intel.com
Cc: xen-devel@lists.xensource.com
Cc: stable@vger.kernel.org # for v3.9
Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

074d72ff

KVM: VMX: fix halt emulation while emulating invalid guest sate · 8d76c49e

由 Gleb Natapov 提交于 5月 08, 2013

The invalid guest state emulation loop does not check halt_request
which causes 100% cpu loop while guest is in halt and in invalid
state, but more serious issue is that this leaves halt_request set, so
random instruction emulated by vm86 #GP exit can be interpreted
as halt which causes guest hang. Fix both problems by handling
halt_request in emulation loop.
Reported-by: NTomas Papan <tomas.papan@gmail.com>
Tested-by: NTomas Papan <tomas.papan@gmail.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: NGleb Natapov <gleb@redhat.com>

8d76c49e

08 5月, 2013 4 次提交

xen: mask x2APIC feature in PV · 4ea9b9ac

由 Zhenzhong Duan 提交于 3月 29, 2013

On x2apic enabled pvm, doing sysrq+l, got NULL pointer dereference as below.

    SysRq : Show backtrace of all active CPUs
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120
    Call Trace:
     [<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160
     [<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20
     [<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0
     [<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50
     [<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10
     [<ffffffff8131616d>] __handle_sysrq+0x7d/0x140
     [<ffffffff81316230>] ? __handle_sysrq+0x140/0x140
     [<ffffffff81316287>] write_sysrq_trigger+0x57/0x60
     [<ffffffff811ca996>] proc_reg_write+0x86/0xc0
     [<ffffffff8116dd8e>] vfs_write+0xce/0x190
     [<ffffffff8116e3e5>] sys_write+0x55/0x90
     [<ffffffff8150a242>] system_call_fastpath+0x16/0x1b

That's because apic points to apic_x2apic_cluster or apic_x2apic_phys
but the basic element like cpumask isn't initialized.

Mask x2APIC feature in pvm to avoid overwrite of apic pointer,
update commit message per Konrad's suggestion.
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
Tested-by: NTamon Shiose <tamon.shiose@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

4ea9b9ac

xen/spinlock: Fix check from greater than to be also be greater or equal to. · cb91f8f4

由 Konrad Rzeszutek Wilk 提交于 5月 06, 2013

During review of git commit cb9c6f15
("xen/spinlock:  Check against default value of -1 for IRQ line.")
Stefano pointed out a bug in the patch. Unfortunatly due to vacation
timing the fix was not applied and this patch fixes it up.
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

cb91f8f4

xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info · d5b17dbf

由 Konrad Rzeszutek Wilk 提交于 5月 05, 2013

As it will point to some data, but not event channel data (the
shared_info has an array limited to 32).

This means that for PVHVM guests with more than 32 VCPUs without
the usage of VCPUOP_register_info any interrupts to VCPUs
larger than 32 would have gone unnoticed during early bootup.

That is OK, as during early bootup, in smp_init we end up calling
the hotplug mechanism (xen_hvm_cpu_notify) which makes the
VCPUOP_register_vcpu_info call for all VCPUs and we can receive
interrupts on VCPUs 33 and further.

This is just a cleanup.
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

d5b17dbf

KVM: x86: fix maintenance of guest/host xcr0 state · 42bdf991

由 Marcelo Tosatti 提交于 4月 15, 2013

Emulation of xcr0 writes zero guest_xcr0_loaded variable so that
subsequent VM-entry reloads CPU's xcr0 with guests xcr0 value.

However, this is incorrect because guest_xcr0_loaded variable is
read to decide whether to reload hosts xcr0.

In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0
assignment, and scheduler decides to preload FPU:

switch_to
{
  __switch_to
    __math_state_restore
      restore_fpu_checking
        fpu_restore_checking
          if (use_xsave())
              fpu_xrstor_checking
		xrstor64 with CPU's xcr0 == guests xcr0

Fix by properly restoring hosts xcr0 during emulation of xcr0 writes.
Analyzed-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

42bdf991

07 5月, 2013 3 次提交

x86: Fix idle consolidation fallout · 97a5b81f

由 Thomas Gleixner 提交于 5月 02, 2013

The core code expects the arch idle code to return with interrupts
enabled. The conversion missed two x86 cases which fail to do that.
Reported-and-tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: NBorislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305021557030.3972@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

97a5b81f

x86 rwsem: avoid taking slow path when stealing write lock · a31a369b

由 Michel Lespinasse 提交于 5月 07, 2013

modify __down_write[_nested] and __down_write_trylock to grab the write
lock whenever the active count is 0, even if there are queued waiters
(they must be writers pending wakeup, since the active count is 0).

Note that this is an optimization only; architectures without this
optimization will still work fine:

- __down_write() would take the slow path which would take the wait_lock
  and then try stealing the lock (as in the spinlocked rwsem implementation)

- __down_write_trylock() would fail, but callers must be ready to deal
  with that - since there are some writers pending wakeup, they could
  have raced with us and obtained the lock before we steal it.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a31a369b

xen/vcpu: Document the xen_vcpu_info and xen_vcpu · a520996a

由 Konrad Rzeszutek Wilk 提交于 5月 05, 2013

They are important structures and it is not clear at first
look what they are for.

The xen_vcpu is a pointer. By default it points to the shared_info
structure (at the CPU offset location). However if the
VCPUOP_register_vcpu_info hypercall is implemented we can make the
xen_vcpu pointer point to a per-CPU location.
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Added comments from Ian Campbell]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a520996a

06 5月, 2013 1 次提交

xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. · 7f1fc268

由 Konrad Rzeszutek Wilk 提交于 5月 05, 2013

If a user did:

	echo 0 > /sys/devices/system/cpu/cpu1/online
	echo 1 > /sys/devices/system/cpu/cpu1/online

we would (this a build with DEBUG enabled) get to:
smpboot: ++++++++++++++++++++=_---CPU UP  1
.. snip..
smpboot: Stack at about ffff880074c0ff44
smpboot: CPU1: has booted.

and hang. The RCU mechanism would kick in an try to IPI the CPU1
but the IPIs (and all other interrupts) would never arrive at the
CPU1. At first glance at least. A bit digging in the hypervisor
trace shows that (using xenanalyze):

[vla] d4v1 vec 243 injecting
   0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
]  0.043163639 --|x d4v1 vmentry cycles 1468
]  0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
   0.043164913 --|x d4v1 inj_virq vec 243  real
  [vla] d4v1 vec 243 injecting
   0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
]  0.043165526 --|x d4v1 vmentry cycles 1472
]  0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
   0.043166800 --|x d4v1 inj_virq vec 243  real
  [vla] d4v1 vec 243 injecting

there is a pending event (subsequent debugging shows it is the IPI
from the VCPU0 when smpboot.c on VCPU1 has done
"set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
interrupted with the callback IPI (0xf3 aka 243) which ends up calling
__xen_evtchn_do_upcall.

The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
the pending events. And the moment the guest does a 'cli' (that is the
ffffffff81673254 in the log above) the hypervisor is invoked again to
inject the IPI (0xf3) to tell the guest it has pending interrupts.
This repeats itself forever.

The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup
we set each per_cpu(xen_vcpu, cpu) to point to the
shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info
to register per-CPU  structures (xen_vcpu_setup).
This is used to allow events for more than 32 VCPUs and for performance
optimizations reasons.

When the user performs the VCPU hotplug we end up calling the
the xen_vcpu_setup once more. We make the hypercall which returns
-EINVAL as it does not allow multiple registration calls (and
already has re-assigned where the events are being set). We pick
the fallback case and set per_cpu(xen_vcpu, cpu) to point to the
shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
However the hypervisor is still setting events in the register
per-cpu structure (per_cpu(xen_vcpu_info, cpu)).

As such when the events are set by the hypervisor (such as timer one),
and when we iterate in __xen_evtchn_do_upcall we end up reading stale
events from the shared_info->vcpu_info[vcpu] instead of the
per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
events that the hypervisor has set and the hypervisor keeps on reminding
us to ack the events which we never do.

The fix is simple. Don't on the second time when xen_vcpu_setup is
called over-write the per_cpu(xen_vcpu, cpu) if it points to
per_cpu(xen_vcpu_info).
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: stable@vger.kernel.org
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

7f1fc268

05 5月, 2013 1 次提交

perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL · 7cc23cd6

由 Peter Zijlstra 提交于 5月 03, 2013

We should always have proper privileges when requesting kernel
data.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl
[ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org

7cc23cd6

04 5月, 2013 2 次提交

perf/x86/intel/lbr: Fix LBR filter · 6e15eb3b

由 Peter Zijlstra 提交于 5月 03, 2013

The LBR 'from' adddress is under full userspace control; ensure
we validate it before reading from it.

Note: is_module_text_address() can potentially be quite
expensive; for those running into that with high overhead
in modules optimize it using an RCU backed rb-tree.
Reported-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.158211806@chello.nlSigned-off-by: NIngo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-mk8i82ffzax01cnqo829iy1q@git.kernel.org

6e15eb3b

perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge · 741a698f

由 Peter Zijlstra 提交于 5月 03, 2013

Errata BV98 states that all MEM_*_RETIRED events corrupt the
counter value of the SMT sibling's counters. Blacklist these
events
Reported-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.083340271@chello.nlSigned-off-by: NIngo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-jwra43mujrv1oq9xk6mfe57v@git.kernel.org

741a698f

03 5月, 2013 4 次提交

KVM: x86: Account for failing enable_irq_window for NMI window request · 03b28f81

由 Jan Kiszka 提交于 4月 29, 2013

With VMX, enable_irq_window can now return -EBUSY, in which case an
immediate exit shall be requested before entering the guest. Account for
this also in enable_nmi_window which uses enable_irq_window in absence
of vnmi support, e.g.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

03b28f81

x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) · 5522ddb3

由 Alexander van Heukelum 提交于 3月 27, 2013

Commit 49cb25e9 x86: 'get rid of pt_regs argument in vm86/vm86old'
got rid of the pt_regs stub for sys_vm86old and sys_vm86. The functions
were, however, not changed to use the calling convention for syscalls.

[AV: killed asmlinkage_protect() - it's done automatically now]
Reported-and-tested-by: NHans de Bruin <jmdebruin@xmsnet.nl>
Cc: stable@vger.kernel.org
Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5522ddb3

x86-64, init: Do not set NX bits on non-NX capable hardware · 78d77df7

由 H. Peter Anvin 提交于 5月 02, 2013

During early init, we would incorrectly set the NX bit even if the NX
feature was not supported.  Instead, only set this bit if NX is
actually available and enabled.  We already do very early detection of
the NX bit to enable it in EFER, this simply extends this detection to
the early page table mask.
Reported-by: NFernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1367476850.5660.2.camel@nexus
Cc: <stable@vger.kernel.org> v3.9

78d77df7

x86, gdt, hibernate: Store/load GDT for hibernate path. · cc456c4e

由 Konrad Rzeszutek Wilk 提交于 5月 01, 2013

The git commite7a5cd06
("x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path
is not needed.") assumes that for the hibernate path the booting
kernel and the resuming kernel MUST be the same. That is certainly
the case for a 32-bit kernel (see check_image_kernel and
CONFIG_ARCH_HIBERNATION_HEADER config option).

However for 64-bit kernels it is OK to have a different kernel
version (and size of the image) of the booting and resuming kernels.
Hence the above mentioned git commit introduces an regression.

This patch fixes it by introducing a 'struct desc_ptr gdt_desc'
back in the 'struct saved_context'. However instead of having in the
'save_processor_state' and 'restore_processor_state' the
store/load_gdt calls, we are only saving the GDT in the
save_processor_state.

For the restore path the lgdt operation is done in
hibernate_asm_[32|64].S in the 'restore_registers' path.

The apt reader of this description will recognize that only 64-bit
kernels need this treatment, not 32-bit. This patch adds the logic
in the 32-bit path to be more similar to 64-bit so that in the future
the unification process can take advantage of this.

[ hpa: this also reverts an inadvertent on-disk format change ]
Suggested-by: N"H. Peter Anvin" <hpa@zytor.com>
Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1367459610-9656-2-git-send-email-konrad.wilk@oracle.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

cc456c4e

01 5月, 2013 1 次提交

Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 446f24d1

由 Stephen Boyd 提交于 4月 30, 2013

The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files.  Arnd Bergman noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using pre 4.4 gcc.

To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.

Also, make the text a bit more generic by stating that this option
enables compile time checks so we can cover architectures which emit
warnings vs.  ones which emit errors.  The details of how an
architecture decided to implement the checks isn't as important as the
concept of compile time checking of copy_from_user() calls.

While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: NHelge Deller <deller@gmx.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

446f24d1