Commits · 731a7378b81c2f5fa88ca1ae20b83d548d5613dc · openeuler / raspberrypi-kernel

30 May, 2012 3 commits

mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition · 26c19178

Committed by Andrea Arcangeli on May 29, 2012

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 #2 [f06a9e40] no_context at c0433ded
 #3 [f06a9e64] bad_area_nosemaphore at c043401a
 #4 [f06a9e6c] __do_page_fault at c0434493
 #5 [f06a9eec] do_page_fault at c083eb45
 #6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 #7 [f06a9f38] _spin_lock at c083bc14
 #8 [f06a9f44] sys_mincore at c0507b7d
 #9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transitions from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problems arises from the fact that the pmd could then
transition freely between any of the none, pmd_trans_huge or
pmd_trans_stable states. So making the barrier in
pmd_none_or_trans_huge_or_clear_bad() unconditional isn't a good idea and
it would be a flaky solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled and more
than 4G of ram. With less ram the high part of the pmd is never at risk
of being truncated because it is zero at all times, which in turn hides
the SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----
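
The interleaving described in the quote can be replayed deterministically in user space. The sketch below is only an illustration under stated assumptions: `struct pmd_halves`, `populate_entry()` and both read helpers are hypothetical names, a callback stands in for the other CPU, and the entry is assumed to move only from none to populated (the THP-disabled case). It is not the kernel's pmd_read_atomic.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PAE pmd entry stored as two 32-bit halves. */
struct pmd_halves {
    uint32_t low;
    uint32_t high;
};

/* Optional hook that runs between the two 32-bit loads to simulate another CPU. */
typedef void (*between_loads_fn)(struct pmd_halves *entry);

/* Simulates another CPU instantiating the entry with the value from the quote. */
static void populate_entry(struct pmd_halves *entry)
{
    entry->high = 0x00000003u;
    entry->low  = 0xfda38067u;
}

/* Racy order seen in the disassembly: high half first, then low half. */
static uint64_t read_high_then_low(struct pmd_halves *entry, between_loads_fn hook)
{
    uint32_t high = entry->high;
    if (hook)
        hook(entry);
    uint32_t low = entry->low;
    return ((uint64_t)high << 32) | low;
}

/*
 * Ordered read: low half first. A zero low half means "none", so the high
 * half is never consulted; otherwise the high half is read afterwards.
 * Real kernel code would additionally need a read barrier between the loads.
 */
static uint64_t read_low_then_high(struct pmd_halves *entry, between_loads_fn hook)
{
    uint32_t low = entry->low;
    if (hook)
        hook(entry);
    if (low == 0)
        return 0;
    return ((uint64_t)entry->high << 32) | low;
}
```

With the hook firing between the loads, the high-then-low order yields 0x00000000fda38067, a value that was never stored, while the low-first order reports the entry as still none.
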
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: print physical addresses consistently with other parts of kernel · 365811d6

Committed by Bjorn Helgaas on May 29, 2012

Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.  For example:

    -found SMP MP-table at [ffff8800000fce90] fce90
    +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90]
    -initial memory mapped : 0 - 20000000
    +initial memory mapped: [mem 0x00000000-0x1fffffff]
    -Base memory trampoline at [ffff88000009c000] 9c000 size 8192
    +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000]
    -SRAT: Node 0 PXM 0 0-80000000
    +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
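
The new style prints an inclusive end address, so a start/size pair has to be converted before printing. A minimal user-space sketch of that conversion follows; `format_mem_range()` is a hypothetical helper written for this illustration, and the 8-digit width is taken from the example lines above rather than from the kernel's %pR implementation.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Formats a physical range as "[mem 0xSTART-0xEND]" with an inclusive end.
 * Hypothetical helper for illustration; size is assumed to be nonzero.
 */
static int format_mem_range(char *buf, size_t len,
                            unsigned long long start, unsigned long long size)
{
    unsigned long long end = start + size - 1; /* address of the last byte */
    return snprintf(buf, len, "[mem 0x%08llx-0x%08llx]", start, end);
}
```

For the trampoline line above, a start of 0x9c000 and a size of 8192 give `[mem 0x0009c000-0x0009dfff]`.
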
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: print e820 physical addresses consistently with other parts of kernel · 91eb0f67

Committed by Bjorn Helgaas on May 29, 2012

Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.  For example:

    -BIOS-provided physical RAM map:
    +e820: BIOS-provided physical RAM map:
    - BIOS-e820: 0000000000000100 - 000000000009e000 (usable)
    +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable
    -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
    +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
    -reserve RAM buffer: 000000000009e000 - 000000000009ffff
    +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27 May, 2012 3 commits

x86: use the new generic strnlen_user() function · 5723aa99

Committed by Linus Torvalds on May 26, 2012

This throws away the old x86-specific functions in favor of the generic
optimized version.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

word-at-a-time: make the interfaces truly generic · 36126f8f

Committed by Linus Torvalds on May 26, 2012

This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
complicated, but a lot more generic.

In particular, it allows us to really do the operations efficiently on
both little-endian and big-endian machines, pretty much regardless of
machine details.  For example, if you can rely on a fast population
count instruction on your architecture, this will allow you to make your
optimized <asm/word-at-a-time.h> file with that.

NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
not truly generic, it actually only works on big-endian.  Why? Because
on little-endian the generic algorithms are wasteful, since you can
inevitably do better. The x86 implementation is an example of that.

(The only truly non-generic part of the asm-generic implementation is
the "find_zero()" function, and you could make a little-endian version
of it.  And if the Kbuild infrastructure allowed us to pick a particular
header file, that would be lovely)

The <asm/word-at-a-time.h> functions are as follows:

 - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
   uses.

 - has_zero(): take a word, and determine if it has a zero byte in it.
   It gets the word, the pointer to the constant pool, and a pointer to
   an intermediate "data" field it can set.

   This is the "quick-and-dirty" zero tester: it's what is run inside
   the hot loops.

 - "prep_zero_mask()": take the word, the data that has_zero() produced,
   and the constant pool, and generate an *exact* mask of which byte had
   the first zero.  This is run directly *outside* the loop, and allows
   the "has_zero()" function to answer the "is there a zero byte"
   question without necessarily getting exactly *which* byte is the
   first one to contain a zero.

   If you do multiple byte lookups concurrently (eg "hash_name()", which
   looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
   phase, the result of those can be or'ed together to get the "either
   or" case.

 - The result from "prep_zero_mask()" can then be fed into "find_zero()"
   (to find the byte offset of the first byte that was zero) or into
   "zero_bytemask()" (to find the bytemask of the bytes preceding the
   zero byte).

   The existence of zero_bytemask() is optional, and is not necessary
   for the normal string routines.  But dentry name hashing needs it, so
   if you enable DENTRY_WORD_AT_A_TIME you need to expose it.
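
The split between a quick "is there a zero byte" test and a later "which byte" step can be sketched in portable C. This is a simplified illustration, not the `<asm/word-at-a-time.h>` interface: `REPEAT_BYTE64`, `load_le64()`, `has_zero_sketch()` and `first_zero_byte()` are names invented here, the word is assembled in little-endian order by hand so the result does not depend on the host, and the exact-mask and find steps are folded into one simple loop.

```c
#include <stdint.h>

/* Repeat one byte value across a 64-bit word, e.g. 0x01 -> 0x0101010101010101. */
#define REPEAT_BYTE64(x) ((~UINT64_C(0) / 0xff) * (uint64_t)(x))

/* Assemble 8 bytes into a word with byte 0 in the least significant position. */
static uint64_t load_le64(const unsigned char bytes[8])
{
    uint64_t word = 0;
    for (int i = 7; i >= 0; i--)
        word = (word << 8) | bytes[i];
    return word;
}

/*
 * Quick test: nonzero if some byte of the word is zero. The lowest set bit
 * marks the first zero byte; higher bits may be set spuriously by borrows.
 */
static uint64_t has_zero_sketch(uint64_t word)
{
    return (word - REPEAT_BYTE64(0x01)) & ~word & REPEAT_BYTE64(0x80);
}

/* Index (0..7) of the first zero byte, or 8 when the word has none. */
static int first_zero_byte(const unsigned char bytes[8])
{
    uint64_t mask = has_zero_sketch(load_le64(bytes));
    int index = 0;

    if (!mask)
        return 8;
    while (!(mask & 0x80)) {   /* walk up to the lowest flagged byte */
        mask >>= 8;
        index++;
    }
    return index;
}
```

Because borrows only travel toward more significant bytes, the flag for the first zero byte is always exact in this layout, which is the little-endian simplification the commit messages on this page refer to.
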

This changes the generic strncpy_from_user() function and the dentry
hashing functions to use these modified word-at-a-time interfaces.  This
gets us back to the optimized state of the x86 strncpy that we lost in
the previous commit when moving over to the generic version.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: use generic strncpy_from_user routine · 4ae73f2d

Committed by Linus Torvalds on May 26, 2012

The generic strncpy_from_user() is not really optimal, since it is
designed to work on both little-endian and big-endian.  And on
little-endian you can simplify much of the logic to find the first zero
byte, since little-endian arithmetic doesn't have to worry about the
carry bit propagating into earlier bytes (only later bytes, which we
don't care about).

But I have patches to make the generic routines use the architecture-
specific <asm/word-at-a-time.h> infrastructure, so that we can regain
the little-endian optimizations.  But before we do that, switch over to
the generic routines to make the patches each do just one well-defined
thing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

25 May, 2012 1 commit

kernel: Move REPEAT_BYTE definition into linux/kernel.h · 44696908

Committed by David S. Miller on May 23, 2012

And make sure that everything using it explicitly includes
that header file.
Signed-off-by: David S. Miller <davem@davemloft.net>

24 May, 2012 5 commits

x86, relocs: Add jiffies and jiffies_64 to the relative whitelist · ea17e741

Committed by H. Peter Anvin on May 24, 2012

The symbol jiffies is created in the linker script as an alias to
jiffies_64. Unfortunately this is done outside any section, and
apparently GNU ld 2.21 doesn't carry the section with it, so we end up
with an absolute symbol and therefore a broken kernel.

Add jiffies and jiffies_64 to the whitelist.

The most disturbing bit with this discovery is that it shows that we
have had multiple linker bugs in this area crossing multiple
generations, and have been silently building bad kernels for some time.

Link: http://lkml.kernel.org/r/20120524171604.0d98284f3affc643e9714470@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org> v3.4

x86/mce: Add instruction recovery signatures to mce-severity table · 37c3459b

Committed by Tony Luck on May 10, 2012

Instruction recovery cases are very similar to the data recovery one
we already have. Just trade out for a new MCACOD value.
Signed-off-by: Tony Luck <tony.luck@intel.com>

x86/mce: Fix check for processor context when machine check was taken. · 875e2664

Committed by Tony Luck on May 23, 2012

Linus pointed out that there was no value in checking whether m->ip
was zero - because zero is a legitimate value.  If we have a reliable
(or faked in the VM86 case) "m->cs" we can use it to tell whether we
were in user mode or kernel when the machine check hit.
Reported-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>

MCE: Fix vm86 handling for 32bit mce handler · a129a7c8

Committed by Andi Kleen on Nov 19, 2010

When running on 32bit the mce handler could misinterpret
vm86 mode as ring 0. This can affect whether it does recovery
or not; it was possible to panic when recovery was actually
possible.

Fix this by always forcing vm86 to look like ring 3.
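
The idea of deriving the context from the code segment selector, with vm86 forced to look like ring 3, might be sketched as follows. The struct and function names are hypothetical and the register snapshot is reduced to two fields; only the EFLAGS.VM bit position and the meaning of the low two selector bits are taken as given.

```c
/* EFLAGS.VM (bit 17) is set while the CPU runs in virtual-8086 mode. */
#define EFLAGS_VM 0x00020000u

/* Hypothetical, reduced register snapshot taken when the machine check hit. */
struct mce_regs_sketch {
    unsigned int cs;
    unsigned int eflags;
};

/* Returns the code segment value to record, forcing vm86 to look like ring 3. */
static unsigned int recorded_cs(const struct mce_regs_sketch *regs)
{
    unsigned int cs = regs->cs;

    if (regs->eflags & EFLAGS_VM)
        cs |= 3;
    return cs;
}

/* The low two bits of the selector carry the privilege level; 3 is user mode. */
static int hit_in_user_mode(unsigned int cs)
{
    return (cs & 3) == 3;
}
```

In vm86 mode the selector is a real-mode style segment value, so its low bits say nothing about privilege; forcing them to 3 lets a single check on the recorded value cover both cases.
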
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>

x86-32, relocs: Whitelist more symbols for ld bug workaround · fd952815

Committed by H. Peter Anvin on May 23, 2012

As noted in checkin:

a3e854d9 x86, relocs: Workaround for binutils 2.22.52.0.1 section bug

ld version 2.22.52.0.[12] can incorrectly promote relative symbols to
absolute, if the output section they appear in is otherwise empty.

Since checkin:

6520fe55 x86, realmode: 16-bit real-mode code support for relocs tool

we actually check for this and error out rather than silently creating
a kernel which will malfunction if relocated.

Ingo found a configuration in which __start_builtin_fw triggered the
warning.

Go through the linker script sources and look for more symbols that
could plausibly get bogusly promoted to absolute, and add them to the
whitelist.

In general, if the following error triggers:

	Invalid absolute R_386_32 relocation: <symbol>

... then we should verify that <symbol> is really meant to be
relocated, and add it and any related symbols manually to the S_REL
regexp.
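
Matching symbol names against a whitelist regexp can be sketched with POSIX `<regex.h>`. The pattern below is an assumption made for illustration: it lists only symbols named in the commit messages on this page and is not the S_REL expression from the relocs tool, and `is_whitelisted_relative()` is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Illustrative whitelist only: it lists symbols named in these commit
 * messages and is not the S_REL expression from the relocs tool.
 */
static const char rel_whitelist[] =
    "^(__init_(begin|end)|__start_builtin_fw|jiffies|jiffies_64)$";

/* Returns 1 if the symbol should be treated as relative, 0 if not, -1 on error. */
static int is_whitelisted_relative(const char *symbol)
{
    regex_t re;
    int matched;

    if (regcomp(&re, rel_whitelist, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = regexec(&re, symbol, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

Anchoring the pattern with `^` and `$` keeps a symbol from being whitelisted merely because its name contains a whitelisted name as a substring.
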

Please note that 6520fe55 does not introduce the error, only the check
for the error -- without 6520fe55 this version of ld will simply
produce a corrupt kernel if CONFIG_RELOCATABLE is set on x86-32.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org> v3.4

22 May, 2012 11 commits

raid5: add AVX optimized RAID5 checksumming · ea4d26ae

Committed by Jim Kukunas on May 22, 2012

Optimize RAID5 xor checksumming by taking advantage of
256-bit YMM registers introduced in AVX.
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

new helper: sigsuspend() · 68f3f16d

Committed by Al Viro on May 21, 2012

guts of saved_sigmask-based sigsuspend/rt_sigsuspend.  Takes
kernel sigset_t *.

Open-coded instances replaced with calling it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

timers: Fixup the Kconfig consolidation fallout · 764e0da1

Committed by Thomas Gleixner on May 21, 2012

Sigh, I missed to check which architecture Kconfig files actually
include the core Kconfig file. There are a few which did not. So we
broke them.

Instead of adding the includes to those, we are better off to move the
include to init/Kconfig like we did already with irqs and others.

This does not change anything for the architectures using the old
style periodic timer mode. It just solves the build wreckage there.

For those architectures which use the clock events infrastructure it
moves the include of the core Kconfig file to "General setup" which is
a way more logical place than having it at random locations specified
by the architecture specific Kconfigs.
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@glx-um.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

um: missing checks of __put_user()/__get_user() return values · ffc51be8

Committed by Al Viro on Apr 22, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

um: stub_rt_sigsuspend isn't needed these days anymore · 0088b6ec

Committed by Al Viro on Apr 22, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

um/x86: merge (and trim) 32- and 64-bit variants of ptrace.h · 243412be

Committed by Al Viro on May 20, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

net: drop NET dependency from HAVE_BPF_JIT · e47b65b0

Committed by Sam Ravnborg on May 21, 2012

There is no point having the NET dependency on the select target, as it
forces all users to depend on NET to tell they support BPF_JIT.  Move
the config option to the bottom of the file - this could be a nice place
also for future "selectable" config symbols.

Fix up all users to drop the dependency on NET now that it is not
required to suppress warnings for non-NET builds.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86, relocs: Build clean fix · b2d668da

Committed by Jarkko Sakkinen on May 21, 2012

relocs was not cleaned up when "make clean" is issued. This
patch fixes the issue.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Link: http://lkml.kernel.org/r/1337622684-6834-1-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org> v3.4

um: ->restart_block.fn needs to be reset on sigreturn · 3b7d15bd

Committed by Al Viro on Apr 22, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xen: do not map the same GSI twice in PVHVM guests. · 68c2c39a

Committed by Stefano Stabellini on May 21, 2012

PV on HVM guests map GSIs into event channels. At restore time the
event channels are resumed by restore_pirqs.

Device drivers might try to register the same GSI again through ACPI at
restore time, but the GSI has already been mapped and bound by
restore_pirqs. This patch detects these situations and avoids
mapping the same GSI multiple times.

Without this patch we get:
(XEN) irq.c:2235: dom4: pirq 23 or emuirq 28 already mapped
and waste a pirq.

CC: stable@kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86, printk: Add missing KERN_CONT to NMI selftest · 29d679ff

Committed by Sasha Levin on May 08, 2012

Fix this behaviour:

----------------
| NMI testsuite:
--------------------
  remote IPI:
  ok  |

   local IPI:
  ok  |

Revealed due to a new modification to printk().
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Link: http://lkml.kernel.org/r/1336492573-17530-3-git-send-email-levinsasha928@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

21 May, 2012 5 commits

xen/smp: unbind irqworkX when unplugging vCPUs. · 2f1bd67d

Committed by Konrad Rzeszutek Wilk on May 21, 2012

The git commit 1ff2b0c3
"xen: implement IRQ_WORK_VECTOR handler" added the functionality
to have a per-cpu "irqworkX" for the IPI APIC functionality.
However it missed the unbind when a vCPU is unplugged resulting
in an orphaned per-cpu interrupt line for unplugged vCPU:

  30:        216          0   xen-dyn-event     hvc_console
  31:        810          4   xen-dyn-event     eth0
  32:         29          0   xen-dyn-event     blkif
- 36:          0          0  xen-percpu-ipi       irqwork2
- 37:        287          0   xen-dyn-event     xenbus
+ 36:        287          0   xen-dyn-event     xenbus
 NMI:          0          0   Non-maskable interrupts
 LOC:          0          0   Local timer interrupts
 SPU:          0          0   Spurious interrupts
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

X86: integrate CMA with DMA-mapping subsystem · 0a2b9a6e

Committed by Marek Szyprowski on Dec 29, 2011

This patch adds support for CMA to the dma-mapping subsystem for the x86
architecture that uses the common pci-dma/pci-nommu implementation. This
allows testing CMA on KVM/QEMU and a lot of common x86 boxes.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>

x86: Use generic time config · bdebaf80

Committed by Thomas Gleixner on May 18, 2012

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@glx-um.de>
Link: http://lkml.kernel.org/r/20120518163104.630579708@glx-um.de
Cc: x86@kernel.org

x86/pci-calgary_64.c: Remove obsoleted simple_strtoul() usage · 74bc4917

Committed by Shuah Khan on May 20, 2012

Change calgary_parse_options() to call kstrtoul() instead of
calling obsoleted simple_strtoul().
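
The practical difference between a lenient and a strict integer parser can be shown with a user-space analogy. The sketch below is built on the standard `strtoul()` and is not the kernel's kstrtoul(); `parse_ulong_strict()` is a hypothetical helper, and its rules (reject empty input, a leading sign or whitespace, and trailing garbage, but allow one trailing newline) are assumptions chosen to illustrate strict parsing.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace analogy of a strict parser: returns 0 and stores the value only
 * if the whole string is a number, optionally followed by one newline.
 */
static int parse_ulong_strict(const char *s, unsigned int base, unsigned long *out)
{
    char *end;
    unsigned long value;

    /* strtoul() would silently accept leading whitespace and a minus sign. */
    if (*s == '\0' || *s == '-' || isspace((unsigned char)*s))
        return -EINVAL;

    errno = 0;
    value = strtoul(s, &end, (int)base);
    if (errno == ERANGE)
        return -ERANGE;
    if (end == s)
        return -EINVAL;          /* no digits consumed */
    if (*end == '\n')
        end++;                   /* tolerate a single trailing newline */
    if (*end != '\0')
        return -EINVAL;          /* trailing garbage */

    *out = value;
    return 0;
}
```

A lenient parser would accept an option string such as "42abc" as 42, whereas this version reports an error and leaves the output untouched.
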
Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Acked-by: Muli Ben-Yehuda <muli@cs.technion.ac.il>
Cc: jdmason@kudzu.us
Link: http://lkml.kernel.org/r/1337556268.3126.5.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86, realmode: Move end signature into header.S · 61f54461

Committed by H. Peter Anvin on May 21, 2012

The end signature was defined in wakeup_asm.S as it originally came
from the ACPI wakeup code.  However, we rely on the existence of the
.signature section to expand .bss, otherwise we would have to include
code to explicitly zero the .bss depending on the configuration.
Since the expanded .bss is just in .init.data anyway, it's easier to
always have it expanded.

This fixes failures when compiled without CONFIG_ACPI_SLEEP.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

19 May, 2012 4 commits

x86, relocs: When printing an error, say relative or absolute · 24ab82bd

Committed by H. Peter Anvin on May 18, 2012

When the relocs tool throws an error, let the error message say if it
is an absolute or relative symbol.  This should make it a lot more
clear what action the programmer needs to take and should help us find
the reason if additional symbol bugs show up.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>

x86, relocs: Workaround for binutils 2.22.52.0.1 section bug · a3e854d9

Committed by H. Peter Anvin on May 18, 2012

GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
section-relative to absolute if they are in a section of zero length.
This turns the symbols __init_begin and __init_end into absolute
symbols.  Let the relocs program know that those should be treated as
relative symbols.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

x86, realmode: 16-bit real-mode code support for relocs tool · 6520fe55

Committed by H. Peter Anvin on May 08, 2012

A new option is added to the relocs tool called '--realmode'.
This option causes the generation of 16-bit segment relocations
and 32-bit linear relocations for the real-mode code. When
the real-mode code is moved to the low-memory during kernel
initialization, these relocation entries can be used to
relocate the code properly.

In the assembly code 16-bit segment relocations must be relative
to the 'real_mode_seg' absolute symbol. Linear relocations must be
relative to a symbol prefixed with 'pa_'.

16-bit segment relocation is used to load cs:ip in 16-bit code.
Linear relocations are used in the 32-bit code for relocatable
data references. They are declared in the linker script of the
real-mode code.

The relocs tool is moved to arch/x86/tools/relocs.c, and a new target,
archscripts, is added that can be used to build the scripts needed for
building an architecture, i.e. scripts that must be compiled before
building the arch/x86 tree.

[ hpa: accelerating this because it detects invalid absolute
  relocations, a serious bug in binutils 2.22.52.0.x which currently
  produces bad kernels. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

x86, relocs: When printing an error, say relative or absolute · 8a3b947c

Committed by H. Peter Anvin on May 18, 2012

When the relocs tool throws an error, let the error message say if it
is an absolute or relative symbol.  This should make it a lot more
clear what action the programmer needs to take.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

18 May, 2012 8 commits

x86, relocs: More relocations which may end up as absolute · c54a354c

Committed by H. Peter Anvin on May 18, 2012

GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
section-relative to absolute if they are in a section of zero length.
This turns the symbols __init_begin and __init_end into absolute
symbols.  Let the relocs program know that those should be treated as
relative symbols.

This bug is exposed by checkin

433de739 x86, realmode: 16-bit real-mode code support for relocs tool

only in the sense that that checkin changes the relocs tool to report
an error instead of silently generating a kernel which is broken if
relocated.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

perf/x86: Update event scheduling constraints for AMD family 15h models · 5bcdf5e4

Committed by Robert Richter on May 18, 2012

This update is for newer family 15h cpu models from 0x02 to 0x1f.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org # v2.6.39+
Link: http://lkml.kernel.org/r/1337337642-1621-1-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86-64: Fix accounting in kernel_physical_mapping_init() · 20167d34

Committed by Jan Beulich on May 16, 2012

When finding a present and acceptable 2M/1G mapping, the number
of pages mapped this way shouldn't be incremented (as it was
already incremented when the earlier part of the mapping was
established). Instead, last_map_addr needs to be updated in this
case.

Further, address increments were wrong in one place each in both
phys_pmd_init() and phys_pud_init() (lacking the aligning down
to the respective page boundary).

As we're now doing the same calculation several times, fold it
into a single instance using a local variable (matching how
kernel_physical_mapping_init() itself does it at the PGD level).
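
The "align down, then step" calculation mentioned above can be written as a small helper. This is an illustrative sketch rather than the code from phys_pmd_init() or phys_pud_init(): `next_boundary()` is a hypothetical name, and the mapping size is assumed to be a power of two.

```c
#include <stdint.h>

/*
 * Address of the next boundary after addr for a power-of-two mapping size.
 * Hypothetical helper name, for illustration.
 */
static uint64_t next_boundary(uint64_t addr, uint64_t size)
{
    uint64_t mask = ~(size - 1);     /* clears the offset within the mapping */
    return (addr & mask) + size;     /* align down first, then step forward */
}
```

Without the align-down step, an unaligned address such as 0x201000 with a 2M size would advance to 0x401000 and skip the start of the next 2M region; with it, the result is 0x400000.
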

Observed during code inspection, not because of an actual
problem.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FB3C27202000078000841A0@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/tlb: Clean up and unify TLB_FLUSH_ALL definition · 3e7f3db0

Committed by Alex Shi on May 10, 2012

sizeof(long) is 4 in x86_32 mode and 8 in x86_64 mode, and
sizeof(long long) is also 8 bytes in x86_64 mode. Using long
therefore fits the TLB_FLUSH_ALL definition in both 32-bit and
64-bit modes.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/n/tip-evv5bekiipi2pmyzdsy8lkkw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/apic: Implement EIO micro-optimization · 0ab711ae

Committed by Michael S. Tsirkin on May 16, 2012

We know both register and value for eoi beforehand,
so there's no need to check it and no need to do math
to calculate the msr. Saves instructions/branches
on each EOI when using x2apic.
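
The "math to calculate the msr" refers to mapping an APIC register offset to its x2APIC MSR index. A hedged sketch of why that can be folded into a constant for EOI is shown below; the helper and macro names `x2apic_msr_for` and `X2APIC_EOI_MSR` are invented for this illustration, and the numeric values are the ones the x2APIC register layout is understood to use here.

```c
#include <stdint.h>

/* Values as used by the x2APIC register layout. */
#define APIC_BASE_MSR 0x800u   /* first x2APIC MSR */
#define APIC_EOI      0xB0u    /* EOI register offset in the xAPIC MMIO page */

/* General mapping: each 16-byte MMIO register slot becomes one MSR index. */
static uint32_t x2apic_msr_for(uint32_t reg)
{
    return APIC_BASE_MSR + (reg >> 4);
}

/* For EOI the result is a constant, so nothing needs computing at run time. */
#define X2APIC_EOI_MSR (APIC_BASE_MSR + (APIC_EOI >> 4))
```

When the register is only known at run time, the shift and add happen on every write; for a dedicated EOI path the compiler can emit the constant MSR number directly.
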

I looked at the objdump output to verify that the
generated code looks right and actually is shorter.

The real improvements will be on the KVM guest side
though, those come in a later patch.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/e019d1a125316f10d3e3a4b2f6bda41473f4fb72.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/apic: Add apic->eoi_write() callback · 2a43195d

Committed by Michael S. Tsirkin on May 16, 2012

Add eoi_write callback so that kvm can override
eoi accesses without touching the rest of the apic.
As a side-effect, this will enable a micro-optimization
for apics using msr.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/0df425d746c49ac2ecc405174df87752869629d2.1337184153.git.mst@redhat.com
[ tidied it up a bit ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/apic: Use symbolic APIC_EOI_ACK · 4ebcc243

Committed by Michael S. Tsirkin on May 16, 2012

Use the symbol instead of hard-coded numbers.
Now that the reason for the value is documented
where the constant is defined, we don't need to
duplicate this explanation in code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/ecbe4c79d69c172378e47e5a587ff5cd10293c9f.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it · c8f64bf7

Committed by Michael S. Tsirkin on May 16, 2012

Fix typo in the macro name and document the
reason it has this value. Update users.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/37867b31b9330690af2e60a2a7c4cb4b1b070caf.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>