提交 · 30d6e0a4190d37740e9447e4e4815f06992dd8c3 · openeuler / Kernel

26 8月, 2017 1 次提交

futex: Remove duplicated code and fix undefined behaviour · 30d6e0a4

由 Jiri Slaby 提交于 8月 24, 2017

There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.

Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.

This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.

Note that s390 removed access_ok check in d12a2970 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NRussell King <rmk+kernel@armlinux.org.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Reviewed-by: NDarren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker <dalias@libc.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz

30d6e0a4

24 8月, 2017 1 次提交

ARM: dts: exynos: add needs-hpd for Odroid-XU3/4 · 93a4c835

由 Hans Verkuil 提交于 8月 23, 2017

CEC support was added for Exynos5 in 4.13, but for the Odroids we need to set
'needs-hpd' as well since CEC is disabled when there is no HDMI hotplug signal,
just as for the exynos4 Odroid-U3.

This is due to the level-shifter that is disabled when there is no HPD, thus
blocking the CEC signal as well. Same close-but-no-cigar board design as the
Odroid-U3.

Tested with my Odroid XU4.
Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

93a4c835

23 8月, 2017 5 次提交

ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms · dbeb0c8e

由 Arnd Bergmann 提交于 8月 23, 2017

My previous patch fixed a link error for all at91 platforms when
CONFIG_ARM_CPU_SUSPEND was not set, however this caused another
problem on a configuration that enabled CONFIG_ARCH_AT91 but none
of the individual SoCs, and that also enabled CPU_ARM720 as
the only CPU:

warning: (ARCH_AT91 && SOC_IMX23 && SOC_IMX28 && ARCH_PXA && MACH_MVEBU_V7 && SOC_IMX6 && ARCH_OMAP3 && ARCH_OMAP4 && SOC_OMAP5 && SOC_AM33XX && SOC_DRA7XX && ARCH_EXYNOS3 && ARCH_EXYNOS4 && EXYNOS5420_MCPM && EXYNOS_CPU_SUSPEND && ARCH_VEXPRESS_TC2_PM && ARM_BIG_LITTLE_CPUIDLE && ARM_HIGHBANK_CPUIDLE && QCOM_PM) selects ARM_CPU_SUSPEND which has unmet direct dependencies (ARCH_SUSPEND_POSSIBLE)
arch/arm/kernel/sleep.o: In function `cpu_resume':
(.text+0xf0): undefined reference to `cpu_arm720_suspend_size'
arch/arm/kernel/suspend.o: In function `__cpu_suspend_save':
suspend.c:(.text+0x134): undefined reference to `cpu_arm720_do_suspend'

This improves the hack some more by only selecting ARM_CPU_SUSPEND
for the part that requires it, and changing pm.c to drop the
contents of unused init functions so we no longer refer to
cpu_resume on at91 platforms that don't need it.

Fixes: cc7a938f ("ARM: at91: select CONFIG_ARM_CPU_SUSPEND")
Acked-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

dbeb0c8e

arm64: kaslr: Adjust the offset to avoid Image across alignment boundary · a067d94d

由 Catalin Marinas 提交于 8月 22, 2017

With 16KB pages and a kernel Image larger than 16MB, the current
kaslr_early_init() logic for avoiding mappings across swapper table
boundaries fails since increasing the offset by kimg_sz just moves the
problem to the next boundary.

This patch rounds the offset down to (1 << SWAPPER_TABLE_SHIFT) if the
Image crosses a PMD_SIZE boundary.

Fixes: afd0e5a8 ("arm64: kaslr: Fix up the kernel image alignment")
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

a067d94d

arm64: kaslr: ignore modulo offset when validating virtual displacement · 4a23e56a

由 Ard Biesheuvel 提交于 8月 18, 2017

In the KASLR setup routine, we ensure that the early virtual mapping
of the kernel image does not cover more than a single table entry at
the level above the swapper block level, so that the assembler routines
involved in setting up this mapping can remain simple.

In this calculation we add the proposed KASLR offset to the values of
the _text and _end markers, and reject it if they would end up falling
in different swapper table sized windows.

However, when taking the addresses of _text and _end, the modulo offset
(the physical displacement modulo 2 MB) is already accounted for, and
so adding it again results in incorrect results. So disregard the modulo
offset from the calculation.

Fixes: 08cdac61 ("arm64: relocatable: deal with physically misaligned ...")
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Tested-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

4a23e56a

arm64: mm: abort uaccess retries upon fatal signal · 289d07a2

由 Mark Rutland 提交于 7月 11, 2017

When there's a fatal signal pending, arm64's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.

However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.

To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: NSteve Capper <steve.capper@arm.com>
Tested-by: NSteve Capper <steve.capper@arm.com>
Reviewed-by: NJames Morse <james.morse@arm.com>
Tested-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

289d07a2

arm64: fpsimd: Prevent registers leaking across exec · 09662210

由 Dave Martin 提交于 8月 18, 2017

There are some tricky dependencies between the different stages of
flushing the FPSIMD register state during exec, and these can race
with context switch in ways that can cause the old task's regs to
leak across.  In particular, a context switch during the memset() can
cause some of the task's old FPSIMD registers to reappear.

Disabling preemption for this small window would be no big deal for
performance: preemption is already disabled for similar scenarios
like updating the FPSIMD registers in sigreturn.

So, instead of rearranging things in ways that might swap existing
subtle bugs for new ones, this patch just disables preemption
around the FPSIMD state flushing so that races of this type can't
occur here.  This brings fpsimd_flush_thread() into line with other
code paths.

Cc: stable@vger.kernel.org
Fixes: 674c242c ("arm64: flush FP/SIMD state correctly after execve()")
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

09662210

22 8月, 2017 1 次提交

sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus() · 2dc77533

由 Thomas Petazzoni 提交于 8月 13, 2017

When building the kernel for Sparc using gcc 7.x, the build fails
with:

arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    cmd |= PCI_COMMAND_IO;
        ^~

The simplified code looks like this:

unsigned int cmd;
[...]
pcic_read_config(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
[...]
cmd |= PCI_COMMAND_IO;

I.e, the code assumes that pcic_read_config() will always initialize
cmd. But it's not the case. Looking at pcic_read_config(), if
bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
not be initialized.

As a simple fix, we initialize cmd to zero at the beginning of
pcibios_fixup_bus.
Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2dc77533

19 8月, 2017 2 次提交

mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes · c715b72c

由 Kees Cook 提交于 8月 18, 2017

Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
broke AddressSanitizer.  This is a partial revert of:

  eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
  02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")

The AddressSanitizer tool has hard-coded expectations about where
executable mappings are loaded.

The motivation for changing the PIE base in the above commits was to
avoid the Stack-Clash CVEs that allowed executable mappings to get too
close to heap and stack.  This was mainly a problem on 32-bit, but the
64-bit bases were moved too, in an effort to proactively protect those
systems (proofs of concept do exist that show 64-bit collisions, but
other recent changes to fix stack accounting and setuid behaviors will
minimize the impact).

The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC
base), so only the 64-bit PIE base needs to be reverted to let x86 and
arm64 ASan binaries run again.  Future changes to the 64-bit PIE base on
these architectures can be made optional once a more dynamic method for
dealing with AddressSanitizer is found.  (e.g.  always loading PIE into
the mmap region for marked binaries.)

Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast
Fixes: eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Fixes: 02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
Signed-off-by: NKees Cook <keescook@chromium.org>
Reported-by: NKostya Serebryany <kcc@google.com>
Acked-by: NWill Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c715b72c

kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog · 92e5aae4

由 Nicholas Piggin 提交于 8月 18, 2017

Commit 05a4a952 ("kernel/watchdog: split up config options") lost
the perf-based hardlockup detector's dependency on PERF_EVENTS, which
can result in broken builds with some powerpc configurations.

Restore the dependency.  Add it in for x86 too, despite x86 always
selecting PERF_EVENTS it seems reasonable to make the dependency
explicit.

Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
Fixes: 05a4a952 ("kernel/watchdog: split up config options")
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Acked-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92e5aae4

18 8月, 2017 3 次提交

kernel/watchdog: Prevent false positives with turbo modes · 7edaeb68

由 Thomas Gleixner 提交于 8月 15, 2017

The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.

The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.

A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.

Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.

That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.

Fixes: 58687acb ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: NKan Liang <Kan.liang@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos

7edaeb68

x86: Constify attribute_group structures · 45bd07ad

由 Arvind Yadav 提交于 7月 20, 2017

attribute_groups are not supposed to change at runtime and none of the
groups is modified.

Mark the non-const structs as const.

[ tglx: Folded into one big patch ]
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: tony.luck@intel.com
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com

45bd07ad

ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset · c40bc54f

由 Gary Bisson 提交于 8月 17, 2017

Previous value was a bad copy of nitrogen6_max device tree.
Signed-off-by: NGary Bisson <gary.bisson@boundarydevices.com>
Fixes: 3faa1bb2 ("ARM: dts: imx: add Boundary Devices Nitrogen6_SOM2 support")
Cc: <stable@vger.kernel.org>
Signed-off-by: NShawn Guo <shawnguo@kernel.org>

c40bc54f

17 8月, 2017 6 次提交

locking/refcounts, x86/asm: Implement fast refcount overflow protection · 7a46ec0e

由 Kees Cook 提交于 8月 15, 2017

This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

  http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

  http://cyseclabs.com/page?n=02012016

This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)

refcount_inc() after:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
  ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
  ...
  .text.unlikely:
  ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
  ffffffff816c36d7:       0f ff                   (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

  2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
		    cycles		protections
  atomic_t           82249267387	none
  refcount_t-fast    82211446892	overflow, untested dec-to-zero
  refcount_t-full   144814735193	overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.
Signed-off-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beastSigned-off-by: NIngo Molnar <mingo@kernel.org>

7a46ec0e

x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt' · 187e91fe

由 Alexander Potapenko 提交于 8月 16, 2017

__startup_64() is normally using fixup_pointer() to access globals in a
position-independent fashion. However 'next_early_pgt' was accessed
directly, which wasn't guaranteed to work.

Luckily GCC was generating a R_X86_64_PC32 PC-relative relocation for
'next_early_pgt', but Clang emitted a R_X86_64_32S, which led to
accessing invalid memory and rebooting the kernel.
Signed-off-by: NAlexander Potapenko <glider@google.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Davidson <md@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c88d7150 ("x86/boot/64: Rewrite startup_64() in C")
Link: http://lkml.kernel.org/r/20170816190808.131748-1-glider@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

187e91fe

x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks · 01578e36

由 Oleg Nesterov 提交于 8月 15, 2017

The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and
randomize_stack_top() are not required.

PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not
set, no need to re-check after that.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDmitry Safonov <dsafonov@virtuozzo.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com

01578e36

x86: Fix norandmaps/ADDR_NO_RANDOMIZE · 47ac5484

由 Oleg Nesterov 提交于 8月 15, 2017

Documentation/admin-guide/kernel-parameters.txt says:

    norandmaps  Don't use address space randomization. Equivalent
                to echo 0 > /proc/sys/kernel/randomize_va_space

but it doesn't work because arch_rnd() which is used to randomize
mm->mmap_base returns a random value unconditionally. And as Kirill
pointed out, ADDR_NO_RANDOMIZE is broken by the same reason.

Just shift the PF_RANDOMIZE check from arch_mmap_rnd() to arch_rnd().

Fixes: 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: NDmitry Safonov <dsafonov@virtuozzo.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170815153952.GA1076@redhat.com

47ac5484

sparc64: remove unnecessary log message · 6170a506

由 Tushar Dave 提交于 8月 16, 2017

There is no need to log message if ATU hvapi couldn't get register.
Unlike PCI hvapi, ATU hvapi registration failure is not hard error.
Even if ATU hvapi registration fails (on system with ATU or without
ATU) system continues with legacy IOMMU. So only log message when
ATU hvapi successfully get registered.
Signed-off-by: NTushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6170a506

sparc64: Don't clibber fixed registers in __multi4. · 79db7958

由 David S. Miller 提交于 8月 15, 2017

%g4 and %g5 are fixed registers used by the kernel for the thread
pointer and the per-cpu offset.  Use %o4 and %g7 instead.

Diagnosis by Anthony Yznaga.

Fixes: 1b4af13f ("sparc64: Add __multi3 for gcc 7.x and later.")
Reported-by: NAnatoly Pugachev <matorola@gmail.com>
Tested-by: NAnatoly Pugachev <matorola@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79db7958

16 8月, 2017 1 次提交

powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC · 5a69aec9

由 Benjamin Herrenschmidt 提交于 8月 16, 2017

VSX uses a combination of the old vector registers, the old FP
registers and new "second halves" of the FP registers.

Thus when we need to see the VSX state in the thread struct
(flush_vsx_to_thread()) or when we'll use the VSX in the kernel
(enable_kernel_vsx()) we need to ensure they are all flushed into
the thread struct if either of them is individually enabled.

Unfortunately we only tested if the whole VSX was enabled, not if they
were individually enabled.

Fixes: 72cd7b44 ("powerpc: Uncomment and make enable_kernel_vsx() routine available")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5a69aec9

15 8月, 2017 1 次提交

x86/mtrr: Prevent CPU hotplug lock recursion · 84393817

由 Thomas Gleixner 提交于 8月 15, 2017

Larry reported a CPU hotplug lock recursion in the MTRR code.

============================================
WARNING: possible recursive locking detected

systemd-udevd/153 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30
 
 but task is already holding lock:
  (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470

....

 cpus_read_lock+0x48/0x90
 stop_machine+0x16/0x30
 mtrr_add_page+0x18b/0x470
 mtrr_add+0x3e/0x70

mtrr_add_page() holds the hotplug rwsem already and calls stop_machine()
which acquires it again.

Call stop_machine_cpuslocked() instead.
Reported-and-tested-by: NLarry Finger <Larry.Finger@lwfinger.net>
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>

84393817

14 8月, 2017 1 次提交

arm64: allwinner: h5: fix pinctrl IRQs · d86e63e1

由 Icenowy Zheng 提交于 8月 11, 2017

The pin controller of H5 has three IRQs at the chip's GIC, which
represents three banks of pinctrl IRQs. However, the device tree used to
miss the third IRQ of the pin controller, which makes the PG bank IRQ
not usable.

Add the missing IRQ to the pinctrl node.

Fixes: 4e36de17 ("arm64: allwinner: h5: add Allwinner H5 .dtsi")
Signed-off-by: NIcenowy Zheng <icenowy@aosc.io>
Signed-off-by: NChen-Yu Tsai <wens@csie.org>

d86e63e1

11 8月, 2017 11 次提交

clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is enabled · adb4f11e

由 Ding Tianhong 提交于 8月 10, 2017

On platforms with an arch timer erratum workaround, it's possible for
arch_timer_reg_read_stable() to recurse into itself when certain
tracing options are enabled, leading to stack overflows and related
problems.

For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are
selected, it's possible to trigger this with:

$ mount -t debugfs nodev /sys/kernel/debug/
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer

The problem is that in such cases, preempt_disable() instrumentation
attempts to acquire a timestamp via trace_clock(), resulting in a call
back to arch_timer_reg_read_stable(), and hence recursion.

This patch changes arch_timer_reg_read_stable() to use
preempt_{disable,enable}_notrace(), which avoids this.

This problem is similar to the fixed by upstream commit 96b3d28b
("sched/clock: Prevent tracing recursion in sched_clock_cpu()").

Fixes: 6acc71cc ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs")
Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Acked-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>

adb4f11e

xen: fix hvm guest with kaslr enabled · 4ca83dcf

由 Juergen Gross 提交于 7月 28, 2017

A Xen HVM guest running with KASLR enabled will die rather soon today
because the shared info page mapping is using va() too early. This was
introduced by commit a5d5f328 ("xen:
allocate page for shared info page from low memory").

In order to fix this use early_memremap() to get a temporary virtual
address for shared info until va() can be used safely.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NJuergen Gross <jgross@suse.com>

4ca83dcf

xen: split up xen_hvm_init_shared_info() · 10231f69

由 Juergen Gross 提交于 7月 28, 2017

Instead of calling xen_hvm_init_shared_info() on boot and resume split
it up into a boot time function searching for the pfn to use and a
mapping function doing the hypervisor mapping call.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NJuergen Gross <jgross@suse.com>

10231f69

x86: provide an init_mem_mapping hypervisor hook · c138d811

由 Juergen Gross 提交于 7月 28, 2017

Provide a hook in hypervisor_x86 called after setting up initial
memory mapping.

This is needed e.g. by Xen HVM guests to map the hypervisor shared
info page.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NJuergen Gross <jgross@suse.com>

c138d811

x86: Mark various structures and functions as 'static' · b45e4c45

由 Colin Ian King 提交于 8月 10, 2017

Mark a couple of structures and functions as 'static', pointed out by Sparse:

warning: symbol 'bts_pmu' was not declared. Should it be static?
warning: symbol 'p4_event_aliases' was not declared. Should it be static?
warning: symbol 'rapl_attr_groups' was not declared. Should it be static?
symbol 'process_uv2_message' was not declared. Should it be static?
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Acked-by: Andrew Banman <abanman@hpe.com> # for the UV change
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20170810155709.7094-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

b45e4c45

x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag · 5442c269

由 Borislav Petkov 提交于 8月 01, 2017

"virtual_vmload_vmsave" is what is going to land in /proc/cpuinfo now
as per v4.13-rc4, for a single feature bit which is clearly too long.

So rename it to what it is called in the processor manual.
"v_vmsave_vmload" is a bit shorter, after all.

We could go more aggressively here but having it the same as in the
processor manual is advantageous.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NRadim Krčmář <rkrcmar@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm-ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170801185552.GA3743@nazgul.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

5442c269

arc: Mask individual IRQ lines during core INTC init · a8ec3ee8

由 Alexey Brodkin 提交于 8月 10, 2017

ARC cores on reset have all interrupt lines of built-in INTC enabled.
Which means once we globally enable interrupts (very early on boot)
faulty hardware blocks may trigger an interrupt that Linux kernel
cannot handle yet as corresponding handler is not yet installed.

In that case system falls in "interrupt storm" and basically never
does anything useful except entering and exiting generic IRQ handling
code.

One real example of that kind of problematic hardware is DW GMAC which
also has interrupts enabled on reset and if Ethernet PHY informs GMAC
about link state, GMAC immediately reports that upstream to ARC core
and here we are.

Now with that change we mask all individual IRQ lines making entire
system more fool-proof.

[This patch was motivated by Adaptrum platform support]
Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
Cc: Eugeniy Paltsev <paltsev@synopsys.com>
Tested-by: NAlexandru Gagniuc <alex.g@adaptrum.com>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

a8ec3ee8

cpufreq: x86: Disable interrupts during MSRs reading · 8e2f3bce

由 Doug Smythies 提交于 8月 08, 2017

According to Intel 64 and IA-32 Architectures SDM, Volume 3,
Chapter 14.2, "Software needs to exercise care to avoid delays
between the two RDMSRs (for example interrupts)".

So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF.

See also: commit 4ab60c3f (cpufreq: intel_pstate: Disable
interrupts during MSRs reading).
Signed-off-by: NDoug Smythies <dsmythies@telus.net>
Reviewed-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8e2f3bce

mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem · 99baac21

由 Minchan Kim 提交于 8月 10, 2017

Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].

Quote from Mel Gorman:
 "The race in question is CPU 0 running madv_free and updating some PTEs
  while CPU 1 is also running madv_free and looking at the same PTEs.
  CPU 1 may have writable TLB entries for a page but fail the pte_dirty
  check (because CPU 0 has updated it already) and potentially fail to
  flush.

  Hence, when madv_free on CPU 1 returns, there are still potentially
  writable TLB entries and the underlying PTE is still present so that a
  subsequent write does not necessarily propagate the dirty bit to the
  underlying PTE any more. Reclaim at some unknown time at the future
  may then see that the PTE is still clean and discard the page even
  though a write has happened in the meantime. I think this is possible
  but I could have missed some protection in madv_free that prevents it
  happening."

This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].

TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
catch there are parallel threads going on.  In that case, forcefully,
flush TLB to prevent for user to access memory via stale TLB entry
although it fail to gather page table entry.

I confirmed this patch works with [4] test program Nadav gave so this
patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
v2" in current mmotm.

NOTE:

This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um).  It seems most of architecture are straightforward but
s390 need to be careful because tlb_flush_mmu works only if
mm->context.flush_mm is set to non-zero which happens only a pte entry
really is cleared by ptep_get_and_clear and friends.  However, this
problem never changes the pte entries but need to flush to prevent
memory access from stale tlb.

[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/

[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
  Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.comSigned-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NNadav Amit <namit@vmware.com>
Reported-by: NNadav Amit <namit@vmware.com>
Reported-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NMel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

99baac21

mm: refactor TLB gathering API · 56236a59

由 Minchan Kim 提交于 8月 10, 2017

This patch is a preparatory patch for solving race problems caused by
TLB batch.  For that, we will increase/decrease TLB flush pending count
of mm_struct whenever tlb_[gather|finish]_mmu is called.

Before making it simple, this patch separates architecture specific part
and rename it to arch_tlb_[gather|finish]_mmu and generic part just
calls it.

It shouldn't change any behavior.

Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.comSigned-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NNadav Amit <namit@vmware.com>
Acked-by: NMel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

56236a59

mm: add pmd_t initializer __pmd() to work around a GCC bug. · 9157259d

由 Zi Yan 提交于 8月 03, 2017

THP migration is added but only supports x86_64 at the moment. For all
other architectures, swp_entry_to_pmd() only returns a zero pmd_t.

Due to a GCC zero initializer bug #53119, the standard (pmd_t){0}
initializer is not accepted by all GCC versions. __pmd() is a feasible
workaround. In addition, sparc32's pmd_t is an array instead of a single
value, so we need (pmd_t){ {0}, } instead of (pmd_t){0}. Thus,
a different __pmd() definition is needed in sparc32.
Signed-off-by: NZi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9157259d

10 8月, 2017 7 次提交

x86/smpboot: Unbreak CPU0 hotplug · 10e66760

由 Vitaly Kuznetsov 提交于 8月 03, 2017

A hang on CPU0 onlining after a preceding offlining is observed. Trace
shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
CPU to run check_tsc_sync_source() but this never happens. Source CPU,
in its turn, is stuck on synchronize_sched() which is called from
native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().

So it's a classic ABBA deadlock, due to the use of synchronize_sched() in
unregister_nmi_handler().

Fix the bug by moving unregister_nmi_handler() from do_boot_cpu() to
native_cpu_up() after cpu onlining is done.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170803105818.9934-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

10e66760

x86/asm/64: Clear AC on NMI entries · e93c1730

由 Andy Lutomirski 提交于 8月 07, 2017

This closes a hole in our SMAP implementation.

This patch comes from grsecurity. Good catch!
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/314cc9f294e8f14ed85485727556ad4f15bb1659.1502159503.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e93c1730

locking: Remove smp_mb__before_spinlock() · a9668cd6

由 Peter Zijlstra 提交于 6月 07, 2017

Now that there are no users of smp_mb__before_spinlock() left, remove
it entirely.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a9668cd6

locking: Introduce smp_mb__after_spinlock() · d89e588c

由 Peter Zijlstra 提交于 9月 05, 2016

Since its inception, our understanding of ACQUIRE, esp. as applied to
spinlocks, has changed somewhat. Also, I wonder if, with a simple
change, we cannot make it provide more.

The problem with the comment is that the STORE done by spin_lock isn't
itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
it and cross with any prior STORE, rendering the default WMB
insufficient (pointed out by Alan).

Now, this is only really a problem on PowerPC and ARM64, both of
which already defined smp_mb__before_spinlock() as a smp_mb().

At the same time, we can get a much stronger construct if we place
that same barrier _inside_ the spin_lock(). In that case we upgrade
the RCpc spinlock to an RCsc.  That would make all schedule() calls
fully transitive against one another.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NWill Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

d89e588c

locking/atomic: Fix atomic_set_release() for 'funny' architectures · 9d664c0a

由 Peter Zijlstra 提交于 6月 09, 2017

Those architectures that have a special atomic_set implementation also
need a special atomic_set_release(), because for the very same reason
WRITE_ONCE() is broken for them, smp_store_release() is too.

The vast majority is architectures that have spinlock hash based atomic
implementation except hexagon which seems to have a hardware 'feature'.

The spinlock based atomics should be SC, that is, none of them appear to
place extra barriers in atomic_cmpxchg() or any of the other SC atomic
primitives and therefore seem to rely on their spinlock implementation
being SC (I did not fully validate all that).

Therefore, the normal atomic_set() is SC and can be used at
atomic_set_release().
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: davem@davemloft.net
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: rkuo@codeaurora.org
Cc: vgupta@synopsys.com
Link: http://lkml.kernel.org/r/20170609110506.yod47flaav3wgoj5@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

9d664c0a

perf/x86: Fix RDPMC vs. mm_struct tracking · bfe33492

由 Peter Zijlstra 提交于 8月 02, 2017

Vince reported the following rdpmc() testcase failure:

 > Failing test case:
 >
 >	fd=perf_event_open();
 >	addr=mmap(fd);
 >	exec()  // without closing or unmapping the event
 >	fd=perf_event_open();
 >	addr=mmap(fd);
 >	rdpmc()	// GPFs due to rdpmc being disabled

The problem is of course that exec() plays tricks with what is
current->mm, only destroying the old mappings after having
installed the new mm.

Fix this confusion by passing along vma->vm_mm instead of relying on
current->mm.
Reported-by: NVince Weaver <vincent.weaver@maine.edu>
Tested-by: NVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1e0fb9ec ("perf: Add pmu callbacks to track event mapping and unmapping")
Link: http://lkml.kernel.org/r/20170802173930.cstykcqefmqt7jau@hirez.programming.kicks-ass.net
[ Minor cleanups. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

bfe33492

arm64: allwinner: a64: sopine: add missing ethernet0 alias · 56a91550

由 Icenowy Zheng 提交于 7月 22, 2017

The EMAC Ethernet controller was enabled, but an accompanying alias
was not added. This results in unstable numbering if other Ethernet
devices, such as a USB dongle, are present. Also, the bootloader uses
the alias to assign a generated stable MAC address to the device node.
Signed-off-by: NIcenowy Zheng <icenowy@aosc.io>
Signed-off-by: NMaxime Ripard <maxime.ripard@free-electrons.com>
Fixes: 96219b00 ("arm64: allwinner: a64: add device tree for SoPine
		      with baseboard")
[wens@csie.org: Rewrite commit log as fixing a previous patch with Fixes]
Signed-off-by: NChen-Yu Tsai <wens@csie.org>

56a91550

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功