- 25 Aug 2017, 2 commits

Committed by Paolo Bonzini

Move PKRU into struct kvm_arch_vcpu, replacing guest_pkru_valid with a simple comparison against the host value of the register. The PKRU write can additionally be skipped if the guest has not enabled the feature. Once we do this, we no longer need to test OSPKE in the host, because guest_CR4.PKE=1 implies host_CR4.PKE=1. The static PKU test is kept to elide the code on older CPUs.

Suggested-by: Yang Zhang <zy107165@alibaba-inc.com>
Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

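A minimal sketch of the comparison-based handling described above, reconstructed rather than quoted from the patch; the bare `host_pkru` variable is an assumption standing in for the saved host value:

```c
/* Guest entry: write the guest PKRU only if it differs from the host's. */
if (static_cpu_has(X86_FEATURE_PKU) &&
    kvm_read_cr4_bits(vcpu, X86_CR4_PKE) &&
    vcpu->arch.pkru != host_pkru)
        __write_pkru(vcpu->arch.pkru);

/* Guest exit: read the guest value back, restore the host's if needed. */
if (static_cpu_has(X86_FEATURE_PKU) &&
    kvm_read_cr4_bits(vcpu, X86_CR4_PKE)) {
        vcpu->arch.pkru = __read_pkru();
        if (vcpu->arch.pkru != host_pkru)
                __write_pkru(host_pkru);
}
```

Because the guest cannot have CR4.PKE=1 unless the host does (see the next commit), RDPKRU/WRPKRU here can never fault.
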
Committed by Paolo Bonzini

If the host has protection keys disabled, we cannot read and write the guest PKRU: RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0. Block the PKU cpuid bit in that case. This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.

Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

- 24 Aug 2017, 3 commits

Committed by Benjamin Herrenschmidt

This adds the missing memory barriers needed to order updates and tests of the virtual CPPR and MFRR, fixing a lost-IPI problem. While at it, also document all barriers in this file. The bug caused guest IPIs to occasionally get lost; the symptom is hangs or stalls in the guest.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

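A hedged sketch of the store/load pairing such a fix depends on; `xc`, `notify()` and `resend()` are illustrative stand-ins, not names from the patch. (In XICS-style presentation, an interrupt is deliverable when its priority value is numerically below the CPPR.)

```c
/* Source CPU: raise an IPI by updating the target's virtual MFRR. */
xc->mfrr = prio;
smp_mb();                       /* order MFRR store before the CPPR read */
if (prio < xc->cppr)
        notify(xc);             /* illustrative: kick the target vCPU */

/* Target CPU: change its priority, then re-check for a pending IPI. */
xc->cppr = new_cppr;
smp_mb();                       /* order CPPR store before the MFRR read */
if (xc->mfrr < new_cppr)
        resend(xc);             /* illustrative: deliver the now-eligible IPI */
```

Without the paired barriers, each side can miss the other's update and the IPI is never delivered.
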
Committed by Benjamin Herrenschmidt

This adds a workaround for a bug in POWER9 DD1 chips where changing the CPPR (Current Processor Priority Register) can cause bits in the IPB (Interrupt Pending Buffer) to get lost. Thankfully it only happens when manually manipulating the CPPR, which is quite rare, but when it does happen it can cause interrupts to be delayed or lost.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

Committed by Nicholas Piggin

When msgsnd is used for IPIs to other cores, msgsync must be executed by the target to order stores performed on the source before its msgsnd (provided the source executes the appropriate sync).

Fixes: 1704a81c ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

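The required pairing, sketched as raw inline assembly for clarity; the kernel itself goes through its own doorbell helpers and instruction macros rather than bare mnemonics:

```c
/* Source CPU: publish the data, then send the doorbell. */
asm volatile("sync" : : : "memory");
asm volatile("msgsnd %0" : : "r" (msg));  /* msg: doorbell payload */

/* Target core, in the doorbell interrupt handler: */
asm volatile("msgsync" : : : "memory");   /* POWER9: order the source's
                                             earlier stores */
/* ... reads of data written before the msgsnd are now safe ... */
```
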
- 21 Aug 2017, 2 commits

Committed by Heiko Carstens

sthyi should only generate a specification exception if the function code is zero and the response buffer is not on a 4k boundary. The current code applies the buffer check to unknown function codes as well: if the response buffer (which is only defined for function code 0) is not on a 4k boundary, it incorrectly injects a specification exception instead of returning with condition code 3 and return code 4 (unsupported function code). Fix this by moving the boundary check.

Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation")
Cc: <stable@vger.kernel.org> # 4.8+
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

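The corrected ordering, as a rough sketch; `set_cc3_rc4()` is a made-up name standing in for whatever the handler actually does:

```c
/* Reject unknown function codes first ... */
if (fcode != 0)
        return set_cc3_rc4(vcpu);       /* cc 3, rc 4: unsupported */

/* ... and only then apply the alignment rule defined for fc 0. */
if (addr & ~PAGE_MASK)
        return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
```
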
Committed by Heiko Carstens

The sthyi inline assembly misses register r3 in its clobber list. The sthyi instruction always writes a return code to register "R2+1", which in this case is r3. Due to that we may see register corruption, host crashes or data corruption, depending on how gcc decided to allocate and use registers at compile time.

Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation")
Cc: <stable@vger.kernel.org> # 4.8+
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

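What the repaired wrapper looks like, reconstructed from memory as a sketch; the key point is the "3" entry telling the compiler that r3 is overwritten:

```c
static int sthyi(u64 vaddr)
{
        register u64 code asm("0") = 0;         /* function code in r0 */
        register u64 addr asm("2") = vaddr;     /* buffer address in r2 */
        int cc;

        asm volatile(
                ".insn  rre,0xB2560000,%[code],%[addr]\n"
                "ipm    %[cc]\n"
                "srl    %[cc],28\n"
                : [cc] "=d" (cc)
                : [code] "d" (code), [addr] "a" (addr)
                : "3", "memory", "cc");         /* "3" was the missing clobber */
        return cc;
}
```
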
- 19 Aug 2017, 2 commits

Committed by Kees Cook

Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000 broke AddressSanitizer. This is a partial revert of:

  eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
  02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")

The AddressSanitizer tool has hard-coded expectations about where executable mappings are loaded.

The motivation for changing the PIE base in the above commits was to avoid the Stack-Clash CVEs that allowed executable mappings to get too close to heap and stack. This was mainly a problem on 32-bit, but the 64-bit bases were moved too, in an effort to proactively protect those systems (proofs of concept do exist that show 64-bit collisions, but other recent changes to fix stack accounting and setuid behaviors will minimize the impact).

The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC base), so only the 64-bit PIE base needs to be reverted to let x86 and arm64 ASan binaries run again. Future changes to the 64-bit PIE base on these architectures can be made optional once a more dynamic method for dealing with AddressSanitizer is found (e.g. always loading PIE into the mmap region for marked binaries).

Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast
Fixes: eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Fixes: 02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Kostya Serebryany <kcc@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Committed by Nicholas Piggin

Commit 05a4a952 ("kernel/watchdog: split up config options") lost the perf-based hardlockup detector's dependency on PERF_EVENTS, which can result in broken builds with some powerpc configurations. Restore the dependency. Add it for x86 too: although x86 always selects PERF_EVENTS, it seems reasonable to make the dependency explicit.

Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
Fixes: 05a4a952 ("kernel/watchdog: split up config options")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 18 Aug 2017, 3 commits

Committed by Thomas Gleixner

The hardlockup detector on x86 uses a performance counter based on unhalted CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the performance counter period, so the hrtimer should fire 2-3 times before the performance counter NMI fires. The NMI code checks whether the hrtimer has fired since the last invocation; if not, it assumes a hard lockup.

The calculation of those periods is based on the nominal CPU frequency. Turbo modes increase the CPU clock frequency and therefore shorten the period of the perf/NMI watchdog. With extreme turbo modes (3x nominal frequency) the perf/NMI period becomes shorter than the hrtimer period, which leads to false positives.

A simple fix would be to shorten the hrtimer period, but that comes with the side effect of more frequent hrtimer and softlockup thread wakeups, which is not desired. Instead, implement a low-pass filter that checks the perf/NMI period against kernel time: if the perf NMI fires before 4/5 of the watchdog period has elapsed, the event is ignored and postponed to the next perf NMI. That solves the problem and avoids the overhead of shorter hrtimer periods and more frequent softlockup thread wakeups.

Fixes: 58687acb ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos

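A hedged sketch of such a filter; the per-CPU variable and helper names are illustrative, and the real patch also bounds how many NMIs in a row may be ignored:

```c
static DEFINE_PER_CPU(u64, last_hld_ns);        /* illustrative name */

/* Return true if a full (4/5) watchdog period elapsed on this CPU. */
static bool hld_timestamp_ok(u64 period_ns)
{
        u64 now = ktime_get_mono_fast_ns();     /* NMI-safe clock */
        u64 delta = now - __this_cpu_read(last_hld_ns);

        if (delta < period_ns * 4 / 5)
                return false;                   /* turbo shortened it: ignore */

        __this_cpu_write(last_hld_ns, now);
        return true;
}
```
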
Committed by Arvind Yadav

attribute_groups are not supposed to change at runtime, and none of these groups is modified. Mark the non-const structs as const.

[ tglx: Folded into one big patch ]

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: tony.luck@intel.com
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com

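An illustrative example of the change (made-up names), which lets the group live in read-only data:

```c
#include <linux/sysfs.h>

static struct attribute *example_attrs[] = {
        NULL,                           /* real attributes would go here */
};

/* Previously "static struct attribute_group"; now const. */
static const struct attribute_group example_group = {
        .name  = "events",
        .attrs = example_attrs,
};
```
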
Committed by Gary Bisson

The previous value was a bad copy of the nitrogen6_max device tree.

Signed-off-by: Gary Bisson <gary.bisson@boundarydevices.com>
Fixes: 3faa1bb2 ("ARM: dts: imx: add Boundary Devices Nitrogen6_SOM2 support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

- 17 Aug 2017, 3 commits

Committed by Alexander Potapenko

__startup_64() normally uses fixup_pointer() to access globals in a position-independent fashion. However, 'next_early_pgt' was accessed directly, which is not guaranteed to work. Luckily GCC was generating a PC-relative R_X86_64_PC32 relocation for 'next_early_pgt', but Clang emitted an absolute R_X86_64_32S, which led to accessing invalid memory and rebooting the kernel.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Davidson <md@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c88d7150 ("x86/boot/64: Rewrite startup_64() in C")
Link: http://lkml.kernel.org/r/20170816190808.131748-1-glider@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

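The pattern in question, sketched from the in-tree helper (treat the exact bodies as a reconstruction):

```c
/* Rebase a link-time address onto the address the kernel was loaded at. */
static void *fixup_pointer(void *ptr, unsigned long physaddr)
{
        return ptr - (void *)_text + (void *)physaddr;
}

/* Direct access relied on the compiler emitting a PC-relative
 * relocation; going through the fixup works with either compiler. */
next_pgt = fixup_pointer(&next_early_pgt, physaddr);
```
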
Committed by Oleg Nesterov

The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and randomize_stack_top() are not required: PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not set, so there is no need to re-check after that.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com

Committed by Oleg Nesterov

Documentation/admin-guide/kernel-parameters.txt says:

  norandmaps  Don't use address space randomization. Equivalent to
              echo 0 > /proc/sys/kernel/randomize_va_space

but it doesn't work, because arch_rnd(), which is used to randomize mm->mmap_base, returns a random value unconditionally. And as Kirill pointed out, ADDR_NO_RANDOMIZE is broken for the same reason. Just shift the PF_RANDOMIZE check from arch_mmap_rnd() to arch_rnd().

Fixes: 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170815153952.GA1076@redhat.com

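A sketch of the fixed helper; with the check here, every mmap_base flavor (native and compat) honors both norandmaps and ADDR_NO_RANDOMIZE:

```c
static unsigned long arch_rnd(unsigned int rndbits)
{
        if (!(current->flags & PF_RANDOMIZE))
                return 0;       /* norandmaps / ADDR_NO_RANDOMIZE */
        return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
}
```
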
- 16 Aug 2017, 1 commit

Committed by Benjamin Herrenschmidt

VSX uses a combination of the old vector registers, the old FP registers, and new "second halves" of the FP registers. Thus when we need to see the VSX state in the thread struct (flush_vsx_to_thread()) or when we are about to use VSX in the kernel (enable_kernel_vsx()), we need to ensure all of it is flushed into the thread struct if any of the components is individually enabled. Unfortunately we only tested whether VSX as a whole was enabled, not whether the components were individually enabled.

Fixes: 72cd7b44 ("powerpc: Uncomment and make enable_kernel_vsx() routine available")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

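The shape of the widened check, as a sketch (not the verbatim diff):

```c
/* Any of FP, VEC or VSX being live means there is VSX-visible state
 * that must be flushed to the thread struct; MSR_VSX alone is not
 * enough. */
if (tsk->thread.regs->msr & (MSR_VSX | MSR_VEC | MSR_FP))
        __giveup_vsx(tsk);      /* previously gated on MSR_VSX only */
```
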
- 15 Aug 2017, 1 commit

Committed by Thomas Gleixner

Larry reported a CPU hotplug lock recursion in the MTRR code:

  ============================================
  WARNING: possible recursive locking detected
  systemd-udevd/153 is trying to acquire lock:
  (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30
  but task is already holding lock:
  (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470
  ....
  cpus_read_lock+0x48/0x90
  stop_machine+0x16/0x30
  mtrr_add_page+0x18b/0x470
  mtrr_add+0x3e/0x70

mtrr_add_page() already holds the hotplug rwsem and then calls stop_machine(), which acquires it again. Call stop_machine_cpuslocked() instead.

Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>

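The fix boils down to one call-site change; `data` stands for the rendezvous argument already in scope at that site:

```c
/* Caller already holds cpu_hotplug_lock, so use the variant that
 * does not acquire it again. */
stop_machine_cpuslocked(mtrr_rendezvous_handler, &data, cpu_online_mask);
```
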
- 14 Aug 2017, 1 commit

Committed by Icenowy Zheng

The pin controller of the H5 has three IRQs at the chip's GIC, representing three banks of pinctrl IRQs. However, the device tree was missing the third IRQ of the pin controller, which makes the PG-bank IRQs unusable. Add the missing IRQ to the pinctrl node.

Fixes: 4e36de17 ("arm64: allwinner: h5: add Allwinner H5 .dtsi")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>

- 11 Aug 2017, 9 commits

Committed by Ding Tianhong

On platforms with an arch timer erratum workaround, it is possible for arch_timer_reg_read_stable() to recurse into itself when certain tracing options are enabled, leading to stack overflows and related problems. For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are selected, it is possible to trigger this with:

  $ mount -t debugfs nodev /sys/kernel/debug/
  $ echo function_graph > /sys/kernel/debug/tracing/current_tracer

The problem is that in such cases, preempt_disable() instrumentation attempts to acquire a timestamp via trace_clock(), resulting in a call back into arch_timer_reg_read_stable(), and hence recursion. This patch changes arch_timer_reg_read_stable() to use preempt_{disable,enable}_notrace(), which avoids the problem.

This issue is similar to the one fixed by upstream commit 96b3d28b ("sched/clock: Prevent tracing recursion in sched_clock_cpu()").

Fixes: 6acc71cc ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs")
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

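Roughly what the fixed accessor looks like, simplified from the arm64 header of that era (treat it as a sketch):

```c
#define arch_timer_reg_read_stable(reg)                         \
({                                                              \
        u64 _val;                                               \
        /* _notrace: avoid recursing via trace_clock() */       \
        preempt_disable_notrace();                              \
        _val = erratum_handler(read_##reg)();                   \
        preempt_enable_notrace();                               \
        _val;                                                   \
})
```
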
Committed by Juergen Gross

A Xen HVM guest running with KASLR enabled will die rather soon today, because the shared info page mapping uses va() too early. This was introduced by commit a5d5f328 ("xen: allocate page for shared info page from low memory"). In order to fix this, use early_memremap() to get a temporary virtual address for shared info until va() can be used safely.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>

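A minimal sketch of the early-boot side, assuming a `pfn` has already been chosen for the shared info page; the function name is invented for illustration:

```c
static void __init xen_hvm_map_shared_info_early(unsigned long pfn)
{
        struct shared_info *sip;

        /* Temporary fixmap-backed mapping: usable before the direct
         * map exists, unlike __va(). */
        sip = early_memremap(PFN_PHYS(pfn), PAGE_SIZE);
        /* ... consume whatever early boot needs from sip ... */
        early_memunmap(sip, PAGE_SIZE);
}
```
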
Committed by Juergen Gross

Instead of calling xen_hvm_init_shared_info() on boot and resume, split it up into a boot-time function that searches for the pfn to use and a mapping function that performs the hypervisor mapping call.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>

Committed by Juergen Gross

Provide a hook in hypervisor_x86, called after setting up the initial memory mapping. This is needed e.g. by Xen HVM guests to map the hypervisor shared info page.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>

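The shape of such a hook, sketched with only the relevant members shown:

```c
struct hypervisor_x86 {
        const char *name;
        uint32_t (*detect)(void);
        void (*init_mem_mapping)(void); /* new optional callback */
        /* ... remaining members elided ... */
};

extern const struct hypervisor_x86 *x86_hyper;

/* Called from init_mem_mapping() once the initial mapping exists. */
static inline void hypervisor_init_mem_mapping(void)
{
        if (x86_hyper && x86_hyper->init_mem_mapping)
                x86_hyper->init_mem_mapping();
}
```
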
Committed by Colin Ian King

Mark a couple of structures and functions as 'static', as pointed out by Sparse:

  warning: symbol 'bts_pmu' was not declared. Should it be static?
  warning: symbol 'p4_event_aliases' was not declared. Should it be static?
  warning: symbol 'rapl_attr_groups' was not declared. Should it be static?
  warning: symbol 'process_uv2_message' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Andrew Banman <abanman@hpe.com> # for the UV change
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20170810155709.7094-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Committed by Borislav Petkov

"virtual_vmload_vmsave" is what would land in /proc/cpuinfo as of v4.13-rc4, which is clearly too long for a single feature bit. So rename it to what it is called in the processor manual: "v_vmsave_vmload" is a bit shorter, after all. We could go more aggressively here, but keeping the name the same as in the processor manual is advantageous.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm-ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170801185552.GA3743@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Committed by Doug Smythies

According to the Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts while reading MSRs IA32_APERF and IA32_MPERF.

See also: commit 4ab60c3f ("cpufreq: intel_pstate: Disable interrupts during MSRs reading").

Signed-off-by: Doug Smythies <dsmythies@telus.net>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

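A sketch of the pattern (the wrapper name is illustrative):

```c
static void read_aperf_mperf(u64 *aperf, u64 *mperf)
{
        unsigned long flags;

        /* Keep the two reads back to back: an interrupt between them
         * would skew the APERF/MPERF ratio. */
        local_irq_save(flags);
        rdmsrl(MSR_IA32_APERF, *aperf);
        rdmsrl(MSR_IA32_MPERF, *mperf);
        local_irq_restore(flags);
}
```
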
Committed by Minchan Kim

Nadav reported that parallel MADV_DONTNEED on the same range has a stale TLB problem; Mel fixed it [1] and found the same problem in MADV_FREE [2]. Quoting Mel Gorman:

  "The race in question is CPU 0 running madv_free and updating some PTEs
  while CPU 1 is also running madv_free and looking at the same PTEs.
  CPU 1 may have writable TLB entries for a page but fail the pte_dirty
  check (because CPU 0 has updated it already) and potentially fail to
  flush. Hence, when madv_free on CPU 1 returns, there are still
  potentially writable TLB entries and the underlying PTE is still
  present so that a subsequent write does not necessarily propagate the
  dirty bit to the underlying PTE any more. Reclaim at some unknown time
  at the future may then see that the PTE is still clean and discard the
  page even though a write has happened in the meantime. I think this is
  possible but I could have missed some protection in madv_free that
  prevents it happening."

This patch aims to solve both problems at once and is also ready for a related problem involving KSM, MADV_FREE and soft-dirty [3]. The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending and mm_tlb_flush_pending so that when tlb_finish_mmu is called, we can detect that parallel threads are batching on the same mm. In that case, forcibly flush the TLB so that no thread can access memory through a stale TLB entry, even though the gather found no page table entries to flush. I confirmed this patch works with the test program Nadav provided [4], so it supersedes "mm: Always flush VMA ranges affected by zap_page_range v2" in the current mmotm.

NOTE: this patch modifies the arch-specific TLB gathering interface (x86, ia64, s390, sh, um). Most architectures are straightforward, but s390 needs care: its tlb_flush_mmu works only if mm->context.flush_mm is non-zero, which happens only when a pte entry is actually cleared by ptep_get_and_clear and friends. The case at hand never changes the pte entries, but still needs the flush to prevent memory access through stale TLB entries.

[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/

[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Committed by Minchan Kim

This is a preparatory patch for solving the race problems caused by TLB batching. For that, we will increase/decrease the TLB flush pending count of mm_struct whenever tlb_[gather|finish]_mmu is called. Before doing so, this patch separates out the architecture-specific part, renames it to arch_tlb_[gather|finish]_mmu, and has the generic part simply call it. It shouldn't change any behavior.

Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

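Taken together, the two TLB patches above give the generic wrappers roughly this shape (a sketch of the series' structure, not the exact diff):

```c
void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                    unsigned long start, unsigned long end)
{
        arch_tlb_gather_mmu(tlb, mm, start, end);
        inc_tlb_flush_pending(mm);      /* announce this batcher */
}

void tlb_finish_mmu(struct mmu_gather *tlb,
                    unsigned long start, unsigned long end)
{
        /* If anyone else batched on this mm concurrently, flush even
         * if we gathered nothing, to kill their stale TLB entries. */
        bool force = mm_tlb_flush_nested(tlb->mm);

        arch_tlb_finish_mmu(tlb, start, end, force);
        dec_tlb_flush_pending(tlb->mm);
}
```
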
- 10 Aug 2017, 6 commits

Committed by Vitaly Kuznetsov

A hang is observed when CPU0 is onlined after a preceding offlining. The trace shows that CPU0 is stuck in check_tsc_sync_target(), waiting for the source CPU to run check_tsc_sync_source(), but this never happens. The source CPU, in its turn, is stuck in synchronize_sched(), which is called from native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler(). So this is a classic ABBA deadlock, caused by the use of synchronize_sched() in unregister_nmi_handler(). Fix the bug by moving unregister_nmi_handler() from do_boot_cpu() to native_cpu_up(), after CPU onlining is done.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170803105818.9934-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Committed by Andy Lutomirski

This closes a hole in our SMAP implementation. This patch comes from grsecurity. Good catch!

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/314cc9f294e8f14ed85485727556ad4f15bb1659.1502159503.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Committed by Peter Zijlstra

Vince reported the following rdpmc() testcase failure:

  fd = perf_event_open();
  addr = mmap(fd);
  exec();              // without closing or unmapping the event
  fd = perf_event_open();
  addr = mmap(fd);
  rdpmc();             // GPFs due to rdpmc being disabled

The problem is of course that exec() plays tricks with current->mm: it destroys the old mappings only after having installed the new mm. Fix this confusion by passing along vma->vm_mm instead of relying on current->mm.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1e0fb9ec ("perf: Add pmu callbacks to track event mapping and unmapping")
Link: http://lkml.kernel.org/r/20170802173930.cstykcqefmqt7jau@hirez.programming.kicks-ass.net
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

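A sketch of the fixed mapping callback; close to the upstream shape, but reconstructed from memory:

```c
/* The perf core now passes the VMA's mm, so rdpmc accounting lands on
 * the mm that actually owns the mapping, not on whatever current->mm
 * happens to be mid-exec(). */
static void x86_pmu_event_mapped(struct perf_event *event,
                                 struct mm_struct *mm)
{
        if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
                return;

        if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1)
                on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1);
}
```
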
Committed by Icenowy Zheng

The EMAC Ethernet controller was enabled, but an accompanying alias was not added. This results in unstable numbering if other Ethernet devices, such as a USB dongle, are present. The bootloader also uses the alias to assign a generated stable MAC address to the device node.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Fixes: 96219b00 ("arm64: allwinner: a64: add device tree for SoPine with baseboard")
[wens@csie.org: Rewrite commit log as fixing a previous patch with Fixes]
Signed-off-by: Chen-Yu Tsai <wens@csie.org>

Committed by Icenowy Zheng

The EMAC Ethernet controller was enabled, but an accompanying alias was not added. This results in unstable numbering if other Ethernet devices, such as a USB dongle, are present. The bootloader also uses the alias to assign a generated stable MAC address to the device node.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Fixes: 97023943 ("arm64: allwinner: pine64: Enable dwmac-sun8i")
[wens@csie.org: Rewrite commit log as fixing a previous patch with Fixes]
Signed-off-by: Chen-Yu Tsai <wens@csie.org>

Committed by Icenowy Zheng

The EMAC Ethernet controller was enabled, but an accompanying alias was not added. This results in unstable numbering if other Ethernet devices, such as a USB dongle, are present. The bootloader also uses the alias to assign a generated stable MAC address to the device node.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Fixes: e7295499 ("arm64: allwinner: bananapi-m64: Enable dwmac-sun8i")
[wens@csie.org: Rewrite commit log as fixing a previous patch with Fixes]
Signed-off-by: Chen-Yu Tsai <wens@csie.org>

- 09 Aug 2017, 7 commits

Committed by Nicholas Piggin

When CPUs start and stop the watchdog, they manipulate shared data that is normally protected by the lock. Other CPUs can be running concurrently at this time, so it's a good idea to use locking here to be on the safe side. Also remove the barrier, which was undocumented and didn't do anything.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Nicholas Piggin

When the SMP detector finds other CPUs stuck, it iterates over them and marks them as stuck. This pulls them out of the pending mask and allows the detector to continue with the remaining good CPUs (if nmi_watchdog=panic is not enabled). The code doing that was buggy: when setting a CPU stuck, if the pending mask became empty, it was reset to keep the watchdog running. However, the iterator would then continue to run over the new pending mask and mark the remaining good CPUs as stuck. Fix this by doing it with cpumask bitwise operations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

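A sketch of the bitwise approach, using the powerpc watchdog's mask names but treating the exact expressions as illustrative:

```c
/* Move every pending (stuck) CPU into the stuck mask in one step,
 * then rebuild the pending mask from the still-healthy CPUs, so no
 * iteration races with the reset. */
cpumask_or(&wd_smp_cpus_stuck, &wd_smp_cpus_stuck, &wd_smp_cpus_pending);
cpumask_andnot(&wd_smp_cpus_pending, &wd_cpus_enabled, &wd_smp_cpus_stuck);
```
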
Committed by Nicholas Piggin

When the watchdog decides to panic, it takes the lock and double-checks everything (to avoid races with the CPU being unstuck or panic()ed by something else). The exit label was misplaced and would result in an all-CPUs backtrace and a watchdog panic even when the condition was found to have been resolved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Nicholas Piggin

Some code can go into a tight loop calling touch_nmi_watchdog (e.g., the stop_machine CPU hotplug code). This can cause contention on the watchdog locks, particularly if all CPUs with the watchdog enabled are spinning in such loops. Avoid this storm of activity by running the watchdog timer callback from this path if the timer period has been exceeded since it last ran.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

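The idea in sketch form, using the powerpc watchdog's file-local names as placeholders:

```c
void arch_touch_nmi_watchdog(void)
{
        unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
        int cpu = smp_processor_id();

        /* Only do the (lock-taking) timer work once a full period has
         * elapsed; a tight touch loop then stays lock-free. */
        if (get_tb() - per_cpu(wd_timer_tb, cpu) >= ticks)
                watchdog_timer_interrupt(cpu);
}
```
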
Committed by Nicholas Piggin

- Hard-disable interrupts before taking the lock, which prevents soft-NMI re-entrancy and therefore can prevent deadlocks.
- Use the raw_ variants of local_irq_disable to avoid irq debugging.
- When the lock is contended, spin at low SMT priority, using loads only, and with interrupts enabled (where possible). See the sketch after this list.

Some stalls have been noticed at high loads that go away with improved locking. There should not be so much locking contention in the first place (which is addressed in a subsequent patch), but the locking should still be improved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

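A hedged sketch of the contended-spin shape on powerpc, with the lock simplified to a bit in a word:

```c
static inline void wd_lock(unsigned long *lock_word)
{
        while (unlikely(test_and_set_bit_lock(0, lock_word))) {
                HMT_low();              /* yield SMT priority to siblings */
                do {
                        barrier();
                } while (test_bit(0, lock_word));       /* loads only */
                HMT_medium();           /* restore priority, retry atomic */
        }
}
```
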
Committed by Nicholas Piggin

When the NMI IPI lock is contended, spin at low SMT priority, using loads only, and with interrupts enabled (where possible). This improves behaviour under high contention (e.g., a system crash when a number of CPUs are trying to enter the debugger).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Michael Ellerman

In commit 05a4a952 ("kernel/watchdog: split up config options"), CONFIG_LOCKUP_DETECTOR was split into two separate config options, CONFIG_HARDLOCKUP_DETECTOR and CONFIG_SOFTLOCKUP_DETECTOR. Our defconfigs still have CONFIG_LOCKUP_DETECTOR=y, but that is no longer user-selectable, and we don't mention the new options, so we end up with none of them enabled. Update the defconfigs to turn on the new SOFT and HARD options, the end result being the same as what we had previously.

Fixes: 05a4a952 ("kernel/watchdog: split up config options")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
