1. 15 Aug 2019, 1 commit
  2. 06 Aug 2019, 2 commits
    • x86: uaccess: Inhibit speculation past access_ok() in user_access_begin() · e28b4155
      Authored by Will Deacon
      commit 6e693b3ffecb0b478c7050b44a4842854154f715 upstream.
      
      Commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
      makes the access_ok() check part of the user_access_begin() preceding a
      series of 'unsafe' accesses.  This has the desirable effect of ensuring
      that all 'unsafe' accesses have been range-checked, without having to
      pick through all of the callsites to verify whether the appropriate
      checking has been made.
      
      However, the consolidated range check does not inhibit speculation, so
      it is still up to the caller to ensure that they are not susceptible to
      any speculative side-channel attacks for user addresses that ultimately
      fail the access_ok() check.
      
      This is an oversight, so use __uaccess_begin_nospec() to ensure that
      speculation is inhibited until the access_ok() check has passed.
      Reported-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      e28b4155
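      A minimal sketch of the caller-side pattern this fix hardens (the helper
      name and arguments below are illustrative, not taken from the patch):
      after this change, user_access_begin() both range-checks the pointer and
      inhibits speculation before any unsafe access runs.

      /* Illustrative only: not the patch itself. */
      static int copy_flags_to_user(unsigned int __user *uptr, unsigned int flags)
      {
              if (!user_access_begin(uptr, sizeof(*uptr)))
                      return -EFAULT;                 /* access_ok() + speculation barrier */

              unsafe_put_user(flags, uptr, efault);   /* no separate access_ok() needed */
              user_access_end();
              return 0;

      efault:
              user_access_end();
              return -EFAULT;
      }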
    • make 'user_access_begin()' do 'access_ok()' · aba1b548
      Authored by Linus Torvalds
      commit 594cc251fdd0d231d342d88b2fdff4bc42fb0690 upstream.
      
      Originally, the rule used to be that you'd have to do access_ok()
      separately, and then user_access_begin() before actually doing the
      direct (optimized) user access.
      
      But experience has shown that people then decide not to do access_ok()
      at all, and instead rely on it being implied by other operations or
      similar.  Which makes it very hard to verify that the access has
      actually been range-checked.
      
      If you use the unsafe direct user accesses, hardware features (either
      SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
      Access Never - on ARM) do force you to use user_access_begin().  But
      nothing really forces the range check.
      
      By putting the range check into user_access_begin(), we actually force
      people to do the right thing (tm), and the range check will be visible
      near the actual accesses.  We have way too long a history of people
      trying to avoid them.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      [ Shile: fixed the following conflicts by adding dummy arguments ]
      Conflicts:
      	kernel/compat.c
      	kernel/exit.c
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      aba1b548
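      A hedged before/after sketch of what this change means for callers
      (function and variable names are illustrative; the exact access_ok()
      signature also differs across kernel versions):

      /* Before: the range check was a separate, easy-to-forget step. */
      static int read_val_old(unsigned int __user *uptr, unsigned int *out)
      {
              if (!access_ok(uptr, sizeof(*uptr)))
                      return -EFAULT;
              user_access_begin();                    /* old form took no arguments */
              unsafe_get_user(*out, uptr, efault);
              user_access_end();
              return 0;
      efault:
              user_access_end();
              return -EFAULT;
      }

      /* After: user_access_begin() performs the access_ok() check itself. */
      static int read_val_new(unsigned int __user *uptr, unsigned int *out)
      {
              if (!user_access_begin(uptr, sizeof(*uptr)))
                      return -EFAULT;
              unsafe_get_user(*out, uptr, efault);
              user_access_end();
              return 0;
      efault:
              user_access_end();
              return -EFAULT;
      }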
  3. 10 Jul 2019, 1 commit
  4. 05 Jul 2019, 4 commits
  5. 03 Jul 2019, 9 commits
    • arm64: insn: Fix ldadd instruction encoding · 3919d91f
      Authored by Jean-Philippe Brucker
      commit c5e2edeb01ae9ffbdde95bdcdb6d3614ba1eb195 upstream.
      
      GCC 8.1.0 reports that the ldadd instruction encoding, recently added to
      insn.c, doesn't match the mask and couldn't possibly be identified:
      
       linux/arch/arm64/include/asm/insn.h: In function 'aarch64_insn_is_ldadd':
       linux/arch/arm64/include/asm/insn.h:280:257: warning: bitwise comparison always evaluates to false [-Wtautological-compare]
      
      Bits [31:30] normally encode the size of the access (1 to 8 bytes)
      and the current instruction value only encodes the 4- and 8-byte
      variants. At the moment only the BPF JIT needs this instruction, and
      doesn't require the 1- and 2-byte variants, but to be consistent with
      our other ldr and str instruction encodings, clear the size field in the
      insn value.
      
      Fixes: 34b8ab091f9ef57a ("bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd")
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Reported-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3919d91f
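      The decoder matches instructions with a mask/value pair, so any value
      bits that fall outside the mask make the test unconditionally false. A
      small standalone sketch of the problem (the constants are illustrative
      of the scheme, not copied from insn.h):

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define LDADD_MASK       0x3f20fc00u
      #define LDADD_VAL_BAD    0xb8200000u    /* size bits [31:30] left set    */
      #define LDADD_VAL_FIXED  0x38200000u    /* size bits cleared to fit mask */

      static bool insn_matches(uint32_t insn, uint32_t mask, uint32_t value)
      {
              return (insn & mask) == value;
      }

      int main(void)
      {
              uint32_t insn = 0xb8200000u;    /* a 32-bit LDADD encoding */

              /* Never true: LDADD_VAL_BAD has bits the mask strips away. */
              printf("bad value matches:   %d\n", insn_matches(insn, LDADD_MASK, LDADD_VAL_BAD));
              /* True: the fixed value only uses bits covered by the mask. */
              printf("fixed value matches: %d\n", insn_matches(insn, LDADD_MASK, LDADD_VAL_FIXED));
              return 0;
      }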
    • bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd · 4423a82c
      Authored by Daniel Borkmann
      commit 34b8ab091f9ef57a2bb3c8c8359a0a03a8abf2f9 upstream.
      
      Since the ARMv8.1 supplement introduced the LSE atomic instructions back
      in 2016, let's add support for STADD and use it in favor of an LDXR / STXR
      loop for the XADD mapping when available. STADD is encoded as an alias for
      LDADD with XZR as the destination register, therefore add LDADD to the
      instruction encoder along with STADD as a special case, and use it in the
      JIT for CPUs that advertise LSE atomics in the CPUID register. If the
      immediate offset in the BPF XADD insn is 0, then use the dst register
      directly instead of a temporary one.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4423a82c
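      A hedged sketch of the aliasing trick described above (the field layout
      follows the LSE atomic-memory-operation format, but treat the helpers and
      bit positions as illustrative rather than a drop-in encoder):

      #include <stdint.h>

      #define AARCH64_REG_XZR 31      /* the zero register encodes as number 31 */

      /* LDADD Rs, Rt, [Rn]: Rt receives the old memory value. */
      static uint32_t encode_ldadd(int is64, int rs, int rt, int rn)
      {
              uint32_t insn = 0x38200000u;            /* LDADD base, size bits clear  */

              insn |= (is64 ? 3u : 2u) << 30;         /* access size: word/doubleword */
              insn |= (uint32_t)(rs & 0x1f) << 16;    /* Rs: value to add             */
              insn |= (uint32_t)(rn & 0x1f) << 5;     /* Rn: memory base register     */
              insn |= (uint32_t)(rt & 0x1f);          /* Rt: destination register     */
              return insn;
      }

      /* STADD Rs, [Rn] is simply LDADD with the destination set to XZR. */
      static uint32_t encode_stadd(int is64, int rs, int rn)
      {
              return encode_ldadd(is64, rs, AARCH64_REG_XZR, rn);
      }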
    • arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg() · 436869e0
      Authored by Will Deacon
      commit 8e4e0ac02b449297b86498ac24db5786ddd9f647 upstream.
      
      Returning an error code from futex_atomic_cmpxchg_inatomic() indicates
      that the caller should not make any use of *uval, and should instead act
      upon the value of the error code. Although this is implemented
      correctly in our futex code, we needlessly copy uninitialised stack to
      *uval in the error case, which can easily be avoided.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      436869e0
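      A sketch of the shape of the fix (the names and helper below are
      hypothetical, not the arm64 futex code): only copy the locally read value
      out when the operation succeeded, so uninitialised stack never reaches
      *uval on the error path.

      /* Hypothetical low-level helper standing in for the inline-asm cmpxchg. */
      extern int cmpxchg_inatomic(unsigned int *val, unsigned int *uaddr,
                                  unsigned int oldval, unsigned int newval);

      static int futex_cmpxchg_val(unsigned int *uval, unsigned int *uaddr,
                                   unsigned int oldval, unsigned int newval)
      {
              unsigned int val;
              int ret = cmpxchg_inatomic(&val, uaddr, oldval, newval);

              if (!ret)
                      *uval = val;    /* previously copied out even when ret != 0 */
              return ret;
      }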
    • irqchip/mips-gic: Use the correct local interrupt map registers · c22cea5a
      Authored by Paul Burton
      commit 6d4d367d0e9ffab4d64a3436256a6a052dc1195d upstream.
      
      The MIPS GIC contains a block of registers used to map local interrupts
      to a particular CPU interrupt pin. Since these registers are found at a
      consecutive range of addresses we access them using an index, via the
      (read|write)_gic_v[lo]_map accessor functions. We currently use values
      from enum mips_gic_local_interrupt as those indices.
      
      Unfortunately whilst enum mips_gic_local_interrupt provides the correct
      offsets for bits in the pending & mask registers, the ordering of the
      map registers is subtly different... Compared with the ordering of
      pending & mask bits, the map registers move the FDC from the end of the
      list to index 3, right after the timer interrupt. As a result, the
      performance counter & software interrupts are at indices 4-6 rather than
      indices 3-5.
      
      Notably this causes problems with performance counter interrupts being
      incorrectly mapped on some systems, and presumably will also cause
      problems for FDC interrupts.
      
      Introduce a function to map from enum mips_gic_local_interrupt to the
      index of the corresponding map register, and use it to ensure we access
      the map registers for the correct interrupts.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: a0dc5cb5 ("irqchip: mips-gic: Simplify gic_local_irq_domain_map()")
      Fixes: da61fcf9 ("irqchip: mips-gic: Use irq_cpu_online to (un)mask all-VP(E) IRQs")
      Reported-and-tested-by: Archer Yan <ayan@wavecomp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c22cea5a
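      A sketch of the index translation the patch introduces (the enum values
      match the ordering described above but are illustrative; the real
      definitions live in the MIPS GIC driver):

      enum gic_local_int {
              GIC_LOCAL_INT_WD,
              GIC_LOCAL_INT_COMPARE,
              GIC_LOCAL_INT_TIMER,
              GIC_LOCAL_INT_PERFCTR,
              GIC_LOCAL_INT_SWINT0,
              GIC_LOCAL_INT_SWINT1,
              GIC_LOCAL_INT_FDC,
      };

      /* Pending/mask bits use the enum order; the map registers do not. */
      static unsigned int gic_map_reg_index(enum gic_local_int intr)
      {
              if (intr == GIC_LOCAL_INT_FDC)
                      return 3;               /* FDC map register follows the timer  */
              if (intr >= GIC_LOCAL_INT_PERFCTR)
                      return intr + 1;        /* perf counter & SW ints shift up one */
              return intr;                    /* WD, compare, timer keep their index */
      }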
    • KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT · 01a02a98
      Authored by Sean Christopherson
      commit b6b80c78af838bef17501416d5d383fedab0010a upstream.
      
      SVM's Nested Page Tables (NPT) reuses x86 paging for the host-controlled
      page walk.  For 32-bit KVM, this means PAE paging is used even when TDP
      is enabled, i.e. the PAE root array needs to be allocated.
      
      Fixes: ee6268ba ("KVM: x86: Skip pae_root shadow allocation if tdp enabled")
      Cc: stable@vger.kernel.org
      Reported-by: Jiri Palecek <jpalecek@web.de>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Jiri Palecek <jpalecek@web.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01a02a98
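      A simplified sketch of the condition this fix restores (argument names
      are illustrative; the real check lives in KVM's MMU allocation path):

      static bool need_pae_root(bool tdp_enabled, bool using_32bit_npt)
      {
              if (!tdp_enabled)
                      return true;            /* shadow paging always needs pae_root  */
              return using_32bit_npt;         /* 32-bit SVM NPT still walks PAE roots */
      }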
    • x86/resctrl: Prevent possible overrun during bitmap operations · 32746032
      Authored by Reinette Chatre
      commit 32f010deab575199df4ebe7b6aec20c17bb7eccd upstream.
      
      While the DOC at the beginning of lib/bitmap.c explicitly states that
      "The number of valid bits in a given bitmap does _not_ need to be an
      exact multiple of BITS_PER_LONG.", some of the bitmap operations do
      indeed access BITS_PER_LONG portions of the provided bitmap no matter
      the size of the provided bitmap.
      
      For example, if find_first_bit() is provided with an 8 bit bitmap the
      operation will access BITS_PER_LONG bits from the provided bitmap. While
      the operation ensures that these extra bits do not affect the result,
      the memory is still accessed.
      
      The capacity bitmasks (CBMs) are typically stored in u32 since they
      can never exceed 32 bits. A few instances exist where a bitmap_*
      operation is performed on a CBM by simply pointing the bitmap operation
      to the stored u32 value.
      
      The consequence of this pattern is that some bitmap_* operations will
      access out-of-bounds memory when interacting with the provided CBM.
      
      This same issue has previously been addressed with commit 49e00eee
      ("x86/intel_rdt: Fix out-of-bounds memory access in CBM tests")
      but at that time not all instances of the issue were fixed.
      
      Fix this by using an unsigned long to store the capacity bitmask data
      that is passed to bitmap functions.
      
      Fixes: e6519011 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details")
      Fixes: f4e80d67 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information")
      Fixes: 95f0b77e ("x86/intel_rdt: Initialize new resource group with sane defaults")
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/58c9b6081fd9bf599af0dfc01a6fdd335768efef.1560975645.git.reinette.chatre@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      32746032
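      A sketch of the safe pattern the fix adopts (function and argument names
      are illustrative): widen the u32 CBM into an unsigned long before handing
      it to bitmap helpers, which may read a full BITS_PER_LONG bits regardless
      of the stated length.

      static unsigned long cbm_first_bit(u32 cbm, unsigned int cbm_len)
      {
              unsigned long tmp = cbm;        /* bitmap ops always get a whole long */

              return find_first_bit(&tmp, cbm_len);
      }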
    • x86/microcode: Fix the microcode load on CPU hotplug for real · 1746dc52
      Authored by Thomas Gleixner
      commit 5423f5ce5ca410b3646f355279e4e937d452e622 upstream.
      
      A recent change moved the microcode loader hotplug callback into the early
      startup phase, which runs with interrupts disabled. It missed that the
      callbacks invoke sysfs functions which might sleep, causing nice 'might
      sleep' splats when proper debugging is enabled.
      
      Split the callbacks and only load the microcode in the early startup phase
      and move the sysfs handling back into the later threaded and preemptible
      bringup phase where it was before.
      
      Fixes: 78f4e932f776 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906182228350.1766@nanos.tec.linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1746dc52
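      A sketch of the split described above (callback and helper names are
      illustrative of the approach, not copied from the patch): keep the
      microcode load in the early, interrupts-off "starting" callback and do the
      may-sleep sysfs work in a later, preemptible "online" callback.

      void load_microcode_for(unsigned int cpu);              /* hypothetical, atomic-safe */
      void create_microcode_sysfs_files(unsigned int cpu);    /* hypothetical, may sleep   */

      static int mc_cpu_starting(unsigned int cpu)
      {
              load_microcode_for(cpu);                /* runs with interrupts disabled */
              return 0;
      }

      static int mc_cpu_online(unsigned int cpu)
      {
              create_microcode_sysfs_files(cpu);      /* threaded, preemptible context */
              return 0;
      }

      static void __init register_microcode_hotplug(void)
      {
              cpuhp_setup_state_nocalls(CPUHP_AP_MICROCODE_LOADER, "x86/microcode:starting",
                                        mc_cpu_starting, NULL);
              cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/microcode:online",
                                        mc_cpu_online, NULL);
      }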
    • x86/speculation: Allow guests to use SSBD even if host does not · 690049ed
      Authored by Alejandro Jimenez
      commit c1f7fec1eb6a2c86d01bc22afce772c743451d88 upstream.
      
      The bits set in x86_spec_ctrl_mask are used to calculate the guest's value
      of SPEC_CTRL that is written to the MSR before VMENTRY, and control which
      mitigations the guest can enable.  In the case of SSBD, unless the host has
      enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in
      the kernel parameters), the SSBD bit is not set in the mask and the guest
      cannot properly enable the SSBD always-on mitigation mode.
      
      This has been confirmed by running the SSBD PoC on a guest using the SSBD
      always-on mitigation mode (booted with the kernel parameter
      "spec_store_bypass_disable=on"), and verifying that the guest is vulnerable
      unless the host is also using SSBD always-on mode. In addition, the guest
      OS incorrectly reports the SSB vulnerability as mitigated.
      
      Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports
      it, allowing the guest to use SSBD whether or not the host has chosen to
      enable the mitigation in any of its modes.
      
      Fixes: be6fcb54 ("x86/bugs: Rework spec_ctrl base and mask logic")
      Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: bp@alien8.de
      Cc: rkrcmar@redhat.com
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jimenez@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      690049ed
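      A simplified sketch of the intent (the real change lives in
      arch/x86/kernel/cpu/bugs.c and also covers AMD's virtualized SSBD
      flavours): advertise SSBD in the guest-writable SPEC_CTRL mask whenever
      the host CPU supports it, regardless of the host's own mitigation mode.

      static void update_guest_spec_ctrl_mask(void)
      {
              if (boot_cpu_has(X86_FEATURE_SSBD))
                      x86_spec_ctrl_mask |= SPEC_CTRL_SSBD;
      }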
    • arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS · 85a3b1ef
      Authored by Nathan Chancellor
      commit fa63da2ab046b885a7f70291aafc4e8ce015429b upstream.
      
      This is a GCC-only option, which controls GCC's warnings about its own ABI
      changes, so unconditionally adding it breaks Clang with tons of:
      
      warning: unknown warning option '-Wno-psabi' [-Wunknown-warning-option]
      
      and link time failures:
      
      ld.lld: error: undefined symbol: __efistub___stack_chk_guard
      >>> referenced by arm-stub.c:73
      (/home/nathan/cbl/linux/drivers/firmware/efi/libstub/arm-stub.c:73)
      >>>               arm-stub.stub.o:(__efistub_install_memreserve_table)
      in archive ./drivers/firmware/efi/libstub/lib.a
      
      These failures come from the lack of -fno-stack-protector, which is
      added via cc-option in drivers/firmware/efi/libstub/Makefile. When an
      unknown flag is added to KBUILD_CFLAGS, clang will noisily warn that it
      is ignoring the option like above, unlike gcc, which will just error.
      
      $ echo "int main() { return 0; }" > tmp.c
      
      $ clang -Wno-psabi tmp.c; echo $?
      warning: unknown warning option '-Wno-psabi' [-Wunknown-warning-option]
      1 warning generated.
      0
      
      $ gcc -Wsometimes-uninitialized tmp.c; echo $?
      gcc: error: unrecognized command line option
      ‘-Wsometimes-uninitialized’; did you mean ‘-Wmaybe-uninitialized’?
      1
      
      For cc-option to work properly with clang and behave like gcc, -Werror
      is needed, which was done in commit c3f0d0bc ("kbuild, LLVMLinux:
      Add -Werror to cc-option to support clang").
      
      $ clang -Werror -Wno-psabi tmp.c; echo $?
      error: unknown warning option '-Wno-psabi'
      [-Werror,-Wunknown-warning-option]
      1
      
      As a consequence of this, when an unknown flag is unconditionally added
      to KBUILD_CFLAGS, it will cause cc-option to always fail and those flags
      will never get added:
      
      $ clang -Werror -Wno-psabi -fno-stack-protector tmp.c; echo $?
      error: unknown warning option '-Wno-psabi'
      [-Werror,-Wunknown-warning-option]
      1
      
      This can be seen when compiling the whole kernel as some warnings that
      are normally disabled (see below) show up. The full list of flags
      missing from drivers/firmware/efi/libstub are the following (gathered
      from diffing .arm64-stub.o.cmd):
      
      -fno-delete-null-pointer-checks
      -Wno-address-of-packed-member
      -Wframe-larger-than=2048
      -Wno-unused-const-variable
      -fno-strict-overflow
      -fno-merge-all-constants
      -fno-stack-check
      -Werror=date-time
      -Werror=incompatible-pointer-types
      -ffreestanding
      -fno-stack-protector
      
      Use cc-disable-warning so that it gets disabled for GCC and does nothing
      for Clang.
      
      Fixes: ebcc5928c5d9 ("arm64: Silence gcc warnings about arch ABI drift")
      Link: https://github.com/ClangBuiltLinux/linux/issues/511
      Reported-by: Qian Cai <cai@lca.pw>
      Acked-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      85a3b1ef
  6. 25 Jun 2019, 20 commits
  7. 22 Jun 2019, 3 commits