1. 01 Aug 2020, 1 commit
  2. 31 Jul 2020, 7 commits
  3. 30 Jul 2020, 6 commits
  4. 29 Jul 2020, 3 commits
  5. 28 Jul 2020, 5 commits
  6. 27 Jul 2020, 18 commits
    • genirq/affinity: Make affinity setting if activated opt-in · f0c7baca
      Committed by Thomas Gleixner
      John reported that on a RK3288 system the perf per CPU interrupts are all
      affine to CPU0 and provided the analysis:
      
       "It looks like what happens is that because the interrupts are not per-CPU
        in the hardware, armpmu_request_irq() calls irq_force_affinity() while
        the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
        IRQF_NOBALANCING.  
      
        Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
        irq_setup_affinity() which returns early because IRQF_PERCPU and
        IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
      
      This was broken by the recent commit which blocked interrupt affinity
      setting in hardware before activation of the interrupt. While this works in
      general, it does not work for this particular case. Contrary to the
      initial analysis, not all interrupt chip drivers implement an activate
      callback, so the safe cure is to make the deferred interrupt affinity setting
      at activation time opt-in.
      
      Implement the necessary core logic and make the two irqchip implementations
      for which this is required opt-in. In hindsight this would have been the
      right thing to do, but ...
      
      Fixes: baedb87d ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
      Reported-by: John Keeping <john@metanate.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
      f0c7baca
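      The opt-in pattern can be illustrated with a small userspace model. This is a hedged
      sketch of the idea only; the struct and flag names below (toy_irq_desc,
      AFFINITY_ON_ACTIVATE, ...) are invented for illustration and are not the genirq API:

        #include <stdbool.h>
        #include <stdio.h>

        #define AFFINITY_ON_ACTIVATE 0x1   /* opt-in: defer affinity to activation */

        struct toy_irq_desc {
            unsigned int flags;
            int requested_cpu;      /* affinity requested while inactive */
            int effective_cpu;      /* what the (toy) hardware was told */
            bool activated;
        };

        /* Called while the interrupt may still be deactivated. */
        static void toy_set_affinity(struct toy_irq_desc *d, int cpu)
        {
            d->requested_cpu = cpu;
            if (!d->activated && (d->flags & AFFINITY_ON_ACTIVATE))
                return;             /* defer the hardware write */
            d->effective_cpu = cpu; /* default: write through immediately */
        }

        static void toy_activate(struct toy_irq_desc *d)
        {
            d->activated = true;
            if (d->flags & AFFINITY_ON_ACTIVATE)
                d->effective_cpu = d->requested_cpu;  /* apply deferred value */
        }

        int main(void)
        {
            struct toy_irq_desc dflt = { 0 };
            struct toy_irq_desc optin = { .flags = AFFINITY_ON_ACTIVATE };

            toy_set_affinity(&dflt, 2);
            toy_set_affinity(&optin, 2);
            toy_activate(&dflt);
            toy_activate(&optin);
            printf("default cpu=%d, opt-in cpu=%d\n",
                   dflt.effective_cpu, optin.effective_cpu);
            return 0;
        }

      Both descriptors end up on the requested CPU; the opt-in one simply performs the
      hardware write at activation time instead of while deactivated.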
    • locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs · ed004953
      Committed by peterz@infradead.org
      Prior to commit:
      
        859d069e ("lockdep: Prepare for NMI IRQ state tracking")
      
      IRQ state tracking was disabled in NMIs due to nmi_enter()
      doing lockdep_off() -- with the obvious requirement that NMI entry
      call nmi_enter() before trace_hardirqs_off().
      
      [ AFAICT, PowerPC and SH violate this order on their NMI entry ]
      
      However, that commit explicitly changed lockdep_hardirqs_*() to ignore
      lockdep_off(), which breaks every architecture that has irq-tracing in
      its NMI entry and has not yet been fixed up (x86 being the only fixed one
      at this point).
      
      The reason for this change is that by ignoring lockdep_off() we can:
      
        - get rid of 'current->lockdep_recursion' in lockdep_assert_irqs*(),
          which was going to give header-recursion issues with the
          seqlock rework.
      
        - allow these lockdep_assert_*() macros to function in NMI context.
      
      Restore the previous state of things and allow an architecture to
      opt in to the NMI IRQ tracking support; however, instead of relying on
      lockdep_off(), rely on in_nmi(). Both are part of nmi_enter(), so the
      overall entry ordering doesn't need to change.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200727124852.GK119549@hirez.programming.kicks-ass.net
      ed004953
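      A minimal userspace sketch of the resulting rule, with invented names throughout
      (toy_*, arch_nmi_irq_tracking): hardirq-state tracking calls are ignored in NMI
      context unless the architecture opted in, and the NMI check relies on an
      in_nmi()-style nesting counter that nmi_enter() maintains anyway:

        #include <stdbool.h>
        #include <stdio.h>

        static bool arch_nmi_irq_tracking;           /* opt-in switch (assumed name) */
        static int  nmi_nesting;                     /* stand-in for the in_nmi() state */
        static bool tracked_hardirqs_enabled = true; /* what "lockdep" believes */

        static bool toy_in_nmi(void) { return nmi_nesting > 0; }

        static void toy_trace_hardirqs_off(void)
        {
            if (toy_in_nmi() && !arch_nmi_irq_tracking)
                return;                      /* ignore tracking from NMI context */
            tracked_hardirqs_enabled = false;
        }

        static void toy_nmi_enter(void) { nmi_nesting++; }
        static void toy_nmi_exit(void)  { nmi_nesting--; }

        int main(void)
        {
            toy_nmi_enter();
            toy_trace_hardirqs_off();        /* no effect: arch did not opt in */
            toy_nmi_exit();
            printf("tracked state still enabled: %s\n",
                   tracked_hardirqs_enabled ? "yes" : "no");
            return 0;
        }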
    • KVM: nVMX: check for invalid hdr.vmx.flags · 5e105c88
      Committed by Paolo Bonzini
      hdr.vmx.flags is meant for future extensions to the ABI; rejecting
      invalid flags is necessary to avoid broken half-loads of the
      nVMX state.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5e105c88
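      The check itself is the classic "reject unknown bits" pattern. A self-contained
      sketch with placeholder flag names (the TOY_* constants are not the real KVM ABI
      definitions):

        #include <errno.h>
        #include <stdint.h>
        #include <stdio.h>

        #define TOY_VMX_FLAG_A      (1u << 0)   /* placeholder for a defined flag */
        #define TOY_VMX_FLAG_B      (1u << 1)   /* placeholder for a defined flag */
        #define TOY_VMX_VALID_FLAGS (TOY_VMX_FLAG_A | TOY_VMX_FLAG_B)

        /* Reject any bit that is not (yet) defined; those bits are reserved
         * for future ABI extensions and must be zero today. */
        static int check_hdr_vmx_flags(uint32_t flags)
        {
            if (flags & ~TOY_VMX_VALID_FLAGS)
                return -EINVAL;
            return 0;
        }

        int main(void)
        {
            printf("flags=0x1 -> %d\n", check_hdr_vmx_flags(0x1));  /* accepted */
            printf("flags=0x8 -> %d\n", check_hdr_vmx_flags(0x8));  /* -EINVAL */
            return 0;
        }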
    • KVM: nVMX: check for required but missing VMCS12 in KVM_SET_NESTED_STATE · 0f02bd0a
      Committed by Paolo Bonzini
      A missing VMCS12 did not cause -EINVAL (it was just read with
      copy_from_user, so it is not a security issue, but it is still
      wrong).  Test for VMCS12 validity and reject the nested state
      if a VMCS12 is required but not present.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0f02bd0a
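      A hedged sketch of the presence check, with invented sizes and struct layout (this
      is not the KVM uAPI): if the header says the vCPU is in guest mode, the buffer must
      be large enough to actually carry a VMCS12, otherwise return -EINVAL instead of
      silently reading garbage:

        #include <errno.h>
        #include <stddef.h>
        #include <stdio.h>

        #define TOY_HDR_SIZE    128     /* invented header size */
        #define TOY_VMCS12_SIZE 4096    /* invented VMCS12 payload size */

        struct toy_nested_state {
            size_t size;        /* total size userspace claims to pass in */
            int    guest_mode;  /* non-zero: a VMCS12 payload is required */
        };

        static int toy_set_nested_state(const struct toy_nested_state *s)
        {
            if (s->guest_mode && s->size < TOY_HDR_SIZE + TOY_VMCS12_SIZE)
                return -EINVAL;          /* required VMCS12 is missing */
            return 0;                    /* ... continue loading the state ... */
        }

        int main(void)
        {
            struct toy_nested_state missing = { .size = TOY_HDR_SIZE, .guest_mode = 1 };
            struct toy_nested_state present = { .size = TOY_HDR_SIZE + TOY_VMCS12_SIZE,
                                                .guest_mode = 1 };

            printf("missing -> %d, present -> %d\n",
                   toy_set_nested_state(&missing), toy_set_nested_state(&present));
            return 0;
        }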
    • s390/vmemmap: coding style updates · 9a996c67
      Committed by Heiko Carstens
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      9a996c67
    • s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections · 2c114df0
      Committed by David Hildenbrand
      Let's avoid memset(PAGE_UNUSED) when adding consecutive sections, where
      the vmemmap of a single section does not span full PMDs.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-10-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      2c114df0
    • s390/vmemmap: remember unused sub-pmd ranges · cd5781d6
      Committed by David Hildenbrand
      With a memmap size of 56 bytes or 72 bytes per page, the memmap for a
      256 MB section won't span full PMDs. As we populate single sections and
      depopulate single sections, the depopulation step would not be able to
      free all vmemmap pmds anymore.
      
      Do it similarly to x86, marking the unused memmap ranges in a special way
      (padding them with 0xFD).
      
      This allows us to add/remove sections, cleaning up all allocated
      vmemmap pages even if the memmap size is not a multiple of 16 bytes per page.
      
      A 56 byte memmap can, for example, be created with !CONFIG_MEMCG and
      !CONFIG_SLUB.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-9-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      cd5781d6
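      The 0xFD trick can be shown with a scaled-down model. This is only a sketch under
      invented names and sizes (TOY_PMD_SIZE, the toy pmd_block array): unused parts of a
      PMD-sized memmap block are filled with a marker byte, and the block is only freed
      once the whole block reads back as marker bytes:

        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        #define TOY_PMD_SIZE 64         /* scaled-down "PMD-sized" memmap block */
        #define PAGE_UNUSED  0xFD       /* marker byte from the commit message */

        static unsigned char pmd_block[TOY_PMD_SIZE];

        static void mark_unused(size_t start, size_t end)
        {
            memset(pmd_block + start, PAGE_UNUSED, end - start);
        }

        static bool pmd_fully_unused(void)
        {
            for (size_t i = 0; i < TOY_PMD_SIZE; i++)
                if (pmd_block[i] != PAGE_UNUSED)
                    return false;
            return true;
        }

        int main(void)
        {
            memset(pmd_block, 0, sizeof(pmd_block));          /* "live" memmap data */
            mark_unused(0, 40);                               /* first section removed */
            printf("can free PMD? %d\n", pmd_fully_unused()); /* 0: tail still in use */
            mark_unused(40, TOY_PMD_SIZE);                    /* second section removed */
            printf("can free PMD? %d\n", pmd_fully_unused()); /* 1: whole block unused */
            return 0;
        }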
    • s390/vmemmap: fallback to PTEs if mapping large PMD fails · f2057b42
      Committed by David Hildenbrand
      Let's fall back to single pages if short on huge pages. No need to stop
      memory hotplug.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-8-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      f2057b42
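      The fallback itself is a plain try-large-then-retry-small pattern. A hedged
      userspace sketch (allocation sizes and names are illustrative, not the s390
      implementation):

        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define TOY_PAGE_SIZE 4096
        #define TOY_PMD_PAGES 512       /* base pages per "PMD-sized" range */

        /* Simulate the huge-page allocator running dry. */
        static void *alloc_huge(bool available)
        {
            return available ? malloc((size_t)TOY_PMD_PAGES * TOY_PAGE_SIZE) : NULL;
        }

        static int populate_pmd_range(bool huge_available)
        {
            void *huge = alloc_huge(huge_available);

            if (huge) {
                printf("backed range with one large mapping\n");
                free(huge);
                return 0;
            }
            /* Fall back to base pages instead of failing memory hotplug. */
            for (int i = 0; i < TOY_PMD_PAGES; i++) {
                void *page = malloc(TOY_PAGE_SIZE);
                if (!page)
                    return -1;
                free(page);
            }
            printf("backed range with %d base pages\n", TOY_PMD_PAGES);
            return 0;
        }

        int main(void)
        {
            populate_pmd_range(true);
            populate_pmd_range(false);   /* short on huge pages: degrade, don't fail */
            return 0;
        }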
    • s390/vmem: cleanup empty page tables · b9ff8100
      Committed by David Hildenbrand
      Let's clean up empty page tables. Consider only page tables that fully
      fall into the identity mapping and the vmemmap range.
      
      As there are no valid accesses to vmem/vmemmap within non-populated ranges,
      the single tlb flush at the end should be sufficient.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-7-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      b9ff8100
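      The cleanup rule reduces to "free a table only when every entry is empty". A toy
      sketch with invented types (the real code additionally checks that the table lies
      fully inside the identity mapping or vmemmap range):

        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define TOY_PTRS_PER_TABLE 8

        struct toy_table {
            void *entries[TOY_PTRS_PER_TABLE];
        };

        static bool table_empty(const struct toy_table *t)
        {
            for (int i = 0; i < TOY_PTRS_PER_TABLE; i++)
                if (t->entries[i])
                    return false;
            return true;
        }

        static void try_free_table(struct toy_table **slot)
        {
            if (*slot && table_empty(*slot)) {
                free(*slot);
                *slot = NULL;   /* one TLB flush at the end covers all removals */
            }
        }

        int main(void)
        {
            struct toy_table *pt = calloc(1, sizeof(*pt));

            if (!pt)
                return 1;
            pt->entries[3] = (void *)0x1000;   /* still holds a mapping */
            try_free_table(&pt);
            printf("pass 1: %s\n", pt ? "kept" : "freed");

            pt->entries[3] = NULL;             /* last mapping removed */
            try_free_table(&pt);
            printf("pass 2: %s\n", pt ? "kept" : "freed");
            return 0;
        }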
    • s390/vmemmap: take the vmem_mutex when populating/freeing · aa18e0e6
      Committed by David Hildenbrand
      Let's synchronize all accesses to the 1:1 and vmemmap mappings. This will
      be especially relevant when wanting to clean up empty page tables that could
      be shared by both. Avoid races when removing tables that might be just
      about to get reused.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-6-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      aa18e0e6
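      The locking discipline is simply "every populate/free of the shared tables runs
      under one mutex". A minimal pthread sketch of that rule (all names are placeholders,
      not the s390 code):

        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t vmem_lock = PTHREAD_MUTEX_INITIALIZER;
        static int tables_in_use;   /* stand-in for shared 1:1/vmemmap page tables */

        static void toy_populate(void)
        {
            pthread_mutex_lock(&vmem_lock);
            tables_in_use++;         /* allocate or reuse shared tables */
            pthread_mutex_unlock(&vmem_lock);
        }

        static void toy_free(void)
        {
            pthread_mutex_lock(&vmem_lock);
            if (tables_in_use > 0)
                tables_in_use--;     /* tear down only under the lock, so a concurrent
                                      * populate cannot grab a table that is just
                                      * about to be freed */
            pthread_mutex_unlock(&vmem_lock);
        }

        int main(void)
        {
            toy_populate();
            toy_free();
            printf("tables in use: %d\n", tables_in_use);
            return 0;
        }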
    • s390/vmemmap: cleanup when vmemmap_populate() fails · c00f05a9
      Committed by David Hildenbrand
      Clean up what we partially added in case vmemmap_populate() fails. For
      vmem, this is already handled by vmem_add_mapping().
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-5-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      c00f05a9
    • s390/vmemmap: extend modify_pagetable() to handle vmemmap · 9ec8fa8d
      Committed by David Hildenbrand
      Extend our shiny new modify_pagetable() to handle !direct (vmemmap)
      mappings. Convert vmemmap_populate() and implement vmemmap_free().
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-4-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      9ec8fa8d
    • s390/vmem: consolidate vmem_add_range() and vmem_remove_range() · 3e0d3e40
      Committed by David Hildenbrand
      We want to have only a single pagetable walker and reuse the same
      functionality for vmemmap handling. Let's start by consolidating
      vmem_add_range() and vmem_remove_range(), converting them into a
      recursive implementation.
      
      A recursive implementation makes it easier to expand individual cases
      without harming readability. In addition, we minimize traversing the
      whole hierarchy over and over again.
      
      One change is that we don't unmap large PMDs/PUDs when they are not
      completely covered by the request; that should never happen with direct
      mappings, unless one were removing at a different granularity than was
      added, which would be broken already.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-3-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      3e0d3e40
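      The consolidation boils down to one recursive walker whose behaviour is selected by
      an add/remove flag. A very small sketch of that shape, with toy levels and entry
      counts (this is not the actual s390 walker):

        #include <stdbool.h>
        #include <stdio.h>

        #define TOY_LEVELS  3    /* e.g. region -> segment -> page, collapsed for the toy */
        #define TOY_ENTRIES 4

        static int present[TOY_LEVELS][TOY_ENTRIES];   /* 1 = entry populated */

        /* One walker for both directions: 'add' selects populate vs. clear. */
        static void modify_level(int level, bool add)
        {
            if (level == TOY_LEVELS)
                return;                          /* walked past the leaf level */
            for (int i = 0; i < TOY_ENTRIES; i++) {
                present[level][i] = add ? 1 : 0;
                modify_level(level + 1, add);    /* descend into the next level */
            }
        }

        int main(void)
        {
            modify_level(0, true);               /* vmem_add_range()-style call */
            printf("after add:    top entry = %d\n", present[0][0]);
            modify_level(0, false);              /* vmem_remove_range()-style call */
            printf("after remove: top entry = %d\n", present[0][0]);
            return 0;
        }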
    • s390/vmem: rename vmem_add_mem() to vmem_add_range() · 8398b226
      Committed by David Hildenbrand
      Let's match the name to vmem_remove_range().
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20200722094558.9828-2-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      8398b226
    • s390: enable HAVE_FUNCTION_ERROR_INJECTION · 73d6eb48
      Committed by Ilya Leoshkevich
      This kernel feature is required for enabling BPF_KPROBE_OVERRIDE.
      
      Define override_function_with_return() and regs_set_return_value()
      functions, and fix compile errors in syscall_wrapper.h.
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      73d6eb48
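      What the two helpers provide can be mimicked in userspace. A hedged sketch with a
      stand-in register structure (the toy_* names and layout are assumptions for
      illustration, not the real s390 definitions): one helper rewrites the saved return
      value, the other redirects execution straight back to the caller, which is what
      error injection needs:

        #include <stdio.h>

        struct toy_regs {
            unsigned long gprs[16];     /* gpr 2 carries the return value here */
            unsigned long psw_addr;     /* next instruction address */
        };

        static void toy_regs_set_return_value(struct toy_regs *regs, unsigned long rc)
        {
            regs->gprs[2] = rc;
        }

        static void toy_override_function_with_return(struct toy_regs *regs,
                                                      unsigned long return_address)
        {
            regs->psw_addr = return_address;   /* skip the body, resume in caller */
        }

        int main(void)
        {
            struct toy_regs regs = { .psw_addr = 0x1000 };

            /* Error injection: make the probed function "return" -12 (-ENOMEM). */
            toy_regs_set_return_value(&regs, (unsigned long)-12L);
            toy_override_function_with_return(&regs, 0x2000);

            printf("injected rc=%ld, resume at 0x%lx\n",
                   (long)regs.gprs[2], regs.psw_addr);
            return 0;
        }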
    • s390/pci: clarify comment in s390_mmio_read/write · 4631f3ca
      Committed by Niklas Schnelle
      The existing comment was talking about reading in the write part
      and vice versa. While we are here, make it clearer why restricting
      the syscalls to MIO-capable devices is okay.
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      4631f3ca
    • powerpc/64s/hash: Fix hash_preload running with interrupts enabled · 909adfc6
      Committed by Nicholas Piggin
      Commit 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the
      caller") removed the local_irq_disable from hash_preload, but it was
      required for more than just the page table walk: the hash pte busy bit is
      effectively a lock which may be taken in interrupt context, and the local
      update flag test must not be preempted before it's used.
      
      This solves apparent lockups with perf interrupting __hash_page_64K. If
      get_perf_callchain then also takes a hash fault on the same page while it
      is already locked, it will loop forever taking hash faults, which looks like
      this:
      
        cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70]
            pc: c000000000072dc8: hash_page_mm+0x8/0x800
            lr: c00000000000c5a4: do_hash_page+0x24/0x38
            sp: c0002ac1cc69ac70
           msr: 8000000000081033
          current = 0xc0002ac1cc602e00
          paca    = 0xc00000001de1f280   irqmask: 0x03   irq_happened: 0x01
            pid   = 20118, comm = pread2_processe
        Linux version 5.8.0-rc6-00345-g1fad14f18bc6
        49e:mon> t
        [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable)
        --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac
        [link register   ] c000000000335d10 copy_from_user_nofault+0xf0/0x150
        [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable)
        [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0
        [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410
        [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40
        [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360
        [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0
        [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790
        [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0
        [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0
        [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750
        [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510
        [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60
        [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0
        --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160
        [link register   ] c0000000000846f0 __hash_page_64K+0x210/0x540
        [c0002ac1cc69ba50] 0000000000000000 (unreliable)
        [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0
        [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0
        [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60
        [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60
        [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90
        [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c
        --- Exception: 300 (Data Access) at 00007fff8c861fe8
        SP (7ffff6b19660) is in userspace
      
      Fixes: 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the caller")
      Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reported-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
      909adfc6
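      The shape of the fix is the usual "take the pseudo-lock only with interrupts
      disabled" pattern. A toy sketch of that shape only (the names and the busy-bit model
      are placeholders, not the powerpc code):

        #include <stdio.h>

        static int irqs_off;     /* >0: interrupts (conceptually) disabled */
        static int hpte_busy;    /* the busy bit doubling as a lock */

        static void toy_local_irq_save(void)    { irqs_off++; }
        static void toy_local_irq_restore(void) { irqs_off--; }

        static void toy_hash_preload(void)
        {
            toy_local_irq_save();    /* restored by the fix */
            hpte_busy = 1;           /* take the busy bit */
            /*
             * Walk the page table, test the local-update flag and insert the HPTE.
             * With interrupts off, a perf interrupt cannot land here, fault on the
             * same page and spin forever on the busy bit we are holding.
             */
            hpte_busy = 0;
            toy_local_irq_restore();
        }

        int main(void)
        {
            toy_hash_preload();
            printf("irqs_off=%d hpte_busy=%d\n", irqs_off, hpte_busy);
            return 0;
        }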
    • x86/ioperm: Initialize pointer bitmap with NULL rather than 0 · 90fc7392
      Committed by Colin Ian King
      The pointer bitmap is being initialized with a plain integer 0; fix
      this by initializing it with NULL instead.
      
      Cleans up sparse warning:
      arch/x86/xen/enlighten_pv.c:876:27: warning: Using plain integer
      as NULL pointer
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Link: https://lore.kernel.org/r/20200721100217.407975-1-colin.king@canonical.com
      90fc7392
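      For completeness, the whole fix in miniature (the variable names are illustrative,
      not the actual enlighten_pv.c code):

        #include <stddef.h>
        #include <stdio.h>

        int main(void)
        {
            const char *bitmap_before = 0;     /* legal C, but sparse warns:
                                                  "Using plain integer as NULL pointer" */
            const char *bitmap_after  = NULL;  /* states the intent explicitly */

            printf("%p %p\n", (const void *)bitmap_before, (const void *)bitmap_after);
            return 0;
        }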