提交 · 222ee9087a730b1df08d09baed0d03626e67600f · openeuler / Kernel

30 11月, 2019 28 次提交

s390/unwind: start unwinding from reliable state · 222ee908

由 Vasily Gorbik 提交于 5年前

A comment in arch/s390/include/asm/unwind.h says:
> If 'first_frame' is not zero unwind_start skips unwind frames until it
> reaches the specified stack pointer.
> The end of the unwinding is indicated with unwind_done, this can be true
> right after unwind_start, e.g. with first_frame!=0 that can not be found.
> unwind_next_frame skips to the next frame.
> Once the unwind is completed unwind_error() can be used to check if there
> has been a situation where the unwinder could not correctly understand
> the tasks call chain.

With this change backchain unwinder now comply with behaviour
described. As well as matches orc unwinder implementation.  Now unwinder
starts from reliable state, i.e. __unwind_start own stack frame is
taken or stack frame generated by __switch_to (ksp) - both known to be
valid. In case of pt_regs %r15 is better match for pt_regs psw, than
sometimes random "sp" caller passed.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

222ee908

s390/test_unwind: add program check context tests · de6921cc

由 Vasily Gorbik 提交于 5年前

Add unwinding from program check handler tests. Unwinder should be able
to unwind through pt_regs stored by program check handler on task stack.
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

de6921cc

s390/test_unwind: add irq context tests · e7409367

由 Vasily Gorbik 提交于 5年前

Add unwinding from irq context tests. Unwinder should be able to unwind
through irq stack to task stack up to task pt_regs.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

e7409367

s390/test_unwind: print verbose unwinding results · 06101546

由 Vasily Gorbik 提交于 5年前

Add stack name, sp and reliable information into test unwinding
results. Also consider ip outside of kernel text as failure if the
state is reported reliable.
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

06101546

s390/test_unwind: add CALL_ON_STACK tests · 7868249f

由 Vasily Gorbik 提交于 5年前

Add CALL_ON_STACK helper testing. Tests make sure that we can unwind from
switched stack to original one up to task pt_regs (nodat -> task stack).

UWM_SWITCH_STACK could not be used together with UWM_THREAD because
get_stack_info explicitly restricts unwinding to task stack if
task != current.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

7868249f

s390: fix register clobbering in CALL_ON_STACK · 4ac24c09

由 Vasily Gorbik 提交于 5年前

CALL_ON_STACK defines and initializes register variables. Inline
assembly which follows might trigger compiler to generate memory access
for "stack" argument (e.g. in case of S390_lowcore.nodat_stack). This
memory access produces a function call under kasan with outline
instrumentation which clobbers registers.

Switch "stack" argument in CALL_ON_STACK helper to use memory reference
constraint and perform load instead.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

4ac24c09

s390/test_unwind: require that unwinding ended successfully · f44fa79b

由 Vasily Gorbik 提交于 5年前

Currently unwinder test passes if unwinding results contain unwindme_func2
and unwindme_func1 functions.
Now that unwinder reports success upon reaching task pt_regs, check
that unwinding ended successfully in every test.
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

f44fa79b

s390/unwind: add a test for the internal API · badbf397

由 Ilya Leoshkevich 提交于 5年前

unwind_for_each_frame can take at least 8 different sets of parameters.
Add a test to make sure they all are handled in a sane way.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com>
Co-developed-by: NVasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

badbf397

s390/unwind: always inline get_stack_pointer · adcfb8cd

由 Vasily Gorbik 提交于 5年前

Always inline get_stack_pointer() to avoid potential problems
due to compiler inlining decisions, i.e. getting stack pointer of
get_stack_pointer() itself which is later reused.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

adcfb8cd

s390/pci: add error message on device number limit · d497b7ec

由 Niklas Schnelle 提交于 5年前

The config option CONFIG_PCI_NR_FUNCTIONS sets a limit on the number of
PCI functions we can support. Previously on reaching this limit there
was no indication why newly attached devices are not recognized by Linux
which could be quite confusing. Thus this patch adds a pr_err() for this
case.
Reviewed-by: NPeter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

d497b7ec

s390/pci: add error message for UID collision · 794b8846

由 Niklas Schnelle 提交于 5年前

When UID checking was turned off during runtime in the underlying
hypervisor, a PCI device may be attached with the same UID. This is
already detected but happens silently. Add an error message so it can
more easily be understood why a device was not added.
Reviewed-by: NPeter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

794b8846

s390/cpum_sf: Check for SDBT and SDB consistency · 247f265f

由 Thomas Richter 提交于 5年前

Each SBDT is located at a 4KB page and contains 512 entries.
Each entry of a SDBT points to a SDB, a 4KB page containing
sampled data. The last entry is a link to another SDBT page.

When an event is created the function sequence executed is:

  __hw_perf_event_init()
  +--> allocate_buffers()
       +--> realloc_sampling_buffers()
	    +---> alloc_sample_data_block()

Both functions realloc_sampling_buffers() and
alloc_sample_data_block() allocate pages and the allocation
can fail. This is handled correctly and all allocated
pages are freed and error -ENOMEM is returned to the
top calling function. Finally the event is not created.

Once the event has been created, the amount of initially
allocated SDBT and SDB can be too low. This is detected
during measurement interrupt handling, where the amount
of lost samples is calculated. If the number of lost samples
is too high considering sampling frequency and already allocated
SBDs, the number of SDBs is enlarged during the next execution
of cpumsf_pmu_enable().

If more SBDs need to be allocated, functions

       realloc_sampling_buffers()
       +---> alloc-sample_data_block()

are called to allocate more pages. Page allocation may fail
and the returned error is ignored. A SDBT and SDB setup
already exists.

However the modified SDBTs and SDBs might end up in a situation
where the first entry of an SDBT does not point to an SDB,
but another SDBT, basicly an SBDT without payload.
This can not be handled by the interrupt handler, where an SDBT
must have at least one entry pointing to an SBD.

Add a check to avoid SDBTs with out payload (SDBs) when enlarging
the buffer setup.
Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

247f265f

s390/cpum_sf: Use TEAR_REG macro consistantly · 7dd6b199

由 Thomas Richter 提交于 5年前

The macro TEAR_REG() saves the last used SDBT address
in the perf_hw_event structure. This is also done
by function hw_reset_registers() which is a one-liner
and simply uses macro TEAR_REG(). Remove function
hw_reset_registers(), which is only used one time and use
macro TEAR_REG() instead. This macro is used throughout
the code anyway.
Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

7dd6b199

s390/cpum_sf: Remove unnecessary check for pending SDBs · c17a7c6e

由 Thomas Richter 提交于 5年前

In interrupt handling the function extend_sampling_buffer()
is called after checking for a possibly extension.
This check is not necessary as the called function itself
performs this check again.
Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

c17a7c6e

s390/cpum_sf: Replace function name in debug statements · 532da3de

由 Thomas Richter 提交于 5年前

Replace hard coded function names in debug statements
by the "%s ...", __func__ construct suggested by checkpatch.pl
script.  Use consistent debug print format of the form variable
blank value. Also add leading 0x for all hex values.
Print allocated page addresses consistantly as hex numbers
with leading 0x.
Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

532da3de

s390/kaslr: store KASLR offset for early dumps · a9f2f686

由 Gerald Schaefer 提交于 5年前

The KASLR offset is added to vmcoreinfo in arch_crash_save_vmcoreinfo(),
so that it can be found by crash when processing kernel dumps.

However, arch_crash_save_vmcoreinfo() is called during a subsys_initcall,
so if the kernel crashes before that, we have no vmcoreinfo and no KASLR
offset.

Fix this by storing the KASLR offset in the lowcore, where the vmcore_info
pointer will be stored, and where it can be found by crash. In order to
make it distinguishable from a real vmcore_info pointer, mark it as uneven
(KASLR offset itself is aligned to THREAD_SIZE).

When arch_crash_save_vmcoreinfo() stores the real vmcore_info pointer in
the lowcore, it overwrites the KASLR offset. At that point, the KASLR
offset is not yet added to vmcoreinfo, so we also need to move the
mem_assign_absolute() behind the vmcoreinfo_append_str().

Fixes: b2d24b97 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
Cc: <stable@vger.kernel.org> # v5.2+
Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

a9f2f686

s390/unwind: stop gracefully at task pt_regs · e76e6961

由 Vasily Gorbik 提交于 5年前

Consider reaching task pt_regs graceful unwinder termination. Task
pt_regs itself never contains a valid state to which a task might return
within the kernel context (user task pt_regs is a special case). Since
we already avoid printing user task pt_regs and in most cases we don't
even bother filling task pt_regs psw and r15 with something reasonable
simply skip task pt_regs altogether. With this change unwind_error() now
accurately represent whether unwinder reached task pt_regs successfully
or failed along the way.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

e76e6961

s390/head64: correct init_task stack setup · cb7948e8

由 Vasily Gorbik 提交于 5年前

Add missing allocation of pt_regs at the bottom of the stack. This
makes it consistent with other stack setup cases and also what stack
unwinder expects.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

cb7948e8

s390/unwind: make reuse_sp default when unwinding pt_regs · 97806dfb

由 Vasily Gorbik 提交于 5年前

Currently unwinder yields 2 entries when pt_regs are met:
sp="address of pt_regs itself" ip=pt_regs->psw
sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]"

And neither of those 2 states (combination of sp and ip) ever happened.

reuse_sp has been introduced by commit a1d863ac ("s390/unwind: fix
mixing regs and sp"). reuse_sp=true makes unwinder keen to produce the
following result, when pt_regs are given (as an arg to unwind_start):
sp=pt_regs->gprs[15] ip=pt_regs->psw
sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]"

The first state is an actual state in which a task was when pt_regs were
collected. The second state is marked unreliable and is for debugging
purposes to cover the case when a task has been interrupted in between
stack frame allocation and writing back_chain - in this case r14 might
show an actual caller.

Make unwinder behaviour enabled via reuse_sp=true default and drop the
special case handling.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

97806dfb

s390/unwind: report an error if pt_regs are not on stack · 67f55934

由 Vasily Gorbik 提交于 5年前

If unwinder is looking at pt_regs which is not on stack then something
went wrong and an error has to be reported rather than successful
unwinding termination.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

67f55934

s390: avoid misusing CALL_ON_STACK for task stack setup · 7bcaad1f

由 Vasily Gorbik 提交于 5年前

CALL_ON_STACK is intended to be used for temporary stack switching with
potential return to the caller.

When CALL_ON_STACK is misused to switch from nodat stack to task stack
back_chain information would later lead stack unwinder from task stack into
(per cpu) nodat stack which is reused for other purposes. This would
yield confusing unwinding result or errors.

To avoid that introduce CALL_ON_STACK_NORETURN to be used instead. It
makes sure that back_chain is zeroed and unwinder finishes gracefully
ending up at task pt_regs.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

7bcaad1f

s390: correct CALL_ON_STACK back_chain saving · 75794257

由 Vasily Gorbik 提交于 5年前

Currently CALL_ON_STACK saves r15 as back_chain in the first stack frame of
the stack we about to switch to. But if a function which uses CALL_ON_STACK
calls other function it allocates a stack frame for a callee. In this
case r15 is pointing to a callee stack frame and not a stack frame of
function itself. This results in dummy unwinding entry with random
sp and ip values.

Introduce and utilize current_frame_address macro to get an address of
actual function stack frame.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

75794257

s390/unwind: unify task is current checks · 103b4cca

由 Vasily Gorbik 提交于 5年前

Avoid mixture of task == NULL and task == current meaning the same
thing and simply always initialize task with current in unwind_start.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

103b4cca

s390: disable preemption when switching to nodat stack with CALL_ON_STACK · 7f28dad3

由 Vasily Gorbik 提交于 5年前

Make sure preemption is disabled when temporary switching to nodat
stack with CALL_ON_STACK helper, because nodat stack is per cpu.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

7f28dad3

s390: always inline disabled_wait · c2e06e15

由 Vasily Gorbik 提交于 5年前

disabled_wait uses _THIS_IP_ and assumes that compiler would inline it.
Make sure this assumption is always correct by utilizing __always_inline.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

c2e06e15

s390/vdso: fix getcpu · 5a5525b0

由 Heiko Carstens 提交于 5年前

getcpu reads the required values for cpu and node with two
instructions. This might lead to an inconsistent result if user space
gets preempted and migrated to a different CPU between the two
instructions.

Fix this by using just a single instruction to read both values at
once.

This is currently rather a theoretical bug, since there is no real
NUMA support available (except for NUMA emulation).
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

5a5525b0

s390/smp,vdso: fix ASCE handling · a2308c11

由 Heiko Carstens 提交于 5年前

When a secondary CPU is brought up it must initialize its control
registers. CPU A which triggers that a secondary CPU B is brought up
stores its control register contents into the lowcore of new CPU B,
which then loads these values on startup.

This is problematic in various ways: the control register which
contains the home space ASCE will correctly contain the kernel ASCE;
however control registers for primary and secondary ASCEs are
initialized with whatever values were present in CPU A.

Typically:
- the primary ASCE will contain the user process ASCE of the process
  that triggered onlining of CPU B.
- the secondary ASCE will contain the percpu VDSO ASCE of CPU A.

Due to lazy ASCE handling we may also end up with other combinations.

When then CPU B switches to a different process (!= idle) it will
fixup the primary ASCE. However the problem is that the (wrong) ASCE
from CPU A was loaded into control register 1: as soon as an ASCE is
attached (aka loaded) a CPU is free to generate TLB entries using that
address space.
Even though it is very unlikey that CPU B will actually generate such
entries, this could result in TLB entries of the address space of the
process that ran on CPU A. These entries shouldn't exist at all and
could cause problems later on.

Furthermore the secondary ASCE of CPU B will not be updated correctly.
This means that processes may see wrong results or even crash if they
access VDSO data on CPU B. The correct VDSO ASCE will eventually be
loaded on return to user space as soon as the kernel executed a call
to strnlen_user or an atomic futex operation on CPU B.

Fix both issues by intializing the to be loaded control register
contents with the correct ASCEs and also enforce (re-)loading of the
ASCEs upon first context switch and return to user space.

Fixes: 0aaba41b ("s390: remove all code using the access register mode")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

a2308c11

s390: implement perf_arch_fetch_caller_regs · 914d52e4

由 Ilya Leoshkevich 提交于 5年前

On s390 bpf_get_stack_raw_tp() returns 0 entries for both kernel and
user stacks. While there is no practical unwinding solution for userspace
on s390 at this moment, there certainly is a kernel unwinder. However,
it is not properly integrated with BPF.

In order to start unwinding, bpf_get_stack_raw_tp() obtains the current
kernel register values using perf_fetch_caller_regs(), which is not
implemented for s390. The actual unwinding then happens by passing those
registers to perf_callchain_kernel().

Implement perf_arch_fetch_caller_regs() for s390, where
__builtin_frame_address(0) points to back_chain.
Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com>
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

914d52e4

21 11月, 2019 3 次提交

arm64: uaccess: Remove uaccess_*_not_uao asm macros · e50be648

由 Pavel Tatashin 提交于 5年前

It is safer and simpler to drop the uaccess assembly macros in favour of
inline C functions. Although this bloats the Image size slightly, it
aligns our user copy routines with '{get,put}_user()' and generally
makes the code a lot easier to reason about.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com>
[will: tweaked commit message and changed temporary variable names]
Signed-off-by: NWill Deacon <will@kernel.org>

e50be648

arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault · 94bb804e

由 Pavel Tatashin 提交于 5年前

A number of our uaccess routines ('__arch_clear_user()' and
'__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
encounter an unhandled fault whilst accessing userspace.

For CPUs implementing both hardware PAN and UAO, this bug has no effect
when both extensions are in use by the kernel.

For CPUs implementing hardware PAN but not UAO, this means that a kernel
using hardware PAN may execute portions of code with PAN inadvertently
disabled, opening us up to potential security vulnerabilities that rely
on userspace access from within the kernel which would usually be
prevented by this mechanism. In other words, parts of the kernel run the
same way as they would on a CPU without PAN implemented/emulated at all.

For CPUs not implementing hardware PAN and instead relying on software
emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
much worse. Calling 'schedule()' with software PAN disabled means that
the next task will execute in the kernel using the page-table and ASID
of the previous process even after 'switch_mm()', since the actual
hardware switch is deferred until return to userspace. At this point, or
if there is a intermediate call to 'uaccess_enable()', the page-table
and ASID of the new process are installed. Sadly, due to the changes
introduced by KPTI, this is not an atomic operation and there is a very
small window (two instructions) where the CPU is configured with the
page-table of the old task and the ASID of the new task; a speculative
access in this state is disastrous because it would corrupt the TLB
entries for the new task with mappings from the previous address space.

As Pavel explains:

  | I was able to reproduce memory corruption problem on Broadcom's SoC
  | ARMv8-A like this:
  |
  | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
  | stack is accessed and copied.
  |
  | The test program performed the following on every CPU and forking
  | many processes:
  |
  |	unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
  |				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  |	map[0] = getpid();
  |	sched_yield();
  |	if (map[0] != getpid()) {
  |		fprintf(stderr, "Corruption detected!");
  |	}
  |	munmap(map, PAGE_SIZE);
  |
  | From time to time I was getting map[0] to contain pid for a
  | different process.

Ensure that PAN is re-enabled when returning after an unhandled user
fault from our uaccess routines.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NMark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: 338d4f49 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com>
[will: rewrote commit message]
Signed-off-by: NWill Deacon <will@kernel.org>

94bb804e

s390/cpumf: Adjust registration of s390 PMU device drivers · 6a82e23f

由 Thomas Richter 提交于 5年前

Linux-next commit titled "perf/core: Optimize perf_init_event()"
changed the semantics of PMU device driver registration.
It was done to speed up the lookup/handling of PMU device driver
specific events. It also enforces that only one PMU device
driver will be registered of type PERF_EVENT_RAW.

This change added these line in function perf_pmu_register():

  ...
  +       ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
  +       if (ret < 0)
                goto free_pdc;
  +
  +       WARN_ON(type >= 0 && ret != type);

The warn_on generates a message. We have 3 PMU device drivers,
each registered as type PERF_TYPE_RAW.
The cf_diag device driver (arch/s390/kernel/perf_cpumf_cf_diag.c)
always hits the WARN_ON because it is the second PMU device driver
(after sampling device driver arch/s390/kernel/perf_cpumf_sf.c)
which is registered as type 4 (PERF_TYPE_RAW).
So when the sampling device driver is registered, ret has value 4.
When cf_diag device driver is registered with type 4,
ret has value of 5 and WARN_ON fires.

Adjust the PMU device drivers for s390 to support the new
semantics required by perf_pmu_register().
Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

6a82e23f

20 11月, 2019 6 次提交

s390/smp: fix physical to logical CPU map for SMT · 72a81ad9

由 Heiko Carstens 提交于 5年前

If an SMT capable system is not IPL'ed from the first CPU the setup of
the physical to logical CPU mapping is broken: the IPL core gets CPU
number 0, but then the next core gets CPU number 1. Correct would be
that all SMT threads of CPU 0 get the subsequent logical CPU numbers.

This is important since a lot of code (like e.g. the CPU topology
code) assumes that CPU maps are setup like this. If the mapping is
broken the system will not IPL due to broken topology masks:

[    1.716341] BUG: arch topology broken
[    1.716342]      the SMT domain not a subset of the MC domain
[    1.716343] BUG: arch topology broken
[    1.716344]      the MC domain not a subset of the BOOK domain

This scenario can usually not happen since LPARs are always IPL'ed
from CPU 0 and also re-IPL is intiated from CPU 0. However older
kernels did initiate re-IPL on an arbitrary CPU. If therefore a re-IPL
from an old kernel into a new kernel is initiated this may lead to
crash.

Fix this by setting up the physical to logical CPU mapping correctly.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

72a81ad9

s390/early: move access registers setup in C code · c2313594

由 Vasily Gorbik 提交于 5年前

Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

c2313594

s390/head64: remove unnecessary vdso_per_cpu_data setup · b8ce1fa4

由 Vasily Gorbik 提交于 5年前

vdso_per_cpu_data lowcore value is only needed for fully functional
exception handlers, which are activated in setup_lowcore_dat_off. The
same function does init vdso_per_cpu_data via vdso_alloc_boot_cpu.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

b8ce1fa4

s390/early: move control registers setup in C code · c02ee6a1

由 Vasily Gorbik 提交于 5年前

Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

c02ee6a1

s390/kasan: support memcpy_real with TRACE_IRQFLAGS · 13f9bae5

由 Vasily Gorbik 提交于 5年前

Currently if the kernel is built with CONFIG_TRACE_IRQFLAGS and KASAN
and used as crash kernel it crashes itself due to
trace_hardirqs_off/trace_hardirqs_on being called with DAT off. This
happens because trace_hardirqs_off/trace_hardirqs_on are instrumented and
kasan code tries to perform access to shadow memory to validate memory
accesses. Kasan shadow memory is populated with vmemmap, so all accesses
require DAT on.

memcpy_real could be called with DAT on or off (with kasan enabled DAT
is set even before early code is executed).

Make sure that trace_hardirqs_off/trace_hardirqs_on are called with DAT
on and only actual __memcpy_real is called with DAT off.

Also annotate __memcpy_real and _memcpy_real with __no_sanitize_address
to avoid further problems due to switching DAT off.
Reviewed-by: NPhilipp Rudo <prudo@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

13f9bae5

s390/crypto: Fix unsigned variable compared with zero · 0398d4ab

由 YueHaibing 提交于 5年前

s390_crypto_shash_parmsize() return type is int, it
should not be stored in a unsigned variable, which
compared with zero.
Reported-by: NHulk Robot <hulkci@huawei.com>
Fixes: 3c2eb6b7 ("s390/crypto: Support for SHA3 via CPACF (MSA6)")
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NJoerg Schmidbauer <jschmidb@linux.vnet.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

0398d4ab

14 11月, 2019 3 次提交

KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast() · ed69a6cb

由 Sean Christopherson 提交于 5年前

Acquire the per-VM slots_lock when zapping all shadow pages as part of
toggling nx_huge_pages.  The fast zap algorithm relies on exclusivity
(via slots_lock) to identify obsolete vs. valid shadow pages, because it
uses a single bit for its generation number. Holding slots_lock also
obviates the need to acquire a read lock on the VM's srcu.

Failing to take slots_lock when toggling nx_huge_pages allows multiple
instances of kvm_mmu_zap_all_fast() to run concurrently, as the other
user, KVM_SET_USER_MEMORY_REGION, does not take the global kvm_lock.
(kvm_mmu_zap_all_fast() does take kvm->mmu_lock, but it can be
temporarily dropped by kvm_zap_obsolete_pages(), so it is not enough
to enforce exclusivity).

Concurrent fast zap instances causes obsolete shadow pages to be
incorrectly identified as valid due to the single bit generation number
wrapping, which results in stale shadow pages being left in KVM's MMU
and leads to all sorts of undesirable behavior.
The bug is easily confirmed by running with CONFIG_PROVE_LOCKING and
toggling nx_huge_pages via its module param.

Note, until commit 4ae5acbc4936 ("KVM: x86/mmu: Take slots_lock when
using kvm_mmu_zap_all_fast()", 2019-11-13) the fast zap algorithm used
an ulong-sized generation instead of relying on exclusivity for
correctness, but all callers except the recently added set_nx_huge_pages()
needed to hold slots_lock anyways.  Therefore, this patch does not have
to be backported to stable kernels.

Given that toggling nx_huge_pages is by no means a fast path, force it
to conform to the current approach instead of reintroducing the previous
generation count.

Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation", but NOT FOR STABLE)
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed69a6cb

sparc: vdso: fix build error of vdso32 · 53472914

由 Masahiro Yamada 提交于 5年前

Since commit 54b8ae66 ("kbuild: change *FLAGS_<basetarget>.o to
take the path relative to $(obj)"), sparc allmodconfig fails to build
as follows:

  CC      arch/sparc/vdso/vdso32/vclock_gettime.o
unrecognized e_machine 18 arch/sparc/vdso/vdso32/vclock_gettime.o
arch/sparc/vdso/vdso32/vclock_gettime.o: failed

The cause of the breakage is that -pg flag not being dropped.

The vdso32 files are located in the vdso32/ subdirectory, but I missed
to update the Makefile.

I removed the meaningless CFLAGS_REMOVE_vdso-note.o since it is only
effective for C file.

vdso-note.o is compiled from assembly file:

  arch/sparc/vdso/vdso-note.S
  arch/sparc/vdso/vdso32/vdso-note.S

Fixes: 54b8ae66 ("kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj)")
Reported-by: NAnatoly Pugachev <matorola@gmail.com>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: NAnatoly Pugachev <matorola@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>

53472914

arm64: Kconfig: add a choice for endianness · d8e85e14

由 Anders Roxell 提交于 5年前

When building allmodconfig KCONFIG_ALLCONFIG=$(pwd)/arch/arm64/configs/defconfig
CONFIG_CPU_BIG_ENDIAN gets enabled. Which tends not to be what most
people want. Another concern that has come up is that ACPI isn't built
for an allmodconfig kernel today since that also depends on !CPU_BIG_ENDIAN.

Rework so that we introduce a 'choice' and default the choice to
CPU_LITTLE_ENDIAN. That means that when we build an allmodconfig kernel
it will default to CPU_LITTLE_ENDIAN that most people tends to want.
Reviewed-by: NJohn Garry <john.garry@huawei.com>
Acked-by: NWill Deacon <will@kernel.org>
Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

d8e85e14

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功