1. 31 January 2020, 1 commit
  2. 12 December 2019, 7 commits
    • s390/kasan: add KASAN_VMALLOC support · 3e39ce26
      Vasily Gorbik authored
      Add KASAN_VMALLOC support, which enables access checks for the vmalloc
      memory area and allows VMAP_STACK to be used together with KASAN.

      KASAN_VMALLOC changes the way shadow memory for the vmalloc and modules
      areas is handled. With this new approach only top-level page tables are
      pre-populated and lower levels are filled dynamically upon memory
      allocation.
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      3e39ce26
    • s390: remove last diag 0x44 caller · 1b68ac86
      Heiko Carstens authored
      diag 0x44 is a voluntary undirected yield of a virtual CPU. This has
      caused a lot of performance issues in the past.
      
      There is only one caller left, and that one is only executed if diag
      0x9c (directed yield) is not present. Given that all hypervisors
      implement diag 0x9c anyway, remove the last diag 0x44 call so that no
      new callers get added.
      
      The worst case that could happen now, if diag 0x9c is not present, is that
      a virtual CPU loops a bit instead of giving up its time slice.
      
      diag 0x44 statistics in debugfs are kept and will always be zero, so
      that user space can tell that there are no calls.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      1b68ac86
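
      For illustration, a minimal sketch of the remaining directed-yield path,
      assuming a diag 0x9c call of the form used elsewhere in s390 code; the
      function name and CPU-address argument are placeholders, not the actual
      kernel function:

      /* Sketch: directed yield of the current virtual CPU to the CPU with the
       * given address via diag 0x9c. Without diag 0x9c the caller now simply
       * keeps spinning instead of issuing the old undirected diag 0x44. */
      static inline void yield_to_cpu_sketch(unsigned short cpu_address)
      {
              asm volatile("diag %0,0,0x9c" : : "d" (cpu_address));
      }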
    • s390/uv: use EOPNOTSUPP instead of ENOTSUPP · 157309a9
      Christian Borntraeger authored
      ENOTSUPP is just an internal kernel error and should never reach
      userspace. The return value of the share function is not exported to
      userspace, but to avoid giving bad examples let us use EOPNOTSUPP.
      Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Janosch Frank <frankja@linux.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      157309a9
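
      A hedged illustration of the pattern (not the actual uv.h code; the
      function and its argument are invented for the example):

      #include <linux/errno.h>
      #include <linux/types.h>

      /* Sketch: return the UAPI error code EOPNOTSUPP rather than the
       * kernel-internal ENOTSUPP, even if the value never reaches user space. */
      static int uv_share_sketch(bool facility_available)
      {
              if (!facility_available)
                      return -EOPNOTSUPP;     /* previously: -ENOTSUPP */
              return 0;
      }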
    • s390/cpum_sf: Avoid SBD overflow condition in irq handler · 0539ad0b
      Thomas Richter authored
      The s390 CPU Measurement sampling facility has an overflow condition
      which fires when all entries in an SDB are used.
      The measurement alert interrupt is triggered and reads out all samples
      in this SDB. It then tests the successor SDB; if that SDB is not full,
      the interrupt handler does not read any samples from it at all.
      The design waits for the hardware to fill this SDB and then trigger
      another measurement alert interrupt.
      
      This scheme works nicely until
      a perf_event_overflow() function call discards the sample due to
      a too high sampling rate.
      The interrupt handler has logic to read out a partially filled SDB
      when the perf event overflow condition in Linux common code is met.
      This causes the CPUM sampling measurement hardware and the PMU
      device driver to operate on the same SDB's trailer entry.
      This should not happen.
      
      This can be seen here using this trace:
         cpumsf_pmu_add: tear:0xb5286000
         hw_perf_event_update: sdbt 0xb5286000 full 1 over 0 flush_all:0
         hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0
           above shows the 1st interrupt
         hw_perf_event_update: sdbt 0xb5286008 full 1 over 0 flush_all:0
         hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0
           above shows the 2nd interrupt
      	... this goes on fine until...
         hw_perf_event_update: sdbt 0xb5286068 full 1 over 0 flush_all:0
         perf_push_sample1: overflow
            one or more samples read from the IRQ handler are rejected by
            perf_event_overflow() and the IRQ handler advances to the next SDB
            and modifies the trailer entry of a partially filled SDB.
         hw_perf_event_update: sdbt 0xb5286070 full 0 over 0 flush_all:1
            timestamp: 14:32:52.519953
      
      Next time the IRQ handler is called for this SDB the trailer entry shows
      an overflow count of 19 missed entries.
         hw_perf_event_update: sdbt 0xb5286070 full 1 over 19 flush_all:1
            timestamp: 14:32:52.970058
      
      Remove the access to the follow-on SDB when an event overflow happened.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      0539ad0b
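
      Conceptually (a plain C sketch, not the driver code; the structure and
      function names are invented for illustration):

      /* Sketch: read out full SDBs, but stop before touching the follow-on SDB
       * once perf rejected samples, so hardware and driver never operate on the
       * same trailer entry. */
      struct sdb_sketch { int full; int nr_samples; };

      static void process_sdbs_sketch(struct sdb_sketch *sdb, int nr,
                                      int (*push_sample)(void)) /* nonzero if sample rejected */
      {
              int i, s, rejected;

              for (i = 0; i < nr && sdb[i].full; i++) {
                      rejected = 0;
                      for (s = 0; s < sdb[i].nr_samples; s++)
                              rejected |= push_sample();  /* perf_event_overflow() analogue */
                      if (rejected)
                              return; /* leave the partially filled successor SDB alone */
              }
      }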
    • s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits · 39d4a501
      Thomas Richter authored
      The functions perf_event_overflow() and perf_event_account_interrupt()
      are called every time samples are processed by the interrupt handler.
      However perf_event_account_interrupt() has checks to avoid being
      flooded with interrupts (more than 1000 samples are received per
      task_tick). Samples are then dropped and a PERF_RECORD_THROTTLE record is
      added to the perf data. The perf subsystem limit calculation is:
      
          maximum sample frequency := 100000 --> 1 sample per 10 us
          task_tick = 10ms = 10000us --> 1000 samples per task_tick
      
      The work flow is
      
      measurement_alert() uses the SDBT head and each SDBT points to 511
       SDB pages, each with 126 sample entries. After processing 8 SDBs
       and for each valid sample calling:
      
           perf_event_overflow()
             perf_event_account_interrupt()
      
      there is a considerable amount of samples being dropped, especially when
      the sample frequency is very high and near the 100000 limit.
      
      To avoid the high amount of samples being dropped near the end of a
      task_tick time frame, increment the sampling interval in case of
      dropped events. The CPU Measurement sampling facility on s390
      supports only intervals, specifying how many CPU cycles have to be
      executed before a sample is generated. Increase the interval when the
      samples being generated hit the task_tick limit.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      39d4a501
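
      A plain C sketch of the rate arithmetic from the commit message; the
      concrete back-off factor below is an assumption for illustration, not
      taken from the driver:

      /* Sketch: with perf_event_max_sample_rate = 100000 Hz, at most one sample
       * per 10 us is accepted, i.e. roughly 1000 samples per 10 ms task tick.
       * When samples get throttled, enlarge the hardware sampling interval
       * (CPU cycles between samples) so fewer samples hit the limit. */
      static unsigned long adjust_interval_sketch(unsigned long interval,
                                                  unsigned long dropped_samples)
      {
              if (dropped_samples)
                      interval += interval / 10;      /* ~10% back-off, illustrative only */
              return interval;
      }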
    • 7e914fd1
    • s390/spinlock: remove confusing comment in arch_spin_lock_wait · b62b6cf1
      Vasily Gorbik authored
      arch_spin_lock_wait does not take steal time into consideration.
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      b62b6cf1
  3. 05 December 2019, 1 commit
    • arch: ipcbuf.h: make uapi asm/ipcbuf.h self-contained · 5b009673
      Masahiro Yamada authored
      Userspace cannot compile <asm/ipcbuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/ipcbuf.h.s
        In file included from usr/include/asm/ipcbuf.h:1:0,
                         from <command-line>:32:
        usr/include/asm-generic/ipcbuf.h:21:2: error: unknown type name `__kernel_key_t'
          __kernel_key_t  key;
          ^~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:22:2: error: unknown type name `__kernel_uid32_t'
          __kernel_uid32_t uid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:23:2: error: unknown type name `__kernel_gid32_t'
          __kernel_gid32_t gid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:24:2: error: unknown type name `__kernel_uid32_t'
          __kernel_uid32_t cuid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:25:2: error: unknown type name `__kernel_gid32_t'
          __kernel_gid32_t cgid;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:26:2: error: unknown type name `__kernel_mode_t'
          __kernel_mode_t  mode;
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:28:35: error: `__kernel_mode_t' undeclared here (not in a function)
          unsigned char  __pad1[4 - sizeof(__kernel_mode_t)];
                                           ^~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:31:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused1;
          ^~~~~~~~~~~~~~~~
        usr/include/asm-generic/ipcbuf.h:32:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused2;
          ^~~~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <linux/posix_types.h> to make it self-contained, and add it to
      the compile-test coverage.
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-1-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b009673
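
      The fix itself is a single include directive; schematically (an
      abbreviated sketch of the header, not a verbatim copy):

      /* include/uapi/asm-generic/ipcbuf.h (abbreviated sketch of the fix) */
      #include <linux/posix_types.h>  /* provides __kernel_key_t, __kernel_uid32_t, ... */

      struct ipc64_perm {
              __kernel_key_t          key;
              __kernel_uid32_t        uid;
              __kernel_gid32_t        gid;
              __kernel_uid32_t        cuid;
              __kernel_gid32_t        cgid;
              __kernel_mode_t         mode;
              /* ... padding and __kernel_ulong_t members omitted ... */
      };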
  4. 01 December 2019, 1 commit
  5. 30 November 2019, 30 commits
    • s390/livepatch: Implement reliable stack tracing for the consistency model · aa137a6d
      Miroslav Benes authored
      The livepatch consistency model requires reliable stack tracing
      architecture support in order to work properly. In order to achieve
      this, two main issues have to be solved. First, reliable and consistent
      call chain backtracing has to be ensured. Second, the unwinder needs to
      be able to detect stack corruptions and return errors.
      
      The "zSeries ELF Application Binary Interface Supplement" says:
      
        "The stack pointer points to the first word of the lowest allocated
        stack frame. If the "back chain" is implemented this word will point to
        the previously allocated stack frame (towards higher addresses), except
        for the first stack frame, which shall have a back chain of zero (NULL).
        The stack shall grow downwards, in other words towards lower addresses."
      
      "back chain" is optional. GCC option -mbackchain enables it. Quoting
      Martin Schwidefsky [1]:
      
        "The compiler is called with the -mbackchain option, all normal C
        function will store the backchain in the function prologue. All
        functions written in assembler code should do the same, if you find one
        that does not we should fix that. The end result is that a task that
        *voluntarily* called schedule() should have a proper backchain at all
        times.
      
        Dependent on the use case this may or may not be enough. Asynchronous
        interrupts may stop the CPU at the beginning of a function, if kernel
        preemption is enabled we can end up with a broken backchain.  The
        production kernels for IBM Z are all compiled *without* kernel
        preemption. So yes, we might get away without the objtool support.
      
        On a side-note, we do have a line item to implement the ORC unwinder for
        the kernel, that includes the objtool support. Once we have that we can
        drop the -mbackchain option for the kernel build. That gives us a nice
        little performance benefit. I hope that the change from backchain to the
        ORC unwinder will not be too hard to implement in the livepatch tools."
      
      Since -mbackchain is enabled by default when the kernel is compiled, the
      call chain backtracing should be currently ensured and objtool should
      not be necessary for livepatch purposes.
      
      Regarding the second issue, stack corruptions and non-reliable states
      have to be recognized by the unwinder. Mainly this means detecting
      preemption or page faults; the end of the task stack must be reached,
      return addresses must be valid text addresses, and hacks like function
      graph tracing and kretprobes must be properly detected.
      
      Unwinding a running task's stack is not a problem, because there is a
      livepatch requirement that every checked task is blocked, except for the
      current task. Due to that, the implementation can be much simpler
      compared to the existing non-reliable infrastructure. We can consider a
      task's kernel/thread stack only and skip the other stacks.
      
      [1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]
      
      Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      aa137a6d
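
      A plain C sketch of a back chain walk as described by the ABI quote above
      (not the s390 unwinder itself; the offset of the saved r14 within a frame
      is an assumption for illustration):

      /* Sketch: the first word of a frame is the back chain to the caller's
       * frame, and zero (NULL) terminates the chain. */
      #define SKETCH_R14_OFFSET 112   /* assumed offset of the saved r14 slot */

      static void walk_backchain_sketch(unsigned long sp,
                                        void (*report)(unsigned long ra))
      {
              while (sp) {
                      unsigned long *frame = (unsigned long *)sp;

                      report(frame[SKETCH_R14_OFFSET / sizeof(unsigned long)]);
                      sp = frame[0];  /* back chain; zero ends the walk */
              }
      }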
    • s390/unwind: add stack pointer alignment sanity checks · be2d11b2
      Miroslav Benes authored
      The ABI requires the SP to be 8-byte aligned; report an unwinding error otherwise.
      
      Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      be2d11b2
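
      The check itself is tiny; as a sketch (helper name invented):

      /* Sketch: an s390 stack pointer must be 8-byte aligned; if it is not,
       * report an unwind error instead of dereferencing a bogus frame. */
      static inline bool sp_aligned_sketch(unsigned long sp)
      {
              return (sp & 0x7UL) == 0;
      }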
    • s390/unwind: filter out unreliable bogus %r14 · bf018ee6
      Vasily Gorbik authored
      Currently the unwinder unconditionally returns %r14 from the first frame
      pointed to by %r15 from pt_regs. A task could be interrupted when a function
      has already allocated this frame (if it needs one) for its callees or to
      store local variables. In that case this frame would contain random
      values from the stack or values stored there by a callee. As we are only
      interested in %r14 to get a potential return address, skip bogus return
      addresses which do not belong to kernel text.

      This helps to avoid duplicating filtering logic in unwinder users, most
      of which use unwind_get_return_address() and would otherwise choke on the
      bogus 0 address returned by it.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      bf018ee6
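
      A minimal sketch of the filtering idea, using the generic kernel text
      check; the helper name is invented:

      #include <linux/kallsyms.h>

      /* Sketch: accept a value from the %r14 save slot only if it points into
       * kernel text; otherwise treat it as leftover data and return 0. */
      static unsigned long filter_r14_sketch(unsigned long r14)
      {
              return __kernel_text_address(r14) ? r14 : 0;
      }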
    • s390/unwind: start unwinding from reliable state · 222ee908
      Vasily Gorbik authored
      A comment in arch/s390/include/asm/unwind.h says:
      > If 'first_frame' is not zero unwind_start skips unwind frames until it
      > reaches the specified stack pointer.
      > The end of the unwinding is indicated with unwind_done, this can be true
      > right after unwind_start, e.g. with first_frame!=0 that can not be found.
      > unwind_next_frame skips to the next frame.
      > Once the unwind is completed unwind_error() can be used to check if there
      > has been a situation where the unwinder could not correctly understand
      > the tasks call chain.
      
      With this change the backchain unwinder now complies with the behaviour
      described, and matches the orc unwinder implementation. The unwinder now
      starts from a reliable state, i.e. __unwind_start's own stack frame is
      taken, or the stack frame generated by __switch_to (ksp) - both known to be
      valid. In the case of pt_regs, %r15 is a better match for the pt_regs psw
      than the sometimes random "sp" the caller passed.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      222ee908
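
      For reference, a sketch of a typical consumer of the interface described
      in the quoted comment, assuming the <asm/unwind.h> helpers named there;
      the printing function itself is invented:

      #include <linux/printk.h>
      #include <asm/unwind.h>

      /* Sketch: walk a task's frames and print the return addresses; if the
       * walk did not terminate cleanly, say so. */
      static void show_trace_sketch(struct task_struct *task, struct pt_regs *regs,
                                    unsigned long first_frame)
      {
              struct unwind_state state;

              for (unwind_start(&state, task, regs, first_frame);
                   !unwind_done(&state);
                   unwind_next_frame(&state))
                      pr_info("%pS\n", (void *)unwind_get_return_address(&state));

              if (unwind_error(&state))
                      pr_info("stack trace may be unreliable\n");
      }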
    • s390/test_unwind: add program check context tests · de6921cc
      Vasily Gorbik authored
      Add tests for unwinding from the program check handler. The unwinder
      should be able to unwind through pt_regs stored by the program check
      handler on the task stack.
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      de6921cc
    • s390/test_unwind: add irq context tests · e7409367
      Vasily Gorbik authored
      Add tests for unwinding from irq context. The unwinder should be able to
      unwind through the irq stack to the task stack, up to the task pt_regs.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      e7409367
    • s390/test_unwind: print verbose unwinding results · 06101546
      Vasily Gorbik authored
      Add the stack name, sp and reliable information to the test unwinding
      results. Also consider an ip outside of kernel text a failure if the
      state is reported as reliable.
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      06101546
    • s390/test_unwind: add CALL_ON_STACK tests · 7868249f
      Vasily Gorbik authored
      Add CALL_ON_STACK helper testing. The tests make sure that we can unwind
      from the switched stack to the original one, up to the task pt_regs
      (nodat -> task stack).

      UWM_SWITCH_STACK cannot be used together with UWM_THREAD because
      get_stack_info explicitly restricts unwinding to the task stack if
      task != current.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      7868249f
    • s390: fix register clobbering in CALL_ON_STACK · 4ac24c09
      Vasily Gorbik authored
      CALL_ON_STACK defines and initializes register variables. The inline
      assembly which follows might trigger the compiler to generate a memory
      access for the "stack" argument (e.g. in the case of
      S390_lowcore.nodat_stack). Under kasan with outline instrumentation this
      memory access produces a function call, which clobbers registers.

      Switch the "stack" argument in the CALL_ON_STACK helper to use a memory
      reference constraint and perform a load instead.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      4ac24c09
    • s390/test_unwind: require that unwinding ended successfully · f44fa79b
      Vasily Gorbik authored
      Currently the unwinder test passes if the unwinding results contain the
      unwindme_func2 and unwindme_func1 functions.
      Now that the unwinder reports success upon reaching task pt_regs, check
      that unwinding ended successfully in every test.
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      f44fa79b
    • s390/unwind: add a test for the internal API · badbf397
      Ilya Leoshkevich authored
      unwind_for_each_frame can take at least 8 different sets of parameters.
      Add a test to make sure they are all handled in a sane way.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Co-developed-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      badbf397
    • s390/unwind: always inline get_stack_pointer · adcfb8cd
      Vasily Gorbik authored
      Always inline get_stack_pointer() to avoid potential problems
      due to compiler inlining decisions, i.e. getting the stack pointer of
      get_stack_pointer() itself, which is later reused.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      adcfb8cd
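
      A sketch of why __always_inline matters here; the asm reading %r15 is an
      illustrative stand-in, not the actual helper:

      /* Sketch: with __always_inline the value describes the caller's frame.
       * Plain "inline" may become a real call, and the value would then
       * describe the short-lived frame of the helper itself. */
      static __always_inline unsigned long current_sp_sketch(void)
      {
              unsigned long sp;

              asm volatile("lgr %0,%%r15" : "=d" (sp));
              return sp;
      }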
    • s390/pci: add error message on device number limit · d497b7ec
      Niklas Schnelle authored
      The config option CONFIG_PCI_NR_FUNCTIONS sets a limit on the number of
      PCI functions we can support. Previously, on reaching this limit there
      was no indication why newly attached devices were not recognized by
      Linux, which could be quite confusing. Thus this patch adds a pr_err()
      for this case.
      Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      d497b7ec
    • s390/pci: add error message for UID collision · 794b8846
      Niklas Schnelle authored
      When UID checking was turned off during runtime in the underlying
      hypervisor, a PCI device may be attached with the same UID. This is
      already detected but happens silently. Add an error message so it can
      be understood more easily why a device was not added.
      Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      794b8846
    • s390/cpum_sf: Check for SDBT and SDB consistency · 247f265f
      Thomas Richter authored
      Each SDBT occupies a 4KB page and contains 512 entries.
      Each entry of an SDBT points to an SDB, a 4KB page containing
      sampled data. The last entry is a link to another SDBT page.
      
      When an event is created the function sequence executed is:
      
        __hw_perf_event_init()
        +--> allocate_buffers()
             +--> realloc_sampling_buffers()
      	    +---> alloc_sample_data_block()
      
      Both functions realloc_sampling_buffers() and
      alloc_sample_data_block() allocate pages and the allocation
      can fail. This is handled correctly and all allocated
      pages are freed and error -ENOMEM is returned to the
      top calling function. Finally the event is not created.
      
      Once the event has been created, the amount of initially
      allocated SDBT and SDB can be too low. This is detected
      during measurement interrupt handling, where the amount
      of lost samples is calculated. If the number of lost samples
      is too high considering the sampling frequency and already allocated
      SDBs, the number of SDBs is enlarged during the next execution
      of cpumsf_pmu_enable().
      
      If more SDBs need to be allocated, the functions

             realloc_sampling_buffers()
             +---> alloc_sample_data_block()
      
      are called to allocate more pages. Page allocation may fail
      and the returned error is ignored. An SDBT and SDB setup
      already exists.

      However the modified SDBTs and SDBs might end up in a situation
      where the first entry of an SDBT does not point to an SDB,
      but to another SDBT, basically an SDBT without payload.
      This cannot be handled by the interrupt handler, where an SDBT
      must have at least one entry pointing to an SDB.
      
      Add a check to avoid SDBTs without payload (SDBs) when enlarging
      the buffer setup.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      247f265f
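
      Conceptually (a plain C sketch, not the driver code; the tag bit used to
      mark link entries is an assumption for illustration):

      /* Sketch: an SDBT is a 4 KB page with 512 eight-byte entries; data
       * entries point to SDBs, and the last used entry links to the next SDBT.
       * Reject an SDBT whose very first entry already links onward, because
       * such a table carries no payload (SDBs) at all. */
      #define SDBT_ENTRIES_SKETCH     512
      #define LINK_ENTRY_BIT_SKETCH   0x1UL   /* assumed link-entry tag */

      static bool sdbt_has_payload_sketch(const unsigned long *sdbt)
      {
              return !(sdbt[0] & LINK_ENTRY_BIT_SKETCH);
      }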
    • s390/cpum_sf: Use TEAR_REG macro consistantly · 7dd6b199
      Thomas Richter authored
      The macro TEAR_REG() saves the last used SDBT address
      in the perf_hw_event structure. This is also done
      by the function hw_reset_registers(), which is a one-liner
      that simply uses the macro TEAR_REG(). Remove function
      hw_reset_registers(), which is only used once, and use the
      macro TEAR_REG() instead. This macro is used throughout
      the code anyway.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      7dd6b199
    • s390/cpum_sf: Remove unnecessary check for pending SDBs · c17a7c6e
      Thomas Richter authored
      In interrupt handling the function extend_sampling_buffer()
      is called after checking for a possible extension.
      This check is not necessary, as the called function itself
      performs this check again.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      c17a7c6e
    • s390/cpum_sf: Replace function name in debug statements · 532da3de
      Thomas Richter authored
      Replace hard-coded function names in debug statements
      with the "%s ...", __func__ construct suggested by the checkpatch.pl
      script.  Use a consistent debug print format of the form variable
      blank value. Also add a leading 0x to all hex values.
      Print allocated page addresses consistently as hex numbers
      with a leading 0x.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      532da3de
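
      As an illustration of the resulting style (the debug pointer, level and
      value names are assumptions, not the actual cpum_sf code):

      #include <asm/debug.h>

      /* Sketch: debug output using __func__, "variable blank value" pairs and
       * 0x-prefixed hex, via the s390 debug feature's sprintf-style event. */
      static void report_alloc_sketch(debug_info_t *dbg, unsigned long page)
      {
              debug_sprintf_event(dbg, 6, "%s page 0x%lx\n", __func__, page);
      }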
    • s390/kaslr: store KASLR offset for early dumps · a9f2f686
      Gerald Schaefer authored
      The KASLR offset is added to vmcoreinfo in arch_crash_save_vmcoreinfo(),
      so that it can be found by crash when processing kernel dumps.
      
      However, arch_crash_save_vmcoreinfo() is called during a subsys_initcall,
      so if the kernel crashes before that, we have no vmcoreinfo and no KASLR
      offset.
      
      Fix this by storing the KASLR offset in the lowcore, where the vmcore_info
      pointer will be stored, and where it can be found by crash. In order to
      make it distinguishable from a real vmcore_info pointer, mark it as uneven
      (KASLR offset itself is aligned to THREAD_SIZE).
      
      When arch_crash_save_vmcoreinfo() stores the real vmcore_info pointer in
      the lowcore, it overwrites the KASLR offset. At that point, the KASLR
      offset is not yet added to vmcoreinfo, so we also need to move the
      mem_assign_absolute() behind the vmcoreinfo_append_str().
      
      Fixes: b2d24b97 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
      Cc: <stable@vger.kernel.org> # v5.2+
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      a9f2f686
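
      The idea, as a hedged sketch; the helper and field names follow the
      commit text but are not guaranteed to match the code verbatim:

      /* Sketch: park the THREAD_SIZE-aligned KASLR offset, tagged odd, in the
       * lowcore slot that later receives the vmcore_info pointer. crash can
       * tell the two apart, and arch_crash_save_vmcoreinfo() later overwrites
       * the slot with the real (even) pointer. */
      static void store_early_kaslr_offset_sketch(void)
      {
              unsigned long offset = kaslr_offset();  /* THREAD_SIZE aligned */

              if (offset)
                      mem_assign_absolute(S390_lowcore.vmcore_info, offset | 0x1UL);
      }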
    • s390/unwind: stop gracefully at task pt_regs · e76e6961
      Vasily Gorbik authored
      Consider reaching the task pt_regs a graceful unwinder termination. The
      task pt_regs itself never contains a valid state to which a task might
      return within the kernel context (user task pt_regs is a special case).
      Since we already avoid printing user task pt_regs, and in most cases we
      don't even bother filling the task pt_regs psw and r15 with something
      reasonable, simply skip task pt_regs altogether. With this change
      unwind_error() now accurately represents whether the unwinder reached
      task pt_regs successfully or failed along the way.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      e76e6961
    • s390/head64: correct init_task stack setup · cb7948e8
      Vasily Gorbik authored
      Add the missing allocation of pt_regs at the bottom of the stack. This
      makes it consistent with the other stack setup cases and also with what
      the stack unwinder expects.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      cb7948e8
    • s390/unwind: make reuse_sp default when unwinding pt_regs · 97806dfb
      Vasily Gorbik authored
      Currently the unwinder yields 2 entries when pt_regs are met:
      sp="address of pt_regs itself" ip=pt_regs->psw
      sp=pt_regs->gprs[15] ip="r14 from stack frame pointed to by pt_regs->gprs[15]"

      Neither of those 2 states (combinations of sp and ip) ever actually happened.
      
      reuse_sp has been introduced by commit a1d863ac ("s390/unwind: fix
      mixing regs and sp"). reuse_sp=true makes the unwinder produce the
      following result when pt_regs are given (as an arg to unwind_start):
      sp=pt_regs->gprs[15] ip=pt_regs->psw
      sp=pt_regs->gprs[15] ip="r14 from stack frame pointed to by pt_regs->gprs[15]"
      
      The first state is the actual state the task was in when pt_regs were
      collected. The second state is marked unreliable and is for debugging
      purposes, to cover the case when a task has been interrupted between
      stack frame allocation and writing the back_chain - in this case r14
      might show the actual caller.

      Make the behaviour enabled via reuse_sp=true the unwinder default and
      drop the special case handling.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      97806dfb
    • s390/unwind: report an error if pt_regs are not on stack · 67f55934
      Vasily Gorbik authored
      If the unwinder is looking at pt_regs which are not on the stack, then
      something went wrong and an error has to be reported rather than a
      successful unwinding termination.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      67f55934
    • s390: avoid misusing CALL_ON_STACK for task stack setup · 7bcaad1f
      Vasily Gorbik authored
      CALL_ON_STACK is intended to be used for temporary stack switching with
      a potential return to the caller.

      When CALL_ON_STACK is misused to switch from the nodat stack to the task
      stack, the back_chain information would later lead the stack unwinder from
      the task stack into the (per cpu) nodat stack, which is reused for other
      purposes. This would yield confusing unwinding results or errors.

      To avoid that, introduce CALL_ON_STACK_NORETURN to be used instead. It
      makes sure that the back_chain is zeroed and the unwinder finishes
      gracefully, ending up at the task pt_regs.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      7bcaad1f
    • s390: correct CALL_ON_STACK back_chain saving · 75794257
      Vasily Gorbik authored
      Currently CALL_ON_STACK saves r15 as the back_chain in the first stack
      frame of the stack we are about to switch to. But if a function which
      uses CALL_ON_STACK calls another function, it allocates a stack frame
      for the callee. In this case r15 points to the callee's stack frame and
      not to the stack frame of the function itself. This results in a dummy
      unwinding entry with random sp and ip values.

      Introduce and utilize a current_frame_address macro to get the address
      of the actual function stack frame.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      75794257
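
      A hedged sketch of such a macro; GCC's __builtin_frame_address(0) is an
      illustrative way to obtain the address, not necessarily how the arch
      code defines it:

      /* Sketch: address of the current function's own stack frame, so the
       * back_chain stored by CALL_ON_STACK refers to the frame of the function
       * doing the switch, not to a callee frame that r15 happens to point to. */
      #define current_frame_address_sketch() \
              ((unsigned long)__builtin_frame_address(0))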
    • s390/unwind: unify task is current checks · 103b4cca
      Vasily Gorbik authored
      Avoid the mixture of task == NULL and task == current meaning the same
      thing and simply always initialize task with current in unwind_start.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      103b4cca
    • s390: disable preemption when switching to nodat stack with CALL_ON_STACK · 7f28dad3
      Vasily Gorbik authored
      Make sure preemption is disabled when temporarily switching to the nodat
      stack with the CALL_ON_STACK helper, because the nodat stack is per cpu.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      7f28dad3
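
      Illustratively, a usage sketch; the worker function switched to is
      hypothetical, only the preempt_disable()/preempt_enable() bracketing is
      the point:

      /* Sketch: keep the task on one CPU while it runs on that CPU's nodat
       * stack; CALL_ON_STACK(fn, stack, nr_args, ...) is the helper discussed
       * above. */
      static unsigned long nodat_work_sketch(void)
      {
              return 0;       /* placeholder workload */
      }

      static void run_on_nodat_stack_sketch(void)
      {
              preempt_disable();      /* nodat stack is per cpu */
              CALL_ON_STACK(nodat_work_sketch, S390_lowcore.nodat_stack, 0);
              preempt_enable();
      }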
    • s390: always inline disabled_wait · c2e06e15
      Vasily Gorbik authored
      disabled_wait uses _THIS_IP_ and assumes that the compiler will inline it.
      Make sure this assumption is always correct by utilizing __always_inline.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      c2e06e15
    • s390/vdso: fix getcpu · 5a5525b0
      Heiko Carstens authored
      getcpu reads the required values for cpu and node with two
      instructions. This might lead to an inconsistent result if user space
      gets preempted and migrated to a different CPU between the two
      instructions.
      
      Fix this by using just a single instruction to read both values at
      once.
      
      This is currently rather a theoretical bug, since there is no real
      NUMA support available (except for NUMA emulation).
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      5a5525b0
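
      Conceptually (a plain C sketch of the idea; the field layout of the
      per-CPU word is an assumption, not the actual vdso data layout):

      /* Sketch: if cpu and node live in the two halves of a single per-CPU
       * word, one load yields a consistent pair even if the task is migrated
       * immediately afterwards; two separate loads could mix values taken on
       * two different CPUs. */
      struct cpu_node_sketch { unsigned short cpu; unsigned short node; };

      static void getcpu_sketch(const volatile unsigned int *percpu_word,
                                unsigned int *cpu, unsigned int *node)
      {
              union {
                      unsigned int whole;
                      struct cpu_node_sketch parts;
              } v;

              v.whole = *percpu_word; /* a single load instead of two */
              *cpu  = v.parts.cpu;
              *node = v.parts.node;
      }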
    • s390/smp,vdso: fix ASCE handling · a2308c11
      Heiko Carstens authored
      When a secondary CPU is brought up it must initialize its control
      registers. CPU A which triggers that a secondary CPU B is brought up
      stores its control register contents into the lowcore of new CPU B,
      which then loads these values on startup.
      
      This is problematic in various ways: the control register which
      contains the home space ASCE will correctly contain the kernel ASCE;
      however control registers for primary and secondary ASCEs are
      initialized with whatever values were present in CPU A.
      
      Typically:
      - the primary ASCE will contain the user process ASCE of the process
        that triggered onlining of CPU B.
      - the secondary ASCE will contain the percpu VDSO ASCE of CPU A.
      
      Due to lazy ASCE handling we may also end up with other combinations.
      
      When CPU B then switches to a different process (!= idle) it will
      fix up the primary ASCE. However the problem is that the (wrong) ASCE
      from CPU A was loaded into control register 1: as soon as an ASCE is
      attached (aka loaded) a CPU is free to generate TLB entries using that
      address space.
      Even though it is very unlikely that CPU B will actually generate such
      entries, this could result in TLB entries of the address space of the
      process that ran on CPU A. These entries shouldn't exist at all and
      could cause problems later on.
      
      Furthermore the secondary ASCE of CPU B will not be updated correctly.
      This means that processes may see wrong results or even crash if they
      access VDSO data on CPU B. The correct VDSO ASCE will eventually be
      loaded on return to user space as soon as the kernel executes a call
      to strnlen_user or an atomic futex operation on CPU B.
      
      Fix both issues by initializing the to-be-loaded control register
      contents with the correct ASCEs and also enforce (re-)loading of the
      ASCEs upon first context switch and return to user space.
      
      Fixes: 0aaba41b ("s390: remove all code using the access register mode")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      a2308c11