Commits · f9a5d70cfaf3e32308de0abfcc95dafe4e36ea51 · openanolis / cloud-kernel

29 Sep, 2017 3 commits

s390/ccwgroup: tie a ccwgroup driver to its ccw driver · f9a5d70c

Committed by Julian Wiedmann on Sep 14, 2017

When grouping devices, the ccwgroup core only checks whether all of the
devices are bound to the same ccw_driver. It has no means of checking
if the requesting ccwgroup driver actually supports this device type.
qeth implements its own device matching in qeth_core_probe_device(),
while ctcm and lcs currently have no sanity-checking at all.

Enable ccwgroup drivers to optionally defer the device type checking to
the ccwgroup core, by specifying their supported ccw_driver.
This allows us to drop the device type matching from qeth, and improves
the robustness of ctcm and lcs.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

f9a5d70c
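
The grouping rule described in f9a5d70c can be illustrated with a minimal sketch. The `demo_*` structures and `demo_group_allowed()` are hypothetical stand-ins written for this illustration, not the kernel's ccwgroup API; only the two checks named in the commit message are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the check are modelled. */
struct demo_ccw_driver { const char *name; };
struct demo_ccw_device { const struct demo_ccw_driver *drv; };
struct demo_ccwgroup_driver {
	/* NULL means the group driver did not request a type check. */
	const struct demo_ccw_driver *ccw_driver;
};

/* Returns true if the n devices may be grouped for gdrv. */
static bool demo_group_allowed(const struct demo_ccwgroup_driver *gdrv,
			       const struct demo_ccw_device *devs, size_t n)
{
	size_t i;

	assert(n > 0);
	/* Existing rule: all devices are bound to the same ccw_driver. */
	for (i = 1; i < n; i++)
		if (devs[i].drv != devs[0].drv)
			return false;
	/* Optional rule: that ccw_driver is the one the group driver names. */
	if (gdrv->ccw_driver && gdrv->ccw_driver != devs[0].drv)
		return false;
	return true;
}
```

A group driver that leaves the field unset keeps the old behaviour, which is why the check is described as optional.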

s390/crypto: add s390 platform specific aes gcm support. · bf7fa038

Committed by Harald Freudenberger on Sep 18, 2017

This patch introduces gcm(aes) support into the aes_s390 kernel module.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

bf7fa038

s390/crypto: add inline assembly for KMA instruction to cpacf.h · eecd49c4

Committed by Patrick Steuer on Sep 18, 2017

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

eecd49c4

28 Sep, 2017 15 commits

s390/rwlock: introduce rwlock wait queueing · eb3b7b84

Committed by Martin Schwidefsky on Mar 24, 2017

Like the common queued rwlock code the s390 implementation uses the
queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
the queueing. The encoding of the rwlock_t differs though, the counter
field in the rwlock_t is split into two parts. The upper two bytes hold
the write bit and the write wait counter, the lower two bytes hold the
read counter.

The arch_read_lock operation works exactly like the common qrwlock but
the enqueue operation for a writer follows a different logic. After the
failed inline try to get the rwlock in write, the writer first increases
the write wait counter, acquires the wait spin_lock for the queueing,
and then loops until there are no readers and the write bit is zero.
Without the write wait counter a CPU that just released the rwlock
could immediately reacquire the lock in the inline code, bypassing all
outstanding read and write waiters. For s390 this would cause massive
imbalances in favour of writers in case of a contended rwlock.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

eb3b7b84
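
The counter layout described in eb3b7b84 can be sketched as plain bit arithmetic. The commit message only states that the upper two bytes hold the write bit and the write wait counter and the lower two bytes hold the read counter; the exact bit positions below, and all `demo_*` names, are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for this sketch only. */
#define DEMO_RW_READ_MASK 0x0000ffffu /* lower two bytes: read counter */
#define DEMO_RW_WRITE_BIT 0x00010000u /* assumed position of the write bit */
#define DEMO_RW_WAIT_UNIT 0x00020000u /* assumed step per waiting writer */

static uint32_t demo_rw_readers(uint32_t cnts)
{
	return cnts & DEMO_RW_READ_MASK;
}

static bool demo_rw_write_locked(uint32_t cnts)
{
	return (cnts & DEMO_RW_WRITE_BIT) != 0;
}

static uint32_t demo_rw_write_waiters(uint32_t cnts)
{
	return (cnts & ~(DEMO_RW_READ_MASK | DEMO_RW_WRITE_BIT)) / DEMO_RW_WAIT_UNIT;
}

/* A writer registers itself before queueing on the wait spinlock. */
static uint32_t demo_rw_add_write_waiter(uint32_t cnts)
{
	assert(cnts < 0xfffe0000u); /* wait counter must not overflow */
	return cnts + DEMO_RW_WAIT_UNIT;
}

/* Inline fast path: only a completely idle lock word can be taken. */
static bool demo_rw_inline_write_ok(uint32_t cnts)
{
	return cnts == 0;
}

/* Queued writer: proceeds once there are no readers and no write bit. */
static bool demo_rw_writer_may_take(uint32_t cnts)
{
	return demo_rw_readers(cnts) == 0 && !demo_rw_write_locked(cnts);
}
```

With a waiter registered the word is no longer zero, so a CPU that just released the lock cannot reacquire it through the inline path, which is the imbalance the commit message describes.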

s390/spinlock: introduce spinlock wait queueing · b96f7d88

Committed by Martin Schwidefsky on Mar 24, 2017

The queued spinlock code for s390 follows the principles of the common
code qspinlock implementation but with a few notable differences.

The format of the spinlock_t locking word differs, s390 needs to store
the logical CPU number of the lock holder in the spinlock_t to be able
to use the diagnose 9c directed yield hypervisor call.

The inline code sequences for spin_lock and spin_unlock are nice and
short. The inline portion of a spin_lock now typically looks like this:

	lhi	%r0,0			# 0 indicates an empty lock
	l	%r1,0x3a0		# CPU number + 1 from lowcore
	cs	%r0,%r1,<some_lock>	# lock operation
	jnz	call_wait		# on failure call wait function
locked:
	...
call_wait:
	la	%r2,<some_lock>
	brasl	%r14,arch_spin_lock_wait
	j	locked

A spin_unlock is as simple as before:

	lhi	%r0,0
	sth	%r0,2(%r2)		# unlock operation

After a CPU has queued itself it may not enable interrupts again for the
arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
is removed.

To improve performance the code implements opportunistic lock stealing.
If the wait function finds a spinlock_t that indicates that the lock is
free but there are queued waiters, the CPU may steal the lock up to three
times without queueing itself. Lock stealing updates the steal counter
in the lock word to prevent more than 3 steals. The counter is reset at
the time the CPU next in the queue successfully takes the lock.

While the queued spinlocks improve performance in a system with dedicated
CPUs, in a virtualized environment with continuously overcommitted CPUs
the queued spinlocks can have a negative effect on performance. This
is due to the fact that a queued CPU that is preempted by the hypervisor
will block the queue at some point even without holding the lock. With
the classic spinlock it does not matter if a CPU is preempted that waits
for the lock. Therefore use the queued spinlock code only if the system
runs with dedicated CPUs and fall back to classic spinlocks when running
with shared CPUs.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

b96f7d88

s390/spinlock: use the cpu number +1 as spinlock value · 81533803

Committed by Martin Schwidefsky on Dec 04, 2016

The queued spinlock code will come out simpler if the encoding of
the CPU that holds the spinlock is (cpu+1) instead of (~cpu).

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

81533803
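
The (cpu+1) encoding from 81533803, together with the compare-and-swap fast path shown in b96f7d88, can be approximated in portable C11. This is a sketch under stated assumptions: `demo_spinlock_t` and the `demo_*` helpers are hypothetical, the lock word holds only the owner, and the queueing and steal-counter fields of the real lock word are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 means free, otherwise owner CPU number + 1. */
typedef struct { atomic_uint lock; } demo_spinlock_t;

static unsigned int demo_lockval(unsigned int cpu)
{
	assert(cpu + 1 != 0); /* cpu + 1 must not wrap around to "free" */
	return cpu + 1;
}

/* Fast path: swap 0 for our lock value, like the cs instruction above. */
static bool demo_trylock(demo_spinlock_t *lp, unsigned int cpu)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(&lp->lock, &expected,
					      demo_lockval(cpu));
}

/* Returns the owning CPU number, or -1 if the lock is free. */
static int demo_owner(demo_spinlock_t *lp)
{
	unsigned int v = atomic_load(&lp->lock);

	return v ? (int)(v - 1) : -1;
}

static void demo_unlock(demo_spinlock_t *lp)
{
	atomic_store(&lp->lock, 0);
}
```

Because a free lock is simply 0, the owner can be recovered with one subtraction, which is what a directed yield to the lock holder needs.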

s390/topology: add detection of dedicated vs shared CPUs · 1887aa07

Committed by Martin Schwidefsky on Sep 22, 2017

The topology information returned by STSI 15.x.x contains a flag
if the CPUs of a topology-list are dedicated or shared. Make this
information available if the machine provides topology information.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

1887aa07

s390/cpumf: remove superfluous nr_cpumask_bits check · 19220999

Committed by Heiko Carstens on Sep 21, 2017

Paul Burton reported that the nr_cpumask_bits check
within cpumsf_pmu_event_init() is not necessary.

Actually there is already a prior check within
perf_event_alloc(). Therefore remove the check.

Reported-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

19220999

s390/ptrace: add runtime instrumention register get/set · 262832bc

Committed by Alice Frosi on Sep 14, 2017

Add runtime instrumentation register get and set, which allow reading
and modifying the runtime instrumentation control block.

Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

262832bc

s390/runtime_instrumentation: clean up struct runtime_instr_cb · bb59c2da

Committed by Alice Frosi on Sep 14, 2017

Update runtime_instr_cb structure to be consistent with the runtime
instrumentation documentation.

Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

bb59c2da

s390: add support for FORTIFY_SOURCE · 79962038

Committed by Heiko Carstens on Sep 12, 2017

This is the quite trivial backend for s390 which is required to enable
FORTIFY_SOURCE support.

See commit 6974f0c4 ("include/linux/string.h: add the option of
fortified string.h functions") for more details.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

79962038

s390: get rid of exit_thread() · 59a19ea9

Committed by Heiko Carstens on Sep 11, 2017

exit_thread() is empty now. Therefore remove it and get rid of a
pointless branch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

59a19ea9

s390/guarded storage: simplify task exit handling · 7b83c629

Committed by Heiko Carstens on Sep 11, 2017

Free data structures required for guarded storage from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.

In addition this allows to get rid of exit_thread() in a later patch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

7b83c629

s390/ptrace: fix guarded storage regset handling · 5ef2d523

Committed by Heiko Carstens on Sep 11, 2017

If the guarded storage regset for current is supposed to be changed,
the regset from user space is copied directly into the guarded storage
control block.

If then the process gets scheduled away while the control block is
being copied and before the new control block has been loaded, the
result is random: the process can be scheduled away due to a page
fault or preemption. If that happens the already copied parts will be
overwritten by save_gs_cb(), called from switch_to().

Avoid this by copying the data to a temporary buffer on the stack and
do the actual update with preemption disabled.

Fixes: f5bbd721 ("s390/ptrace: guarded storage regset for the current task")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

5ef2d523

s390/guarded storage: fix possible memory corruption · fa1edf3f

Committed by Heiko Carstens on Sep 11, 2017

For PREEMPT enabled kernels the guarded storage (GS) code contains a
possible use-after-free bug. If a task that makes use of GS exits, it
will execute do_exit() while still enabled for preemption.

That function will call exit_thread_gs() via exit_thread().
If exit_thread_gs() gets preempted after the GS control block of the
task has been freed but before the pointer to it is set to NULL, then
save_gs_cb(), called from switch_to(), will write to already freed
memory.

Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.

Fixes: 916cda1a ("s390: add a system call for guarded storage")
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

fa1edf3f

s390/runtime instrumentation: simplify task exit handling · 8d9047f8

Committed by Heiko Carstens on Sep 11, 2017

Free data structures required for runtime instrumentation from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.

In addition this allows to get rid of exit_thread() in a later patch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

8d9047f8

s390/runtime instrumention: fix possible memory corruption · d6e646ad

Committed by Heiko Carstens on Sep 11, 2017

For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.

That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.

Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.

Fixes: e4b8b3f3 ("s390: add support for runtime instrumentation")
Cc: <stable@vger.kernel.org> # v3.7+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

d6e646ad

s390: convert release_thread() into a static inline function · 8076428f

Committed by Heiko Carstens on Sep 11, 2017

release_thread() is an empty function that gets called on every task
exit. Move the function to a header file and force inlining of it, so
that the compiler can optimize it away instead of generating a
pointless function call.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

8076428f

20 Sep, 2017 2 commits

s390/topology: enable / disable topology dynamically · 51dce386

Committed by Heiko Carstens on Sep 14, 2017

Add a new sysctl file /proc/sys/s390/topology which displays if
topology is on (1) or off (0) as specified by the "topology=" kernel
parameter.

This allows to change topology information during runtime and
configuring it via /etc/sysctl.conf instead of using the kernel line
parameter.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

51dce386

s390/topology: alternative topology for topology-less machines · 1b25fda0

Committed by Heiko Carstens on Sep 19, 2017

If running on machines that do not provide topology information we
currently generate a "fake" topology which defines the maximum
distance between each cpu: each cpu will be put into an own drawer.

Historically this used to be the best option for (virtual) machines in
overcommitted hypervisors.

For some workloads however it is better to generate a different
topology where all cpus are siblings within a package (all cpus are
core siblings). This shows performance improvements of up to 10%,
depending on the workload.

In order to keep the current behaviour, but also allow to switch to
the different core sibling topology use the existing "topology="
kernel parameter:

Specifying "topology=on" on machines without topology information will
generate the core siblings (fake) topology information, instead of the
default topology information where all cpus have the maximum distance.

On machines which provide topology information specifying
"topology=on" does not have any effect.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

1b25fda0

19 Sep, 2017 2 commits

s390/mm: fix write access check in gup_huge_pmd() · ba385c05

Committed by Gerald Schaefer on Sep 18, 2017

The check for the _SEGMENT_ENTRY_PROTECT bit in gup_huge_pmd() is the
wrong way around. It must not be set for write==1, and not be checked for
write==0. Fix this similar to how it was fixed for ptes a long time ago in
commit 25591b07 ("[S390] fix get_user_pages_fast").

One impact of this bug would be unnecessarily using the gup slow path for
write==0 on r/w mappings. A potentially more severe impact would be that
gup_huge_pmd() will succeed for write==1 on r/o mappings.

Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

ba385c05
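
The rule stated in ba385c05 (the protect bit must be clear for write==1 and is not checked for write==0) can be written as a small predicate. This is an illustrative sketch, not the kernel function: `demo_huge_pmd_fast_ok()` is a hypothetical name and `DEMO_SEGMENT_ENTRY_PROTECT` is a placeholder value chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit for the sketch; stands in for _SEGMENT_ENTRY_PROTECT. */
#define DEMO_SEGMENT_ENTRY_PROTECT 0x200UL

/* Returns true if the fast path may hand out the huge pmd mapping. */
static bool demo_huge_pmd_fast_ok(unsigned long pmd_val, bool write)
{
	/* Only a write request cares about write protection. */
	if (write && (pmd_val & DEMO_SEGMENT_ENTRY_PROTECT))
		return false; /* caller falls back to the slow path */
	return true;
}

/* Usage: the four combinations of mapping type and access type. */
static void demo_gup_examples(void)
{
	assert(demo_huge_pmd_fast_ok(0, true));   /* r/w mapping, write */
	assert(demo_huge_pmd_fast_ok(0, false));  /* r/w mapping, read */
	assert(demo_huge_pmd_fast_ok(DEMO_SEGMENT_ENTRY_PROTECT, false));
	assert(!demo_huge_pmd_fast_ok(DEMO_SEGMENT_ENTRY_PROTECT, true));
}
```

The inverted check produced the opposite table: reads on r/w mappings were pushed to the slow path and writes on r/o mappings were allowed.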

s390/mm: make pmdp_invalidate() do invalidation only · 91c575b3

Committed by Gerald Schaefer on Sep 18, 2017

Commit 227be799 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
inadvertently changed the behavior of pmdp_invalidate(), so that it now
clears the pmd instead of just marking it as invalid. Fix this by restoring
the original behavior.

A possible impact of the misbehaving pmdp_invalidate() would be the
MADV_DONTNEED races (see commits ced10803 and 58ceeb6b), although we
should not have any negative impact on the related dirty/young flags,
since those flags are not set by the hardware on s390.

Fixes: 227be799 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

91c575b3

13 Sep, 2017 1 commit

s390/perf: fix bug when creating per-thread event · fc3100d6

Committed by Pu Hou on Sep 05, 2017

A per-thread event could not be created correctly like below:

    perf record --per-thread -e rB0000 -- sleep 1
    Error:
    The sys_perf_event_open() syscall returned with 19 (No such device) for event (rB0000).
    /bin/dmesg may provide additional information.
    No CONFIG_PERF_EVENTS=y kernel support configured?

This bug was introduced by:

    commit c311c797
    Author: Alexey Dobriyan <adobriyan@gmail.com>
    Date:   Mon May 8 15:56:15 2017 -0700

    cpumask: make "nr_cpumask_bits" unsigned

If a per-thread event is not attached to any CPU, the cpu field
in struct perf_event is -1. The above commit converts the CPU number
to unsigned int, which results in an illegal CPU number.

Fixes: c311c797 ("cpumask: make "nr_cpumask_bits" unsigned")
Cc: <stable@vger.kernel.org> # v4.12+
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

fc3100d6
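
The conversion problem described in fc3100d6 is easy to reproduce in isolation. The sketch below is an assumption-laden illustration: `demo_nr_cpumask_bits` and both helper functions are hypothetical, and the guarded variant is only one possible way to keep -1 valid, not necessarily the change the commit made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nr_cpumask_bits after it became unsigned. */
static const unsigned int demo_nr_cpumask_bits = 64;

/* Problematic shape: -1 ("not bound to a CPU") converts to UINT_MAX. */
static bool demo_cpu_rejected_buggy(int event_cpu)
{
	return (unsigned int)event_cpu >= demo_nr_cpumask_bits;
}

/* One possible guard: range-check only non-negative CPU numbers. */
static bool demo_cpu_rejected_guarded(int event_cpu)
{
	return event_cpu >= 0 &&
	       (unsigned int)event_cpu >= demo_nr_cpumask_bits;
}

/* Usage: a per-thread event (cpu == -1) and an out-of-range CPU. */
static void demo_perf_examples(void)
{
	assert(demo_cpu_rejected_buggy(-1));    /* wrongly refused */
	assert(!demo_cpu_rejected_guarded(-1)); /* accepted */
	assert(demo_cpu_rejected_guarded(64));  /* still refused */
}
```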

06 Sep, 2017 7 commits

s390/mm: use a single lock for the fields in mm_context_t · f28a4b4d

Committed by Martin Schwidefsky on Aug 17, 2017

The three locks 'lock', 'pgtable_lock' and 'gmap_lock' in the
mm_context_t can be reduced to a single lock.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

f28a4b4d

s390/mm: fix race on mm->context.flush_mm · 60f07c8e

Committed by Martin Schwidefsky on Aug 17, 2017

The order in __tlb_flush_mm_lazy is to flush TLB first and then clear
the mm->context.flush_mm bit. This can lead to missed flushes as the
bit can be set anytime, the order needs to be the other way around.

But this leads to a different race, __tlb_flush_mm_lazy may be called
on two CPUs concurrently. If mm->context.flush_mm is cleared first then
another CPU can bypass __tlb_flush_mm_lazy although the first CPU has
not done the flush yet. In a virtualized environment the time until the
flush is finally completed can be arbitrarily long.

Add a spinlock to serialize __tlb_flush_mm_lazy and use the function
in finish_arch_post_lock_switch as well.

Cc: <stable@vger.kernel.org>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

60f07c8e
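
The ordering that 60f07c8e describes (take a lock, clear the pending bit, then flush) can be sketched in user-space C11. Everything named `demo_*` is hypothetical: the struct models only the fields mentioned in the commit message, an `atomic_flag` stands in for the spinlock, and a counter stands in for the actual TLB flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant mm context fields. */
struct demo_mm {
	atomic_flag lock;     /* plays the role of the added spinlock */
	bool flush_mm;        /* "a flush is pending" bit */
	unsigned int flushes; /* counts simulated TLB flushes */
};

static void demo_mm_init(struct demo_mm *mm)
{
	atomic_flag_clear(&mm->lock);
	mm->flush_mm = false;
	mm->flushes = 0;
}

static void demo_tlb_flush_mm_lazy(struct demo_mm *mm)
{
	/* Serialize callers: nobody returns while a flush is in flight. */
	while (atomic_flag_test_and_set_explicit(&mm->lock,
						 memory_order_acquire))
		;
	if (mm->flush_mm) {
		mm->flush_mm = false; /* clear the bit first ... */
		mm->flushes++;        /* ... then flush (simulated) */
	}
	atomic_flag_clear_explicit(&mm->lock, memory_order_release);
}

/* Usage: one pending flush is performed exactly once. */
static void demo_flush_examples(void)
{
	struct demo_mm mm;

	demo_mm_init(&mm);
	mm.flush_mm = true;
	demo_tlb_flush_mm_lazy(&mm);
	assert(mm.flushes == 1 && !mm.flush_mm);
	demo_tlb_flush_mm_lazy(&mm); /* nothing pending, no second flush */
	assert(mm.flushes == 1);
}
```

Clearing the bit before flushing avoids missing a request that arrives during the flush, and the lock closes the window in which a second CPU would see the cleared bit and skip a flush that has not finished.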

s390/mm: fix local TLB flushing vs. detach of an mm address space · b3e5dc45

Committed by Martin Schwidefsky on Aug 16, 2017

The local TLB flushing code keeps an additional mask in the mm.context,
the cpu_attach_mask. At the time a global flush of an address space is
done the cpu_attach_mask is copied to the mm_cpumask in order to avoid
future global flushes in case the mm is used by a single CPU only after
the flush.

Trouble is that the reset of the mm_cpumask is racy against the detach
of an mm address space by switch_mm. The current order is first the
global TLB flush and then the copy of the cpu_attach_mask to the
mm_cpumask. The order needs to be the other way around.

Cc: <stable@vger.kernel.org>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

b3e5dc45

s390/zcrypt: externalize AP queue interrupt control · 46fde9a9

Committed by Harald Freudenberger on Nov 09, 2016

KVM has a need to control the interrupts on real and virtualized
AP queue devices. This fix provides a new function to control
the interrupt facilities of an AP queue device.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

46fde9a9

s390/zcrypt: externalize AP config info query · 050349b5

Committed by Harald Freudenberger on Nov 08, 2016

KVM has a need to fetch the crypto configuration information
as it is returned by the PQAP(QCI) instruction. This patch
introduces a new API ap_query_configuration() which provides
this info in a handy way for the caller.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

050349b5

s390/zcrypt: externalize test AP queue · e7fc5146

Committed by Tony Krowiak on Nov 08, 2016

Under certain specified conditions, the Test AP Queue (TAPQ)
subfunction of the Process Adjunct Processor Queue (PQAP) instruction
will be intercepted by a guest VM. The guest VM must have a means for
executing the intercepted instruction.

The vfio_ap driver will provide an interface to execute the
PQAP(TAPQ) instruction subfunction on behalf of a guest VM.
The code for executing the AP instructions currently resides in the
AP bus. This patch refactors the AP bus code to externalize access
to the PQAP(TAPQ) instruction subfunction to make it available to
the vfio_ap driver.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

e7fc5146

s390/mm: use VM_BUG_ON in crst_table_[upgrade|downgrade] · 2fc4876e

Committed by Martin Schwidefsky on Aug 31, 2017

The BUG_ON in crst_table_[upgrade|downgrade] is a debugging aid,
replace it with VM_BUG_ON.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

2fc4876e

01 Sep, 2017 1 commit

teach SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE to handle __bitwise arguments · 4f59c718

Committed by Al Viro on Jul 08, 2017

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4f59c718

31 Aug, 2017 5 commits

s390/mm: fix BUG_ON in crst_table_upgrade · 8ab867cb

Committed by Martin Schwidefsky on Aug 31, 2017

A 31-bit compat process can force a BUG_ON in crst_table_upgrade
with specific, invalid mmap calls, e.g.

   mmap((void*) 0x7fff8000, 0x10000, 3, 32, -1, 0)

The arch_get_unmapped_area[_topdown] functions miss an if condition
in the decision to do a page table upgrade.

Fixes: 9b11c791 ("s390/mm: simplify arch_get_unmapped_area[_topdown]")
Cc: <stable@vger.kernel.org>  # v4.12+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

8ab867cb

s390/mm: fork vs. 5 level page tabel · 0b89ede6

Committed by Martin Schwidefsky on Aug 31, 2017

The mm->context.asce field of a new process is not set up correctly
in case of a fork with a 5 level page table.
Add the missing case to init_new_context().

Fixes: 1aea9b3f ("s390/mm: implement 5 level pages tables")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

0b89ede6

KVM: s390: vsie: cleanup mcck reinjection · c95c8953

Committed by David Hildenbrand on Aug 30, 2017

The machine check information is part of the vsie_page.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-4-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c95c8953

KVM: s390: use WARN_ON_ONCE only for checking · 3dbf0205

Committed by David Hildenbrand on Aug 30, 2017

Move the real logic that always has to be executed out of the
WARN_ON_ONCE.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-3-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

3dbf0205

KVM: s390: guestdbg: fix range check · 8149fc07

Committed by David Hildenbrand on Aug 30, 2017

Looks like the "overflowing" range check is wrong.

|=======b-------a=======|

addr >= a || addr <= b

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-2-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

8149fc07
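
The diagram in 8149fc07 describes a range that wraps around the end of the address space: everything from a upward and everything up to b matches. A self-contained sketch of such a check follows; `demo_in_addr_range()` is a hypothetical helper written for this illustration, and the non-wrapping branch is an assumption about how the normal case is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr lies in [a, b]; if a > b the range wraps. */
static bool demo_in_addr_range(uint64_t addr, uint64_t a, uint64_t b)
{
	if (a <= b)
		return addr >= a && addr <= b;
	/* "Overflowing" range: |=======b-------a=======| */
	return addr >= a || addr <= b;
}

/* Usage: one ordinary range and one wrapping range. */
static void demo_range_examples(void)
{
	const uint64_t a = 0xffffffffffffff00ULL, b = 0x100ULL;

	assert(demo_in_addr_range(5, 2, 8));
	assert(!demo_in_addr_range(9, 2, 8));
	assert(demo_in_addr_range(0xfffffffffffffff0ULL, a, b)); /* above a */
	assert(demo_in_addr_range(0x10, a, b));                  /* below b */
	assert(!demo_in_addr_range(0x1000, a, b));               /* in the gap */
}
```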

29 Aug, 2017 4 commits

KVM: s390: we are always in czam mode · 1935222d

Committed by David Hildenbrand on Aug 29, 2017

Independent of the underlying hardware, kvm will now always handle
SIGP SET ARCHITECTURE as if czam were enabled. Therefore, let's not
only forward that bit but always set it.

While at it, add a comment regarding STHYI.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170829143108.14703-1-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

1935222d

s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs · fa41ba0d

Committed by Christian Borntraeger on Aug 24, 2017

Right now there is a potential hang situation for postcopy migrations,
if the guest is enabling storage keys on the target system during the
postcopy process.

For storage key virtualization, we have to forbid the empty zero page as
the storage key is a property of the physical page frame.  As we enable
storage key handling lazily we then drop all mappings for empty zero
pages for lazy refaulting later on.

This does not work with the postcopy migration, which relies on the
empty zero page never triggering a fault again in the future. The reason
is that postcopy migration will simply read a page on the target system
if that page is a known zero page to fault in an empty zero page.  At
the same time postcopy remembers that this page was already transferred
- so any future userfault on that page will NOT be retransmitted again
to avoid races.

If now the guest enters the storage key mode while in postcopy, we will
break this assumption of postcopy.

The solution is to disable the empty zero page for KVM guests early on
and not during storage key enablement. With this change, the postcopy
migration process is guaranteed to start after no zero pages are left.

As guest pages are very likely not empty zero pages anyway the memory
overhead is also pretty small.

While at it this also adds proper page table locking to the zero page
removal.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

fa41ba0d

s390/dasd: Add discard support for FBA devices · 28b841b3

Committed by Jan Höppner on Jun 30, 2016

The z/VM hypervisor provides virtual disks (VDISK) which are backed by
main memory of the hypervisor. Those devices are seen as DASD FBA disks
within the Linux guest.

Whenever data is written to such a device, memory is allocated
on-the-fly by z/VM accordingly. This memory, however, is not being freed
if data on the device is deleted by the guest OS.

In order to make memory usable after deletion again, add discard support
to the FBA discipline.

While at it, update comments regarding the DASD_FEATURE_* flags.

Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

28b841b3

s390/uaccess: avoid mvcos jump label · d66bf801

Committed by Martin Schwidefsky on Aug 21, 2017

If the kernel is compiled for z10 or later machines the uaccess
code inlines the mvcos instruction. The facility bit 27 which
indicates the availability of MVCOS has to be set. The have_mvcos
jump label will always be true.

Make the generation of the have_mvcos jump label conditional on
!CONFIG_HAVE_MARCH_Z10_FEATURES.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

d66bf801
