- 22 Apr 2014, 6 commits
-
-
Committed by Dominik Dingel
For live migration kvm needs to test and clear the dirty bit of guest pages. For that we use ptep_test_and_clear_user_dirty; to be sure we are not racing with other code, we protect the pte. This needs to be done within the architecture memory management code.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Martin Schwidefsky
Switch the user dirty bit detection used for migration from the hardware-provided host change-bit in the pgste to a fault-based detection method. This reduces the dependency of the host on the storage key to a point where it becomes possible to enable the RCP bypass for KVM guests. The fault-based dirty detection will only indicate changes caused by accesses via the guest address space. The hardware-based method can detect all changes, even those caused by I/O or accesses via the kernel page table. The KVM/qemu code needs to take this into account.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Dominik Dingel
The first invocation of storage key operations on a given cpu will be intercepted. On these intercepts we will enable storage keys for the guest and remove the previously added intercepts.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Dominik Dingel
Introduce a new function s390_enable_skey(), which enables storage key handling via setting the use_skey flag in the mmu context. This function is only useful within the context of kvm. Note that enabling storage keys will cause a one-time hiccup when walking the page table; however, it saves us special effort for cases like clear reset while making it possible for us to conform to the architecture. s390_enable_skey() takes the page table lock to prevent resetting storage keys triggered from multiple vcpus.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Dominik Dingel
page_table_reset_pgste() already does a complete page table walk to reset the pgste. Enhance it to initialize the storage keys to PAGE_DEFAULT_KEY if requested by the caller. This will be used for lazy storage key handling. Also provide an empty stub for !CONFIG_PGSTE. Let's adapt the current code (diag 308) to not clear the keys.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
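A minimal sketch of the interface shape described above: a real implementation when CONFIG_PGSTE is set, and an empty stub otherwise. The exact parameter list (address range plus an "initialize storage keys" flag) is an assumption based on the description, not the verbatim kernel prototype.

```c
/* Sketch only: the range/flag parameters are assumed from the description. */
#ifdef CONFIG_PGSTE
void page_table_reset_pgste(struct mm_struct *mm, unsigned long start,
			    unsigned long end, bool init_skey);
#else
static inline void page_table_reset_pgste(struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end,
					  bool init_skey)
{
	/* no pgstes without CONFIG_PGSTE: nothing to reset */
}
#endif
```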
-
Committed by Dominik Dingel
For lazy storage key handling, we need a mechanism to track if the process ever issued a storage key operation. This patch adds the basic infrastructure for making the storage key handling optional, but still leaves it enabled for now by default.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 11 Apr 2014, 1 commit
-
-
Committed by Heiko Carstens
Actually this also enables sys_setattr and sys_getattr, since I forgot to increase NR_syscalls when adding those syscalls.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 09 Apr 2014, 1 commit
-
-
Committed by Heiko Carstens
smp_stop_cpu() should stop the current cpu even for !CONFIG_SMP. Otherwise machine_halt() will return and the machine generates a panic instead of simply stopping the current cpu:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.0-01527-g2b6ef16a6bc5 #10
[...]
Call Trace:
([<0000000000110db0>] show_trace+0xf8/0x158)
 [<0000000000110e7a>] show_stack+0x6a/0xe8
 [<000000000074dba8>] panic+0xe4/0x268
 [<0000000000140570>] do_exit+0xa88/0xb2c
 [<000000000016e12c>] SyS_reboot+0x1f0/0x234
 [<000000000075da70>] sysc_nr_ok+0x22/0x28
 [<000000007d5a09b4>] 0x7d5a09b4

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 03 Apr 2014, 5 commits
-
-
Committed by Heiko Carstens
The current uaccess code uses a page table walk in some circumstances, e.g. in case of the in-atomic futex operations or if running on old hardware which doesn't support the mvcos instruction. However it turned out that the page table walk code does not correctly lock page tables when accessing page table entries. In other words: a different cpu may invalidate a page table entry while the current cpu inspects the pte. This may lead to random data corruption.

Adding correct locking however isn't trivial for all uaccess operations. Especially copy_in_user() is problematic since that requires holding at least two locks, but must be protected against ABBA deadlock when a different cpu also performs a copy_in_user() operation.

So the solution is a different approach where we change address spaces: user space runs in primary address mode, or access register mode within vdso code, like it currently already does. The kernel usually also runs in home space mode, however when accessing user space the kernel switches to primary or secondary address mode if the mvcos instruction is not available or if a compare-and-swap (futex) instruction on a user space address is performed. KVM however is special, since that requires the kernel to run in home address space while implicitly accessing user space with the sie instruction.

So we end up with:

User space:
- runs in primary or access register mode
- cr1 contains the user asce
- cr7 contains the user asce
- cr13 contains the kernel asce

Kernel space:
- runs in home space mode
- cr1 contains the user or kernel asce -> the kernel asce is loaded when a uaccess requires primary or secondary address mode
- cr7 contains the user or kernel asce (changed with set_fs())
- cr13 contains the kernel asce

In case of uaccess the kernel changes to:
- primary space mode in case of a uaccess (copy_to_user) and uses e.g. the mvcp instruction to access user space. However the kernel will stay in home space mode if the mvcos instruction is available
- secondary space mode in case of futex atomic operations, so that the instructions come from primary address space and data from secondary space

In case of kvm the kernel runs in home space mode, but cr1 gets switched to contain the gmap asce before the sie instruction gets executed. When the sie instruction is finished cr1 will be switched back to contain the user asce.

A context switch between two processes will always load the kernel asce for the next process in cr1. So the first exit to user space is a bit more expensive (one extra load control register instruction) than before, however it keeps the code rather simple. In sum this means there is no need to perform any error prone page table walks anymore when accessing user space.

The patch seems to be rather large, however it mainly removes the page table walk code and restores the previously deleted "standard" uaccess code, with a couple of changes. The uaccess without mvcos mode can be enforced with the "uaccess_primary" kernel parameter.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Martin Schwidefsky
The zEC12 machines introduced the local-clearing control for the IDTE and IPTE instructions. If the control is set, only the TLB of the local CPU is cleared of entries: either all entries of a single address space for IDTE, or the entry for a single page-table entry for IPTE. Without the local-clearing control the TLB flush is broadcast to all CPUs in the configuration, which is expensive. The reset of the bit mask of the CPUs that need flushing after a non-local IDTE is tricky. As TLB entries for an address space remain in the TLB even if the address space is detached, a new bit field is required to keep track of attached CPUs vs. CPUs in need of a flush. After a non-local flush with IDTE the bit-field of attached CPUs is copied to the bit-field of CPUs in need of a flush. The ordering of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is such that an underindication in mm_cpumask(mm) is prevented but an overindication in mm_cpumask(mm) is possible.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Martin Schwidefsky
The Principles of Operation states that the CPU is allowed to create TLB entries for an address space anytime while an ASCE is loaded into the control register. This is true even if the CPU is running in the kernel and the user address space is not (actively) accessed. In theory this can affect two aspects of the TLB flush logic. For full-mm flushes the ASCE of the dying process is still attached. The approach to flush first with IDTE and then just free all page tables can in theory lead to stale TLB entries. Use the batched free of page tables for the full-mm flushes as well. For operations that can have a stale ASCE in the control register, e.g. a delayed update_user_asce in switch_mm, load the kernel ASCE to prevent invalid TLBs from being created.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Thomas Huth
Use the new defines for external interruption codes to get rid of "magic" numbers in the s390 source code. And while we're at it, also rename the (un-)register_external_interrupt functions to something shorter so that this patch does not exceed the 80 columns all over the place.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Thomas Huth
Introduce defines for external interruption codes so that we can get rid of some "magic" numbers in the s390 source code.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 01 Apr 2014, 1 commit
-
-
Committed by Heiko Carstens
When reworking the bitops and atomic ops I missed that those instructions that got atomic behaviour only perform a "specific-operand-serialization" instead of a full "serialization". The compare-and-swap instruction used before performs a full serialization before and after the instruction is executed, which means it has full memory barrier semantics. In order to give the new bitops and atomic ops functions full memory barrier semantics as well, add a "bcr 14,0" before and after each of those new instructions, which performs a full serialization too. This restores memory barrier semantics for bitops and atomic ops functions which return values, like e.g. atomic_add_return(), but not for functions which do not return a value, like e.g. atomic_add(). This is consistent with other architectures and what common code requires.
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
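A minimal sketch of the pattern described above: serialize with "bcr 14,0" before and after the interlocked update. The function names are illustrative (not the kernel's actual macros), and the GCC `__atomic_fetch_add` builtin stands in for the interlocked-access instruction (e.g. laa) that the real code emits.

```c
/* Full memory barrier on s390 via "bcr 14,0", as described in the commit. */
static inline void s390_full_barrier(void)
{
	asm volatile("bcr 14,0" : : : "memory");
}

/* Illustrative shape of a value-returning atomic op with full barrier
 * semantics; the middle line stands in for the interlocked instruction. */
static inline int atomic_add_return_sketch(int i, int *v)
{
	int old;

	s390_full_barrier();                               /* serialize before */
	old = __atomic_fetch_add(v, i, __ATOMIC_RELAXED);  /* interlocked add  */
	s390_full_barrier();                               /* serialize after  */
	return old + i;
}
```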
-
- 25 Mar 2014, 1 commit
-
-
Committed by Jens Freimann
We need BITS_TO_LONGS, not sizeof(long), to calculate the correct size. idle_mask is a bitmask, each bit representing the state of a cpu. The desired outcome is an array of unsigned long fields that can fit KVM_MAX_VCPUS bits. We should not use sizeof(long), which returns the size in bytes, but BITS_TO_LONGS.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
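A small standalone illustration of the sizing mistake: dividing a bit count by sizeof(long) (bytes) instead of by the number of bits per long over-allocates the array by a factor of eight. The macros are restated here for a userspace build; the kernel provides them in its bitops headers, and the KVM_MAX_VCPUS value below is just a stand-in.

```c
#include <stdio.h>

/* Restated for this userspace demo; the kernel defines the real ones. */
#define BITS_PER_LONG    (8 * sizeof(long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

#define MAX_VCPUS 64    /* stand-in for KVM_MAX_VCPUS, value assumed */

int main(void)
{
	/* Correct: number of unsigned longs needed to hold MAX_VCPUS bits. */
	printf("BITS_TO_LONGS(%d)      = %zu longs\n",
	       MAX_VCPUS, BITS_TO_LONGS(MAX_VCPUS));

	/* Byte-based arithmetic uses the wrong unit and wastes memory. */
	printf("rounded up by sizeof() = %zu longs\n",
	       (MAX_VCPUS + sizeof(long) - 1) / sizeof(long));
	return 0;
}
```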
-
- 21 Mar 2014, 4 commits
-
-
Committed by Cornelia Huck
Introduce a new interrupt class for s390 adapter interrupts and enable irqfds for s390. This depends on a new s390-specific vm capability, KVM_CAP_S390_IRQCHIP, that needs to be enabled by userspace.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
-
Committed by Cornelia Huck
Add a new interface to register/deregister sources of adapter interrupts identified by a unique id via the flic. Adapters may also be maskable and carry a list of pinned pages. These adapters will be used by irq routing later.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
-
Committed by Dominik Dingel
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Dominik Dingel
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 20 Mar 2014, 2 commits
-
-
Committed by Eric Paris
The syscall.h headers were including linux/audit.h but really only needed uapi/linux/audit.h to get the requisite defines. Switch to the uapi headers.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
-
Committed by Eric Paris
Every caller of syscall_get_arch() uses current for the task and no implementation of the function needs the arguments. So just get rid of both of those things. Admittedly, since these are inline functions we aren't wasting stack space, but it just makes the prototypes better.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
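For illustration only, the shape of the prototype change described above, shown schematically for s390; the return value comes from the uapi audit defines, and the "before" form is reconstructed from the commit text rather than copied from the tree.

```c
/* before: every caller passed current plus a pt_regs pointer that no
 * implementation actually needed:
 *
 *     int syscall_get_arch(struct task_struct *task, struct pt_regs *regs);
 *
 * after: no arguments, the helper implicitly acts on current: */
int syscall_get_arch(void);   /* e.g. returns AUDIT_ARCH_S390X on 64-bit s390 */
```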
-
- 17 Mar 2014, 1 commit
-
-
Committed by Heiko Carstens
Limit the number of bits to the maximum number of cpus a machine can have. possible_cpu_mask will typically have more bits set than a machine may physically have. This results in wasted memory during per-cpu memory allocations if the possible mask contains more cpus than physically possible for a given configuration.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 14 Mar 2014, 1 commit
-
-
Committed by Martin Schwidefsky
The PTRACE_SINGLEBLOCK option is used to get control whenever the inferior has executed a successful branch. The PER option used to implement block stepping is the successful-branching event, bit 32 in the PER-event mask.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
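A minimal userspace sketch of how a tracer could drive block stepping with this request: resume the child with PTRACE_SINGLEBLOCK and get a stop after each taken branch. Error handling is trimmed, and the fallback value for PTRACE_SINGLEBLOCK is the s390 definition and is assumed here; other architectures use different values, so check the arch ptrace headers.

```c
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

#ifndef PTRACE_SINGLEBLOCK
#define PTRACE_SINGLEBLOCK 12   /* s390 value; assumed, see asm/ptrace.h */
#endif

int main(void)
{
	pid_t child = fork();

	if (child == 0) {
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		raise(SIGSTOP);              /* hand control to the tracer */
		/* ... code whose branches the tracer wants to observe ... */
		_exit(0);
	}

	int status;
	waitpid(child, &status, 0);
	while (WIFSTOPPED(status)) {
		/* resume until the next taken branch (PER successful-branching) */
		if (ptrace(PTRACE_SINGLEBLOCK, child, NULL, NULL) == -1)
			break;
		waitpid(child, &status, 0);
	}
	printf("child finished\n");
	return 0;
}
```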
-
- 06 Mar 2014, 2 commits
-
-
Committed by Heiko Carstens
Enforce 32 bit types for all compat syscall argument types. This way we can make sure that all arguments get correct sign or zero extension. Otherwise incorrect code would be generated. E.g. for a 'long' type the COMPAT_SYSCALL_DEFINE macro wouldn't generate code that causes sign extension of the passed-in 32 bit user space parameter. This can cause quite subtle bugs, like e.g. the one that was fixed with dfd948e3 "fs/compat: fix parameter handling for compat readv/writev syscalls".
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
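A standalone illustration of why the compat wrappers must go through 32 bit types: only the low 32 bits of a register are meaningful for a compat task, so a native "long" parameter has to be narrowed and then sign- or zero-extended explicitly. This is a userspace sketch of the idea, not the kernel macro itself.

```c
#include <stdio.h>
#include <stdint.h>

/* Truncate to 32 bits, then sign-extend back to the native width,
 * mimicking what the compat wrapper must do for a signed parameter. */
static long long fixup_signed(uint64_t reg)
{
	return (long long)(int32_t)(uint32_t)reg;
}

int main(void)
{
	/* Upper half contains leftover garbage, low half holds -5. */
	uint64_t reg = 0x1234567800000000ULL | (uint32_t)-5;

	printf("raw 64 bit value:  %lld\n", (long long)reg);
	printf("sign-extended arg: %lld\n", fixup_signed(reg));   /* -5 */
	return 0;
}
```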
-
Committed by Heiko Carstens
Some fs compat system calls have unsigned long parameters instead of compat_ulong_t. In order to allow the COMPAT_SYSCALL_DEFINE macro to generate code that performs proper zero and sign extension, convert all 64 bit parameters to their corresponding 32 bit counterparts. compat_sys_io_getevents() is a bit different: the non-compat version has signed parameters for the "min_nr" and "nr" parameters while the compat version has unsigned parameters. So change this as well. For all practical purposes this shouldn't make any difference (it doesn't fix a real bug). Also introduce a generic compat_aio_context_t type which can be used everywhere. The access_ok() check within compat_sys_io_getevents() was also removed, since the non-compat sys_io_getevents() should be able to handle everything anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
-
- 04 Mar 2014, 6 commits
-
-
Committed by Cornelia Huck
Implement the new CCW_CMD_SET_IND_ADAPTER command and try to enable adapter interrupts for every device on the first startup. If the host does not support adapter interrupts, fall back to normal I/O interrupts. virtio-ccw adapter interrupts use the same isc as normal I/O subchannels and share a summary indicator for all devices sharing the same indicator area. Indicator bits for the individual virtqueues may be contained in the same indicator area for different devices.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Martin Schwidefsky
Add airq_iv_alloc and airq_iv_free to allocate and free consecutive ranges of irqs from the interrupt vector.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Jens Freimann
We can use kvm_get_vcpu() now and don't need the local_int array in the floating_int struct anymore. This also means we don't have to hold the float_int.lock in some places.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Christian Borntraeger
For migration/reset we want to expose the guest breaking-event address register to userspace. Let's use ONE_REG for that purpose.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
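A sketch of the userspace side of the ONE_REG interface mentioned above: reading the guest breaking-event address from a vcpu fd. KVM_GET_ONE_REG and struct kvm_one_reg are the standard KVM ioctls; the specific register id constant used here (KVM_REG_S390_GBEA) is an assumption based on this commit.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 0 on success, -1 on error (errno set by ioctl). */
static int get_gbea(int vcpu_fd, uint64_t *val)
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_S390_GBEA,             /* assumed register id */
		.addr = (uint64_t)(unsigned long)val,  /* where KVM stores the value */
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}
```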
-
Committed by Christian Borntraeger
commit d208c79d (KVM: s390: Enable the LPP facility for guests) enabled the LPP instruction for guests. We should expose the program parameter as a pseudo register for migration/reset etc. Let's also reset this value on initial CPU reset.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
-
Committed by Heiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
-
- 25 Feb 2014, 2 commits
-
-
Committed by Heiko Carstens
The memset() within csum_partial_copy_from_user() is rather pointless since copy_from_user() already cleared the rest of the destination buffer if an exception happened.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Heiko Carstens
There is no user left, so remove it. It was also potentially broken, since the function didn't clear destination memory if copy_from_user() failed, which would allow for information leaks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 21 Feb 2014, 7 commits
-
-
Committed by Martin Schwidefsky
Add airq_iv_alloc and airq_iv_free to allocate and free consecutive ranges of irqs from the interrupt vector.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Martin Schwidefsky
Add the pgtable_pmd_page_ctor/pgtable_pmd_page_dtor calls to the pmd allocation and free functions and enable ARCH_ENABLE_SPLIT_PMD_PTLOCK for 64 bit.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
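For illustration, the generic shape of the change described above: call pgtable_pmd_page_ctor() when a pmd page is allocated (it sets up the split-PMD lock) and pgtable_pmd_page_dtor() before freeing it. The s390 allocation path differs in detail (crst tables span several pages), so the surrounding function names and order-0 allocation here are assumptions, not the actual s390 code.

```c
/* Sketch of a pmd alloc/free pair wired up for split PMD locks. */
static pmd_t *pmd_alloc_one_sketch(struct mm_struct *mm, unsigned long addr)
{
	struct page *page = alloc_pages(GFP_KERNEL, 0);

	if (!page)
		return NULL;
	if (!pgtable_pmd_page_ctor(page)) {   /* set up the split PMD ptlock */
		__free_pages(page, 0);
		return NULL;
	}
	return (pmd_t *)page_address(page);
}

static void pmd_free_sketch(struct mm_struct *mm, pmd_t *pmd)
{
	struct page *page = virt_to_page(pmd);

	pgtable_pmd_page_dtor(page);          /* tear down the split PMD ptlock */
	__free_pages(page, 0);
}
```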
-
Committed by Heiko Carstens
Fix some numbers in the comments describing the layout of the bit maps.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Martin Schwidefsky
The guest page state needs to be reset to stable for all pages on initial program load via diagnose 0x308.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Konstantin Weitz
This patch enables Collaborative Memory Management (CMM) for kvm on s390. CMM allows the guest to inform the host about page usage (see arch/s390/mm/cmm.c). The host uses this information to avoid swapping in unused pages in the page fault handler. Further, a CPU-provided list of unused invalid pages is processed to reclaim swap space of not-yet-accessed unused pages. [ Martin Schwidefsky: patch reordering and cleanup ]
Signed-off-by: Konstantin Weitz <konstantin.weitz@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Martin Schwidefsky
Git commit 050eef36 "[S390] fix tlb flushing vs. concurrent /proc accesses" introduced the attach counter to avoid using the mm_users value to decide between IPTE for every PTE and lazy TLB flushing with IDTE. That fixed the problem with mm_users but it introduced another subtle race, fortunately one that is very hard to hit.

The background is the requirement of the architecture that a valid PTE may not be changed while it can be used concurrently by another cpu. The decision between IPTE and lazy TLB flushing needs to be made while the PTE is still valid. Now if the virtual cpu is temporarily stopped after the decision to use lazy TLB flushing but before the invalid bit of the PTE has been set, another cpu can attach the mm, find that flush_mm is set, do the IDTE, return to userspace, and recreate a TLB that uses the PTE in question. When the first, stopped cpu continues it will change the PTE while it is attached on another cpu. The first cpu will do another IDTE shortly after the modification of the PTE, which makes the race window quite short.

To fix this race the CPU that wants to attach the address space of a user space thread needs to wait for the end of the PTE modification. The number of concurrent TLB flushers for an mm is tracked in the upper 16 bits of the attach_count, and finish_arch_post_lock_switch is used to wait for the end of the flush operation if required.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
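A concept sketch of the counting scheme described in the last paragraph: the low 16 bits of attach_count track attached CPUs, the upper 16 bits count PTE modifications/flushes currently in flight, and an attaching CPU spins until that upper half drops to zero. Helper names are illustrative; only the "upper 16 bits of attach_count" idea is taken from the commit text.

```c
/* Mark a PTE modification/flush as in progress (upper 16 bits). */
static inline void begin_pte_flush(struct mm_struct *mm)
{
	atomic_add(0x10000, &mm->context.attach_count);
}

static inline void end_pte_flush(struct mm_struct *mm)
{
	atomic_sub(0x10000, &mm->context.attach_count);
}

/* Called by a CPU that wants to attach the address space: wait until no
 * PTE modification/flush is in flight anymore (cf. finish_arch_post_lock_switch). */
static inline void wait_for_pte_flushers(struct mm_struct *mm)
{
	while (atomic_read(&mm->context.attach_count) >> 16)
		cpu_relax();
}
```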
-
Committed by Heiko Carstens
MACHINE_HAS_MVCOS is used exactly once when the machine is brought up. There is no need to cache the flag in the machine_flags.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-