提交 · 152125b7a882df36a55a8eadbea6d0edf1461ee7 · openeuler / Kernel

01 8月, 2014 2 次提交

s390/mm: implement dirty bits for large segment table entries · 152125b7

由 Martin Schwidefsky 提交于 7月 24, 2014

The large segment table entry format has block of bits for the
ACC/F values for the large page. These bits are valid only if
another bit (AV bit 0x10000) of the segment table entry is set.
The ACC/F bits do not have a meaning if the AV bit is off.
This allows to put the THP splitting bit, the segment young bit
and the new segment dirty bit into the ACC/F bits as long as
the AV bit stays off. The dirty and young information is only
available if the pmd is large.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

152125b7

KVM: s390/mm: Fix page table locking vs. split pmd lock · 55e4283c

由 Christian Borntraeger 提交于 7月 25, 2014

commit ec66ad66 (s390/mm: enable
split page table lock for PMD level) activated the split pmd lock
for s390. Turns out that we missed one place: We also have to take
the pmd lock instead of the page table lock when we reallocate the
page tables (==> changing entries in the PMD) during sie enablement.

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

55e4283c

20 5月, 2014 1 次提交

s390/uaccess: simplify control register updates · beef560b

由 Martin Schwidefsky 提交于 4月 14, 2014

Always switch to the kernel ASCE in switch_mm. Load the secondary
space ASCE in finish_arch_post_lock_switch after checking that
any pending page table operations have completed. The primary
ASCE is loaded in entry[64].S. With this the update_primary_asce
call can be removed from the switch_to macro and from the start
of switch_mm function. Remove the load_primary argument from
update_user_asce/clear_user_asce, rename update_user_asce to
set_user_asce and rename update_primary_asce to load_kernel_asce.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

beef560b

16 5月, 2014 1 次提交

KVM: s390: correct locking for s390_enable_skey · 3a801517

由 Martin Schwidefsky 提交于 5月 16, 2014

Use the mm semaphore to serialize multiple invocations of s390_enable_skey.
The second CPU faulting on a storage key operation needs to wait for the
completion of the page table update. Taking the mm semaphore writable
has the positive side-effect that it prevents any host faults from
taking place which does have implications on keys vs PGSTE.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

3a801517

22 4月, 2014 4 次提交

KVM: s390/mm: new gmap_test_and_clear_dirty function · a0bf4f14

由 Dominik Dingel 提交于 3月 24, 2014

For live migration kvm needs to test and clear the dirty bit of guest pages.

That for is ptep_test_and_clear_user_dirty, to be sure we are not racing with
other code, we protect the pte. This needs to be done within
the architecture memory management code.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

a0bf4f14

KVM: s390/mm: use software dirty bit detection for user dirty tracking · 0a61b222

由 Martin Schwidefsky 提交于 10月 18, 2013

Switch the user dirty bit detection used for migration from the hardware
provided host change-bit in the pgste to a fault based detection method.
This reduced the dependency of the host from the storage key to a point
where it becomes possible to enable the RCP bypass for KVM guests.

The fault based dirty detection will only indicate changes caused
by accesses via the guest address space. The hardware based method
can detect all changes, even those caused by I/O or accesses via the
kernel page table. The KVM/qemu code needs to take this into account.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

0a61b222

KVM: s390: Allow skeys to be enabled for the current process · 934bc131

由 Dominik Dingel 提交于 1月 14, 2014

Introduce a new function s390_enable_skey(), which enables storage key
handling via setting the use_skey flag in the mmu context.

This function is only useful within the context of kvm.

Note that enabling storage keys will cause a one-time hickup when
walking the page table; however, it saves us special effort for cases
like clear reset while making it possible for us to be architecture
conform.

s390_enable_skey() takes the page table lock to prevent reseting
storage keys triggered from multiple vcpus.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

934bc131

KVM: s390: Clear storage keys · d4cb1134

由 Dominik Dingel 提交于 1月 29, 2014

page_table_reset_pgste() already does a complete page table walk to
reset the pgste. Enhance it to initialize the storage keys to
PAGE_DEFAULT_KEY if requested by the caller. This will be used
for lazy storage key handling. Also provide an empty stub for
!CONFIG_PGSTE

Lets adopt the current code (diag 308) to not clear the keys.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

d4cb1134

08 4月, 2014 1 次提交

mm: revert "thp: make MADV_HUGEPAGE check for mm->def_flags" · 1e1836e8

由 Alex Thorlton 提交于 4月 07, 2014

The main motivation behind this patch is to provide a way to disable THP
for jobs where the code cannot be modified, and using a malloc hook with
madvise is not an option (i.e. statically allocated data). This patch
allows us to do just that, without affecting other jobs running on the
system.

We need to do this sort of thing for jobs where THP hurts performance,
due to the possibility of increased remote memory accesses that can be
created by situations such as the following:

When you touch 1 byte of an untouched, contiguous 2MB chunk, a THP will
be handed out, and the THP will be stuck on whatever node the chunk was
originally referenced from. If many remote nodes need to do work on
that same chunk, they'll be making remote accesses.

With THP disabled, 4K pages can be handed out to separate nodes as
they're needed, greatly reducing the amount of remote accesses to
memory.

This patch is based on some of my work combined with some
suggestions/patches given by Oleg Nesterov. The main goal here is to
add a prctl switch to allow us to disable to THP on a per mm_struct
basis.

Here's a bit of test data with the new patch in place...

First with the flag unset:

# perf stat -a ./prctl_wrapper_mmv3 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
Setting thp_disabled for this task...
thp_disable: 0
Set thp_disabled state to 0
Process pid = 18027

PF/
MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/
TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES
512 1.120 0.060 0.000 0.110 0.110 0.000 28571428864 -9223372036854775808 55803572 23

Performance counter stats for './prctl_wrapper_mmv3_hack 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':

273719072.841402 task-clock # 641.026 CPUs utilized [100.00%]
1,008,986 context-switches # 0.000 M/sec [100.00%]
7,717 CPU-migrations # 0.000 M/sec [100.00%]
1,698,932 page-faults # 0.000 M/sec
355,222,544,890,379 cycles # 1.298 GHz [100.00%]
536,445,412,234,588 stalled-cycles-frontend # 151.02% frontend cycles idle [100.00%]
409,110,531,310,223 stalled-cycles-backend # 115.17% backend cycles idle [100.00%]
148,286,797,266,411 instructions # 0.42 insns per cycle
# 3.62 stalled cycles per insn [100.00%]
27,061,793,159,503 branches # 98.867 M/sec [100.00%]
1,188,655,196 branch-misses # 0.00% of all branches

427.001706337 seconds time elapsed

Now with the flag set:

# perf stat -a ./prctl_wrapper_mmv3 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
Setting thp_disabled for this task...
thp_disable: 1
Set thp_disabled state to 1
Process pid = 144957

PF/
MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/
TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES
512 0.620 0.260 0.250 0.320 0.570 0.001 51612901376 128000000000 100806448 23

Performance counter stats for './prctl_wrapper_mmv3_hack 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':

138789390.540183 task-clock # 641.959 CPUs utilized [100.00%]
534,205 context-switches # 0.000 M/sec [100.00%]
4,595 CPU-migrations # 0.000 M/sec [100.00%]
63,133,119 page-faults # 0.000 M/sec
147,977,747,269,768 cycles # 1.066 GHz [100.00%]
200,524,196,493,108 stalled-cycles-frontend # 135.51% frontend cycles idle [100.00%]
105,175,163,716,388 stalled-cycles-backend # 71.07% backend cycles idle [100.00%]
180,916,213,503,160 instructions # 1.22 insns per cycle
# 1.11 stalled cycles per insn [100.00%]
26,999,511,005,868 branches # 194.536 M/sec [100.00%]
714,066,351 branch-misses # 0.00% of all branches

216.196778807 seconds time elapsed

As with previous versions of the patch, We're getting about a 2x
performance increase here. Here's a link to the test case I used, along
with the little wrapper to activate the flag:

http://oss.sgi.com/projects/memtests/thp_pthread_mmprctlv3.tar.gz

This patch (of 3):

Revert commit 8e72033f and add in code to fix up any issues caused
by the revert.

The revert is necessary because hugepage_madvise would return -EINVAL
when VM_NOHUGEPAGE is set, which will break subsequent chunks of this
patch set.

Here's a snip of an e-mail from Gerald detailing the original purpose of
this code, and providing justification for the revert:

"The intent of commit 8e72033f was to guard against any future
programming errors that may result in an madvice(MADV_HUGEPAGE) on
guest mappings, which would crash the kernel.

Martin suggested adding the bit to arch/s390/mm/pgtable.c, if
8e72033f was to be reverted, because that check will also prevent
a kernel crash in the case described above, it will now send a
SIGSEGV instead.

This would now also allow to do the madvise on other parts, if
needed, so it is a more flexible approach. One could also say that
it would have been better to do it this way right from the
beginning..."
Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1e1836e8

03 4月, 2014 3 次提交

s390/uaccess: rework uaccess code - fix locking issues · 457f2180

由 Heiko Carstens 提交于 3月 21, 2014

The current uaccess code uses a page table walk in some circumstances,
e.g. in case of the in atomic futex operations or if running on old
hardware which doesn't support the mvcos instruction.

However it turned out that the page table walk code does not correctly
lock page tables when accessing page table entries.
In other words: a different cpu may invalidate a page table entry while
the current cpu inspects the pte. This may lead to random data corruption.

Adding correct locking however isn't trivial for all uaccess operations.
Especially copy_in_user() is problematic since that requires to hold at
least two locks, but must be protected against ABBA deadlock when a
different cpu also performs a copy_in_user() operation.

So the solution is a different approach where we change address spaces:

User space runs in primary address mode, or access register mode within
vdso code, like it currently already does.

The kernel usually also runs in home space mode, however when accessing
user space the kernel switches to primary or secondary address mode if
the mvcos instruction is not available or if a compare-and-swap (futex)
instruction on a user space address is performed.
KVM however is special, since that requires the kernel to run in home
address space while implicitly accessing user space with the sie
instruction.

So we end up with:

User space:
- runs in primary or access register mode
- cr1 contains the user asce
- cr7 contains the user asce
- cr13 contains the kernel asce

Kernel space:
- runs in home space mode
- cr1 contains the user or kernel asce
  -> the kernel asce is loaded when a uaccess requires primary or
     secondary address mode
- cr7 contains the user or kernel asce, (changed with set_fs())
- cr13 contains the kernel asce

In case of uaccess the kernel changes to:
- primary space mode in case of a uaccess (copy_to_user) and uses
  e.g. the mvcp instruction to access user space. However the kernel
  will stay in home space mode if the mvcos instruction is available
- secondary space mode in case of futex atomic operations, so that the
  instructions come from primary address space and data from secondary
  space

In case of kvm the kernel runs in home space mode, but cr1 gets switched
to contain the gmap asce before the sie instruction gets executed. When
the sie instruction is finished cr1 will be switched back to contain the
user asce.

A context switch between two processes will always load the kernel asce
for the next process in cr1. So the first exit to user space is a bit
more expensive (one extra load control register instruction) than before,
however keeps the code rather simple.

In sum this means there is no need to perform any error prone page table
walks anymore when accessing user space.

The patch seems to be rather large, however it mainly removes the
the page table walk code and restores the previously deleted "standard"
uaccess code, with a couple of changes.

The uaccess without mvcos mode can be enforced with the "uaccess_primary"
kernel parameter.
Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

457f2180

s390/mm,tlb: optimize TLB flushing for zEC12 · 1b948d6c

由 Martin Schwidefsky 提交于 4月 03, 2014

The zEC12 machines introduced the local-clearing control for the IDTE
and IPTE instruction. If the control is set only the TLB of the local
CPU is cleared of entries, either all entries of a single address space
for IDTE, or the entry for a single page-table entry for IPTE.
Without the local-clearing control the TLB flush is broadcasted to all
CPUs in the configuration, which is expensive.

The reset of the bit mask of the CPUs that need flushing after a
non-local IDTE is tricky. As TLB entries for an address space remain
in the TLB even if the address space is detached a new bit field is
required to keep track of attached CPUs vs. CPUs in the need of a
flush. After a non-local flush with IDTE the bit-field of attached CPUs
is copied to the bit-field of CPUs in need of a flush. The ordering
of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is
such that an underindication in mm_cpumask(mm) is prevented but an
overindication in mm_cpumask(mm) is possible.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1b948d6c

s390/mm,tlb: safeguard against speculative TLB creation · 02a8f3ab

由 Martin Schwidefsky 提交于 4月 03, 2014

The principles of operations states that the CPU is allowed to create
TLB entries for an address space anytime while an ASCE is loaded to
the control register. This is true even if the CPU is running in the
kernel and the user address space is not (actively) accessed.

In theory this can affect two aspects of the TLB flush logic.
For full-mm flushes the ASCE of the dying process is still attached.
The approach to flush first with IDTE and then just free all page
tables can in theory lead to stale TLB entries. Use the batched
free of page tables for the full-mm flushes as well.

For operations that can have a stale ASCE in the control register,
e.g. a delayed update_user_asce in switch_mm, load the kernel ASCE
to prevent invalid TLBs from being created.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

02a8f3ab

21 3月, 2014 2 次提交

s390/mm: remove unnecessary parameter from gmap_do_ipte_notify · aaeff84a

由 Dominik Dingel 提交于 3月 19, 2014

Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

aaeff84a

s390/mm: fixing comment so that parameter name match · c7c5be73

由 Dominik Dingel 提交于 3月 19, 2014

Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

c7c5be73

21 2月, 2014 3 次提交

s390/mm: enable split page table lock for PMD level · ec66ad66

由 Martin Schwidefsky 提交于 2月 12, 2014

Add the pgtable_pmd_page_ctor/pgtable_pmd_page_dtor calls to the pmd
allocation and free functions and enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
for 64 bit.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ec66ad66

s390/kvm: set guest page states to stable on re-ipl · deedabb2

由 Martin Schwidefsky 提交于 5月 21, 2013

The guest page state needs to be reset to stable for all pages
on initial program load via diagnose 0x308.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

deedabb2

s390/kvm: support collaborative memory management · b31288fa

由 Konstantin Weitz 提交于 4月 17, 2013

This patch enables Collaborative Memory Management (CMM) for kvm
on s390. CMM allows the guest to inform the host about page usage
(see arch/s390/mm/cmm.c). The host uses this information to avoid
swapping in unused pages in the page fault handler. Further, a CPU
provided list of unused invalid pages is processed to reclaim swap
space of not yet accessed unused pages.

[ Martin Schwidefsky: patch reordering and cleanup ]
Signed-off-by: NKonstantin Weitz <konstantin.weitz@gmail.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b31288fa

16 1月, 2014 1 次提交

s390: Fix misspellings using 'codespell' tool · b4a96015

由 Hendrik Brueckner 提交于 12月 13, 2013

Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b4a96015

15 11月, 2013 2 次提交

s390: handle pgtable_page_ctor() fail · e89cfa58

由 Kirill A. Shutemov 提交于 11月 14, 2013

Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e89cfa58

mm, thp: do not access mm->pmd_huge_pte directly · c389a250

由 Kirill A. Shutemov 提交于 11月 14, 2013

Currently mm->pmd_huge_pte protected by page table lock.  It will not
work with split lock.  We have to have per-pmd pmd_huge_pte for proper
access serialization.

For now, let's just introduce wrapper to access mm->pmd_huge_pte.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NAlex Thorlton <athorlton@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c389a250

04 11月, 2013 1 次提交

s390/mm,tlb: correct tlb flush on page table upgrade · 10607864

由 Martin Schwidefsky 提交于 10月 28, 2013

The IDTE instruction used to flush TLB entries for a specific address
space uses the address-space-control element (ASCE) to identify
affected TLB entries. The upgrade of a page table adds a new top
level page table which changes the ASCE. The TLB entries associated
with the old ASCE need to be flushed and the ASCE for the address space
needs to be replaced synchronously on all CPUs which currently use it.
The concept of a lazy ASCE update with an exception handler is broken.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

10607864

31 10月, 2013 1 次提交

s390/mm: page_table_realloc returns failure · be39f196

由 Dominik Dingel 提交于 10月 31, 2013

There is a possible race between setting has_pgste and reallocation of the
page_table, change the order to fix this.
Also page_table_alloc_pgste can fail, in that case we need to backpropagte this
as -ENOMEM to the caller of page_table_realloc.

Based on a patch by Christian Borntraeger <borntraeger@de.ibm.com>.
Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

be39f196

24 10月, 2013 1 次提交

s390/uaccess: always run the kernel in home space · e258d719

由 Martin Schwidefsky 提交于 9月 24, 2013

Simplify the uaccess code by removing the user_mode=home option.
The kernel will now always run in the home space mode.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e258d719

07 9月, 2013 2 次提交

s390: make various functions static, add declarations to header files · 63df41d6

由 Heiko Carstens 提交于 9月 06, 2013

Make various functions static, add declarations to header files to
fix a couple of sparse findings.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

63df41d6

H
s390/mm: add __releases()/__acquires() annotations to gmap_alloc_table() · 984e2a59
由 Heiko Carstens 提交于 9月 06, 2013
```
Let sparse not incorrectly complain about unbalanced locking.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
```
984e2a59

29 8月, 2013 1 次提交

s390/mm: implement software referenced bits · 0944fe3f

由 Martin Schwidefsky 提交于 7月 23, 2013

The last remaining use for the storage key of the s390 architecture
is reference counting. The alternative is to make page table entries
invalid while they are old. On access the fault handler marks the
pte/pmd as young which makes the pte/pmd valid if the access rights
allow read access. The pte/pmd invalidations required for software
managed reference bits cost a bit of performance, on the other hand
the RRBE/RRBM instructions to read and reset the referenced bits are
quite expensive as well.
Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0944fe3f

22 8月, 2013 2 次提交

s390/mm: introduce ptep_flush_lazy helper · 5c474a1e

由 Martin Schwidefsky 提交于 8月 16, 2013

Isolate the logic of IDTE vs. IPTE flushing of ptes in two functions,
ptep_flush_lazy and __tlb_flush_mm_lazy.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

5c474a1e

s390/mm: cleanup page table definitions · e5098611

由 Martin Schwidefsky 提交于 7月 23, 2013

Improve the encoding of the different pte types and the naming of the
page, segment table and region table bits. Due to the different pte
encoding the hugetlbfs primitives need to be adapted as well. To improve
compatability with common code make the huge ptes use the encoding of
normal ptes. The conversion between the pte and pmd encoding for a huge
pte is done with set_huge_pte_at and huge_ptep_get.
Overall the code is now easier to understand.
Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e5098611

29 7月, 2013 2 次提交

KVM: s390: fix task size check · ee6ee55b

由 Martin Schwidefsky 提交于 7月 26, 2013

The gmap_map_segment function uses PGDIR_SIZE in the check for the
maximum address in the tasks address space. This incorrectly limits
the amount of memory usable for a kvm guest to 4TB. The correct limit
is (1UL << 53). As the TASK_SIZE has different values (4TB vs 8PB)
dependent on the existance of the fourth page table level, create
a new define 'TASK_MAX_SIZE' for (1UL << 53).
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ee6ee55b

KVM: s390: allow sie enablement for multi-threaded programs · 3eabaee9

由 Martin Schwidefsky 提交于 7月 26, 2013

Improve the code to upgrade the standard 2K page tables to 4K page tables
with PGSTEs to allow the operation to happen when the program is already
multi-threaded.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3eabaee9

27 6月, 2013 1 次提交

s390/kvm: Provide function for setting the guest storage key · 24d5dd02

由 Christian Borntraeger 提交于 5月 27, 2013

From time to time we need to set the guest storage key. Lets
provide a helper function that handles the changes with all the
right locking and checking.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

24d5dd02

20 6月, 2013 1 次提交

mm/THP: add pmd args to pgtable deposit and withdraw APIs · 6b0b50b0

由 Aneesh Kumar K.V 提交于 6月 05, 2013

This will be later used by powerpc THP support. In powerpc we want to use
pgtable for storing the hash index values. So instead of adding them to
mm_context list, we would like to store them in the second half of pmd
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

6b0b50b0

17 6月, 2013 1 次提交

KVM: s390: Provide function for setting the guest storage key · db70ccdf

由 Christian Borntraeger 提交于 6月 12, 2013

From time to time we need to set the guest storage key. Lets
provide a helper function that handles the changes with all the
right locking and checking.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db70ccdf

31 5月, 2013 1 次提交

s390/pgtable: Fix gmap notifier address · e86cbd87

由 Christian Borntraeger 提交于 5月 29, 2013

The address of the gmap notifier was broken, resulting in
unhandled validity intercepts in KVM. Fix the rmap->vmaddr
to be on a segment boundary.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e86cbd87

21 5月, 2013 2 次提交

s390: fix gmap_ipte_notifier vs. software dirty pages · f8b5ff2c

由 Christian Borntraeger 提交于 5月 17, 2013

On heavy paging load some guest cpus started to loop in gmap_ipte_notify.
This was visible as stalled cpus inside the guest. The gmap_ipte_notifier
tries to map a user page and then made sure that the pte is valid and
writable. Turns out that with the software change bit tracking the pte
can become read-only (and only software writable) if the page is clean.
Since we loop in this code, the page would stay clean and, therefore,
be never writable again.
Let us just use fixup_user_fault, that guarantees to call handle_mm_fault.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

f8b5ff2c

s390/kvm: rename RCP_xxx defines to PGSTE_xxx · 0d0dafc1

由 Martin Schwidefsky 提交于 5月 17, 2013

The RCP byte is a part of the PGSTE value, the existing RCP_xxx names
are inaccurate. As the defines describe bits and pieces of the PGSTE,
the names should start with PGSTE_. The KVM_UR_BIT and KVM_UC_BIT are
part of the PGSTE as well, give them better names as well.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

0d0dafc1

15 5月, 2013 1 次提交

s390: fix gmap_ipte_notifier vs. software dirty pages · bb4b42ce

由 Christian Borntraeger 提交于 5月 08, 2013

On heavy paging load some guest cpus started to loop in gmap_ipte_notify.
This was visible as stalled cpus inside the guest. The gmap_ipte_notifier
tries to map a user page and then made sure that the pte is valid and
writable. Turns out that with the software change bit tracking the pte
can become read-only (and only software writable) if the page is clean.
Since we loop in this code, the page would stay clean and, therefore,
be never writable again.
Let us just use fixup_user_fault, that guarantees to call handle_mm_fault.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

bb4b42ce

03 5月, 2013 1 次提交

s390/mm: add pte invalidation notifier for kvm · d3383632

由 Martin Schwidefsky 提交于 4月 17, 2013

Add a notifier for kvm to get control before a page table entry is
invalidated. The notifier is only called for ptes of an address space
with pgstes that have been explicitly marked to require notification.
Kvm will use this to get control before prefix pages of virtual CPU
are unmapped.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

d3383632

23 4月, 2013 2 次提交

s390/mm,gmap: segment mapping race · ab8e5235

由 Martin Schwidefsky 提交于 4月 16, 2013

The gmap_map_segment function creates a special invalid segment table
entry with the address of the requested target location in the process
address space. The first access will create the connection between the
gmap segment table and the target page table of the main process.
If two threads do this concurrently both will walk the page tables and
allocate a gmap_rmap structure for the same segment table entry.
To avoid the race recheck the segment table entry after taking to page
table lock.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ab8e5235

s390/mm,gmap: implement gmap_translate() · c5034945

由 Heiko Carstens 提交于 9月 10, 2012

Implement gmap_translate() function which translates a guest absolute address
to a user space process address without establishing the guest page table
entries.

This is useful for kvm guest address translations where no memory access
is expected to happen soon (e.g. tprot exception handler).
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

c5034945

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功