提交 · 4be130a08420d6918d80c1067f8078f425eb98df · openeuler / Kernel

20 6月, 2016 5 次提交

s390/mm: add shadow gmap support · 4be130a0

由 Martin Schwidefsky 提交于 3月 08, 2016

For a nested KVM guest the outer KVM host needs to create shadow
page tables for the nested guest. This patch adds the basic support
to the guest address space (gmap) code.

For each guest address space the inner KVM host creates, the first
outer KVM host needs to create shadow page tables. The address space
is identified by the ASCE loaded into the control register 1 at the
time the inner SIE instruction for the second nested KVM guest is
executed. The outer KVM host creates the shadow tables starting with
the table identified by the ASCE on a on-demand basis. The outer KVM
host will get repeated faults for all the shadow tables needed to
run the second KVM guest.

While a shadow page table for the second KVM guest is active the access
to the origin region, segment and page tables needs to be restricted
for the first KVM guest. For region and segment and page tables the first
KVM guest may read the memory, but write attempt has to lead to an
unshadow. This is done using the page invalid and read-only bits in the
page table of the first KVM guest. If the first guest re-accesses one of
the origin pages of a shadow, it gets a fault and the affected parts of
the shadow page table hierarchy needs to be removed again.

PGSTE tables don't have to be shadowed, as all interpretation assist can't
deal with the invalid bits in the shadow pte being set differently than
the original ones provided by the first KVM guest.

Many bug fixes and improvements by David Hildenbrand.
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

4be130a0

s390/mm: add reference counter to gmap structure · 6ea427bb

由 Martin Schwidefsky 提交于 3月 08, 2016

Let's use a reference counter mechanism to control the lifetime of
gmap structures. This will be needed for further changes related to
gmap shadows.
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

6ea427bb

s390/mm: extended gmap pte notifier · b2d73b2a

由 Martin Schwidefsky 提交于 3月 08, 2016

The current gmap pte notifier forces a pte into to a read-write state.
If the pte is invalidated the gmap notifier is called to inform KVM
that the mapping will go away.

Extend this approach to allow read-write, read-only and no-access
as possible target states and call the pte notifier for any change
to the pte.

This mechanism is used to temporarily set specific access rights for
a pte without doing the heavy work of a true mprotect call.
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

b2d73b2a

s390/mm: use RCU for gmap notifier list and the per-mm gmap list · 8ecb1a59

由 Martin Schwidefsky 提交于 3月 08, 2016

The gmap notifier list and the gmap list in the mm_struct change rarely.
Use RCU to optimize the reader of these lists.
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

8ecb1a59

s390/kvm: page table invalidation notifier · 414d3b07

由 Martin Schwidefsky 提交于 3月 08, 2016

Pass an address range to the page table invalidation notifier
for KVM. This allows to notify changes that affect a larger
virtual memory area, e.g. for 1MB pages.
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

414d3b07

10 6月, 2016 6 次提交

KVM: s390: handle missing storage-key facility · a7e19ab5

由 David Hildenbrand 提交于 5月 10, 2016

Without the storage-key facility, SIE won't interpret SSKE, ISKE and
RRBE for us. So let's add proper interception handlers that will be called
if lazy sske cannot be enabled.
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

a7e19ab5

KVM: s390: pfmf: support conditional-sske facility · 1824c723

由 David Hildenbrand 提交于 5月 10, 2016

We already indicate that facility but don't implement it in our pfmf
interception handler. Let's add a new storage key handling function for
conditionally setting the guest storage key.

As we will reuse this function later on, let's directly implement returning
the old key via parameter and indicating if any change happened via rc.
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

1824c723

s390/mm: return key via pointer in get_guest_storage_key · 154c8c19

由 David Hildenbrand 提交于 5月 09, 2016

Let's just split returning the key and reporting errors. This makes calling
code easier and avoids bugs as happened already.
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

154c8c19

s390/mm: simplify get_guest_storage_key · 8d6037a7

由 David Hildenbrand 提交于 5月 09, 2016

We can safe a few LOC and make that function easier to understand
by rewriting existing code.
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

8d6037a7

s390/mm: set and get guest storage key mmap locking · d3ed1cee

由 Martin Schwidefsky 提交于 3月 08, 2016

Move the mmap semaphore locking out of set_guest_storage_key
and get_guest_storage_key. This makes the two functions more
like the other ptep_xxx operations and allows to avoid repeated
semaphore operations if multiple keys are read or written.
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

d3ed1cee

s390/mm: don't drop errors in get_guest_storage_key · c427c42c

由 David Hildenbrand 提交于 5月 10, 2016

Commit 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c") changed
the return value of get_guest_storage_key to an unsigned char, resulting
in -EFAULT getting interpreted as a valid storage key.

Cc: stable@vger.kernel.org # 4.6+
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

c427c42c

23 5月, 2016 1 次提交

s390: fix info leak in do_sigsegv · cf0d44d5

由 Michal Hocko 提交于 5月 23, 2016

Aleksa has reported incorrect si_errno value when stracing task which
received SIGSEGV:
[pid 20799] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_errno=2510266, si_addr=0x100000000000000}

The reason seems to be that do_sigsegv is not initializing siginfo
structure defined on the stack completely so it will leak 4B of
the previous stack content. Fix it simply by initializing si_errno
to 0 (same as do_sigbus does already).

Cc: stable # introduced pre-git times
Reported-by: NAleksa Sarai <asarai@suse.de>
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

cf0d44d5

11 5月, 2016 2 次提交

s390/vmem: remove unused function parameter · c53db522

由 Heiko Carstens 提交于 5月 09, 2016

vmem_pte_alloc() has an unused function parameter. Let's remove it.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

c53db522

s390/vmem: fix identity mapping · c34a6905

由 Heiko Carstens 提交于 5月 10, 2016

The identity mapping is suboptimal for the last 2GB frame. The mapping
will be established with a mix of 4KB and 1MB mappings instead of a
single 2GB mapping.

This happens because of a off-by-one bug introduced with
commit 50be6345 ("s390/mm: Convert bootmem to memblock").

Currently the identity mapping looks like this:

0x0000000080000000-0x0000000180000000        4G PUD RW
0x0000000180000000-0x00000001fff00000     2047M PMD RW
0x00000001fff00000-0x0000000200000000        1M PTE RW

With the bug fixed it looks like this:

0x0000000080000000-0x0000000200000000        6G PUD RW

Fixes: 50be6345 ("s390/mm: Convert bootmem to memblock")
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

c34a6905

10 5月, 2016 1 次提交

s390: add missing include statements · ca21872e

由 Heiko Carstens 提交于 5月 07, 2016

arch_mmap_rnd, cpu_have_feature, and arch_randomize_brk are all
defined as globally visible variables.
However the files they are defined in do not include the header files
with the declaration. To avoid a possible mismatch add the missing
include statements so we have proper type checking in place.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ca21872e

21 4月, 2016 1 次提交

s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd

由 Gerald Schaefer 提交于 4月 15, 2016

There is a race with multi-threaded applications between context switch and
pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a
pagetable upgrade on another thread in crst_table_upgrade() could already
have set new asce_bits, but not yet the new mm->pgd. This would result in a
corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
translation exception.

Fix this by storing the complete asce instead of just the asce_bits, which
can then be read atomically from switch_mm(), so that it either sees the
old value or the new value, but no mixture. Both cases are OK. Having the
old value would result in a page fault on access to the higher level memory,
but the fault handler would see the new mm->pgd, if it was a valid access
after the mmap on the other thread has completed. So as worst-case scenario
we would have a page fault loop for the racing thread until the next time
slice.

Also remove dead code and simplify the upgrade/downgrade path, there are no
upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
There are also no concurrent upgrades, because the mmap_sem is held with
down_write() in do_mmap, so the flush and table checks during upgrade can
be removed.
Reported-by: NMichael Munday <munday@ca.ibm.com>
Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

723cacbd

16 4月, 2016 1 次提交

s390: Clarify pagefault interrupt · 0227f7c4

由 Peter Zijlstra 提交于 3月 22, 2016

While looking at set_task_state() users I stumbled over the s390 pfault
interrupt code.  Since Heiko provided a great explanation on how it
worked, I figured we ought to preserve this.

Also make a few little tweaks to the code to aid in readability and
explicitly comment the unusual blocking scheme.
Based-on-text-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0227f7c4

05 4月, 2016 1 次提交

s390/mm/kvm: fix mis-merge in gmap handling · 9c650d09

由 Christian Borntraeger 提交于 4月 04, 2016

commit 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c") dropped
some changes from commit a3a92c31 ("KVM: s390: fix mismatch
between user and in-kernel guest limit") - this breaks KVM for some
memory sizes (kvm-s390: failed to commit memory region) like
exactly 2GB.

Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c650d09

23 3月, 2016 1 次提交

s390/extable: use generic search and sort routines · c352e8b6

由 Ard Biesheuvel 提交于 3月 22, 2016

Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c352e8b6

17 3月, 2016 2 次提交

s390/mm: handle PTE-mapped tail pages in fast gup · fc897c95

由 Gerald Schaefer 提交于 3月 17, 2016

With the THP refcounting rework it is possible to see THP compound tail
pages mapped with PTEs during a THP split. This needs to be considered
when using page_cache_get_speculative(), which will always fail on tail
pages because ->_count is always zero. commit 7aef4172 "mm: handle
PTE-mapped tail pages in gerneric fast gup implementaiton" fixed it for
the generic fast gup code by using compound_head(page) instead of page,
but not for s390.

This patch is a 1:1 adaption of commit 7aef4172 for the s390 fast gup
code. Without this fix, gup will fall back to the slow path or fail
in the unlikely scenario that we hit a THP under splitting in-between
the page table split and the compound page split.

Cc: stable@vger.kernel.org # v4.5
Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

fc897c95

s390: add DEBUG_RODATA support · 91d37211

由 Heiko Carstens 提交于 3月 17, 2016

git commit d2aa1aca ("mm/init: Add 'rodata=off' boot cmdline
parameter to disable read-only kernel mappings") adds a bogus warning
to the console which states that s390 does not support kernel memory
protection.

This however is not true. We do support that since a couple of years
however in a different way than the author of the above named patch
expected.

To get rid of the misleading message implement the mark_rodata_ro
function and emit a message which states the amount of memory which
was write protected already earlier.

This is the same what parisc currently does.

We currently do not support the kernel parameter "rodata=off" which
would allow to write to the rodata section again. However since we
have this feature since years without any problems there is no reason
to add support for this.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

91d37211

16 3月, 2016 1 次提交

s390: query dynamic DEBUG_PAGEALLOC setting · 10917b83

由 Christian Borntraeger 提交于 3月 15, 2016

We can use debug_pagealloc_enabled() to check if we can map the identity
mapping with 1MB/2GB pages as well as to print the current setting in
dump_stack.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

10917b83

08 3月, 2016 3 次提交

s390/mm: split arch/s390/mm/pgtable.c · 1e133ab2

由 Martin Schwidefsky 提交于 3月 08, 2016

The pgtable.c file is quite big, before it grows any larger split it
into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related
header definitions into the new gmap.h header and all of the pgste
helpers from pgtable.h to pgtable.c.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1e133ab2

s390/mm: uninline pmdp_xxx functions from pgtable.h · 227be799

由 Martin Schwidefsky 提交于 3月 08, 2016

The pmdp_xxx function are smaller than their ptep_xxx counterparts
but to keep things symmetrical unline them as well.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

227be799

s390/mm: uninline ptep_xxx functions from pgtable.h · ebde765c

由 Martin Schwidefsky 提交于 3月 08, 2016

The code in the various ptep_xxx functions has grown quite large,
consolidate them to four out-of-line functions:
ptep_xchg_direct to exchange a pte with another with immediate flushing
ptep_xchg_lazy to exchange a pte with another in a batched update
ptep_modify_prot_start to begin a protection flags update
ptep_modify_prot_commit to commit a protection flags update
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ebde765c

07 3月, 2016 1 次提交

s390: Use pr_warn instead of pr_warning · baebc70a

由 Joe Perches 提交于 3月 03, 2016

Convert the uses of pr_warning to pr_warn so there are fewer
uses of the old pr_warning.

Miscellanea:

o Align arguments
o Coalesce formats
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

baebc70a

02 3月, 2016 2 次提交

s390/fault: merge report_user_fault implementations · 5d7eccec

由 Heiko Carstens 提交于 2月 24, 2016

We have two close to identical report_user_fault functions.
Add a parameter to one and get rid of the other one in order
to reduce code duplication.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

5d7eccec

s390/kvm: simplify set_guest_storage_key · 443a8133

由 Martin Schwidefsky 提交于 2月 24, 2016

Git commit ab3f285f
"KVM: s390/mm: try a cow on read only pages for key ops"
added a fixup_user_fault to set_guest_storage_key force a copy on
write if the page is mapped read-only. This is supposed to fix the
problem of differing storage keys for shared mappings, e.g. the
empty_zero_page.
But if the storage key is set before the pte is mapped the storage
key update is done on the pgste. A later fault will happily map the
shared page with the key from the pgste.

Eventually git commit 2faee8ff
"s390/mm: prevent and break zero page mappings in case of storage keys"
fixed this problem for the empty_zero_page. The commit makes sure that
guests enabled for storage keys will not use the empty_zero_page at all.

As the call to fixup_user_fault in set_guest_storage_key depends on the
order of the storage key operation vs. the fault that maps the pte
it does not really fix anything. Just remove it.
Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

443a8133

23 2月, 2016 1 次提交

s390/pageattr: do a single TLB flush for change_page_attr · 007ccec5

由 Martin Schwidefsky 提交于 2月 04, 2016

The change of the access rights for an address range in the kernel
address space is currently done with a loop of IPTE + a store of the
modified PTE. Between the IPTE and the store the PTE will be invalid,
this intermediate state can cause problems with concurrent accesses.

Consider a change of a kernel area from read-write to read-only, a
concurrent reader of that area should be fine but with the invalid
PTE it might get an unexpected exception.

Remove the IPTEs for each PTE and do a global flush after all PTEs
have been modified.
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

007ccec5

17 2月, 2016 1 次提交

s390/maccess: reduce stnsm instructions · 52499d93

由 Heiko Carstens 提交于 2月 12, 2016

When fixing the DAT off bug ("s390: fix DAT off memory access, e.g.
on kdump") both Christian and I missed that we can save an additional
stnsm instruction.

This saves us a couple of cycles which could improve the speed of
memcpy_real.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

52499d93

16 2月, 2016 1 次提交

mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d

由 Dave Hansen 提交于 2月 12, 2016

We will soon modify the vanilla get_user_pages() so it can no
longer be used on mm/tasks other than 'current/current->mm',
which is by far the most common way it is called.  For now,
we allow the old-style calls, but warn when they are used.
(implemented in previous patch)

This patch switches all callers of:

	get_user_pages()
	get_user_pages_unlocked()
	get_user_pages_locked()

to stop passing tsk/mm so they will no longer see the warnings.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d4edcf0d

11 2月, 2016 1 次提交

s390: fix DAT off memory access, e.g. on kdump · 0986d977

由 Christian Borntraeger 提交于 2月 09, 2016

commit 204ee2c5 ("s390/irqflags: optimize irq restore") optimized
irqrestore to really only care about interrupts and adapted the
remaining low level users. One spot (memcpy_real) was not touched,
though - fix it. Otherwise a kdump kernel will fail while reading
the old kernel. As we re-enable irqs with a non-standard function
we have to tell lockdep about that.

Fixes: 204ee2c5 ("s390/irqflags: optimize irq restore")
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0986d977

19 1月, 2016 4 次提交

s390: remove all usages of PSW_ADDR_INSN · 9cb1ccec

由 Heiko Carstens 提交于 1月 18, 2016

Yet another leftover from the 31 bit era. The usual operation
"y = x & PSW_ADDR_INSN" with the PSW_ADDR_INSN mask is a nop for
CONFIG_64BIT.

Therefore remove all usages and hope the code is a bit less confusing.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>

9cb1ccec

s390: remove all usages of PSW_ADDR_AMODE · fecc868a

由 Heiko Carstens 提交于 1月 18, 2016

This is a leftover from the 31 bit area. For CONFIG_64BIT the usual
operation "y = x | PSW_ADDR_AMODE" is a nop. Therefore remove all
usages of PSW_ADDR_AMODE and make the code a bit less confusing.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>

fecc868a

s390/irqflags: optimize irq restore · 204ee2c5

由 Christian Borntraeger 提交于 1月 11, 2016

The ssm instruction takes longer that stnsm/stosm as it is often
used to modify DAT and PER. We know that irqsave/irqrestore only
deals with external and I/O interrupts and we know that irqrestore
can transition only from disabled->disabled or disabled->enabled,
so we can use the faster stosm.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

204ee2c5

s390/mm: use TASK_MAX_SIZE where applicable · a9d7ab97

由 Dominik Dingel 提交于 1月 11, 2016

To improve readability we can use TASK_MAX_SIZE when we just check for the
upper limit.  All places explicitly dealing with 3 vs 4 level pgtables
were left unchanged.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-By: NSascha Silbe <silbe@linux.vnet.ibm.com>

a9d7ab97

16 1月, 2016 4 次提交

s390/mm: enable fixup_user_fault retrying · fef8953a

由 Dominik Dingel 提交于 1月 15, 2016

By passing a non-null flag we allow fixup_user_fault to retry, which
enables userfaultfd.  As during these retries we might drop the mmap_sem
we need to check if that happened and redo the complete chain of
actions.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fef8953a

mm: bring in additional flag for fixup_user_fault to signal unlock · 4a9e1cda

由 Dominik Dingel 提交于 1月 15, 2016

During Jason's work with postcopy migration support for s390 a problem
regarding gmap faults was discovered.

The gmap code will call fixup_user_fault which will end up always in
handle_mm_fault.  Till now we never cared about retries, but as the
userfaultfd code kind of relies on it.  this needs some fix.

This patchset does not take care of the futex code.  I will now look
closer at this.

This patch (of 2):

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
faulting we ever unlocked mmap_sem.

This patch brings in the logic to handle retries as well as it cleans up
the current documentation.  fixup_user_fault was not having the same
semantics as filemap_fault.  It never indicated if a retry happened and
so a caller wasn't able to handle that case.  So we now changed the
behaviour to always retry a locked mmap_sem.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4a9e1cda

s390, thp: remove infrastructure for handling splitting PMDs · fecffad2

由 Kirill A. Shutemov 提交于 1月 15, 2016

With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fecffad2

mm: drop tail page refcounting · ddc58f27

由 Kirill A. Shutemov 提交于 1月 15, 2016

Tail page refcounting is utterly complicated and painful to support.

It uses ->_mapcount on tail pages to store how many times this page is
pinned.  get_page() bumps ->_mapcount on tail page in addition to
->_count on head.  This information is required by split_huge_page() to
be able to distribute pins from head of compound page to tails during
the split.

We will need ->_mapcount to account PTE mappings of subpages of the
compound page.  We eliminate need in current meaning of ->_mapcount in
tail pages by forbidding split entirely if the page is pinned.

The only user of tail page refcounting is THP which is marked BROKEN for
now.

Let's drop all this mess.  It makes get_page() and put_page() much
simpler.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ddc58f27

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功