提交 · 6077776b5908e0493a3946f7d3bc63871b201e87 · openanolis / cloud-kernel

17 5月, 2016 1 次提交

bpf: split HAVE_BPF_JIT into cBPF and eBPF variant · 6077776b

由 Daniel Borkmann 提交于 5月 13, 2016

Split the HAVE_BPF_JIT into two for distinguishing cBPF and eBPF JITs.

Current cBPF ones:

  # git grep -n HAVE_CBPF_JIT arch/
  arch/arm/Kconfig:44:    select HAVE_CBPF_JIT
  arch/mips/Kconfig:18:   select HAVE_CBPF_JIT if !CPU_MICROMIPS
  arch/powerpc/Kconfig:129:       select HAVE_CBPF_JIT
  arch/sparc/Kconfig:35:  select HAVE_CBPF_JIT

Current eBPF ones:

  # git grep -n HAVE_EBPF_JIT arch/
  arch/arm64/Kconfig:61:  select HAVE_EBPF_JIT
  arch/s390/Kconfig:126:  select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES
  arch/x86/Kconfig:94:    select HAVE_EBPF_JIT                    if X86_64

Later code also needs this facility to check for eBPF JITs.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6077776b

21 4月, 2016 2 次提交

s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd

由 Gerald Schaefer 提交于 4月 15, 2016

There is a race with multi-threaded applications between context switch and
pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a
pagetable upgrade on another thread in crst_table_upgrade() could already
have set new asce_bits, but not yet the new mm->pgd. This would result in a
corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
translation exception.

Fix this by storing the complete asce instead of just the asce_bits, which
can then be read atomically from switch_mm(), so that it either sees the
old value or the new value, but no mixture. Both cases are OK. Having the
old value would result in a page fault on access to the higher level memory,
but the fault handler would see the new mm->pgd, if it was a valid access
after the mmap on the other thread has completed. So as worst-case scenario
we would have a page fault loop for the racing thread until the next time
slice.

Also remove dead code and simplify the upgrade/downgrade path, there are no
upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
There are also no concurrent upgrades, because the mmap_sem is held with
down_write() in do_mmap, so the flush and table checks during upgrade can
be removed.
Reported-by: NMichael Munday <munday@ca.ibm.com>
Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

723cacbd

s390/pci: fix use after free in dma_init · dba59909

由 Sebastian Ott 提交于 4月 15, 2016

After a failure during registration of the dma_table (because of the
function being in error state) we free its memory but don't reset the
associated pointer to zero.

When we then receive a notification from firmware (about the function
being in error state) we'll try to walk and free the dma_table again.

Fix this by resetting the dma_table pointer. In addition to that make
sure that we free the iommu_bitmap when appropriate.
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

dba59909

16 4月, 2016 2 次提交

s390: add CPU_BIG_ENDIAN config option · 2fd92273

由 Heiko Carstens 提交于 4月 14, 2016

Make sure that s390 appears to be a big endian machine by defining
this config option.

Without this s390 appears to be little endian as seen by e.g. the
recordmount script: "perl ./scripts/recordmcount.pl "s390" "little"
"64""
This has no practical impact within the script since the endian
variable is only evaluated for mips. However there are already a
couple of common code places which evaluate this config option. None
of them is relevant for s390 currently though.

To avoid any issues in the future (and fix the recordmcount oddity)
add the new config option.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

2fd92273

s390/spinlock: avoid yield to non existent cpu · 84976952

由 Heiko Carstens 提交于 4月 13, 2016

arch_spin_lock_wait_flags() checks if a spinlock is not held before
trying a compare and swap instruction. If the lock is unlocked it
tries the compare and swap instruction, however if a different cpu
grabbed the lock in the meantime the instruction will fail as
expected.

Subsequently the arch_spin_lock_wait_flags() incorrectly tries to
figure out if the cpu that holds the lock is running. However it is
using the wrong cpu number for this (-1) and then will also yield the
current cpu to the wrong cpu.

Fix this by adding a missing continue statement.

Fixes: 470ada6b ("s390/spinlock: refactor arch_spin_lock_wait[_flags]")
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

84976952

05 4月, 2016 2 次提交

s390/mm/kvm: fix mis-merge in gmap handling · 9c650d09

由 Christian Borntraeger 提交于 4月 04, 2016

commit 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c") dropped
some changes from commit a3a92c31 ("KVM: s390: fix mismatch
between user and in-kernel guest limit") - this breaks KVM for some
memory sizes (kvm-s390: failed to commit memory region) like
exactly 2GB.

Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c650d09

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

01 4月, 2016 4 次提交

s390/seccomp: include generic seccomp header file · 4f375903

由 Sudip Mukherjee 提交于 4月 01, 2016

Fixes this build error on linux-next:

kernel/seccomp.c: In function '__secure_computing_strict':
kernel/seccomp.c:526:3: error: implicit declaration of function
				'get_compat_mode1_syscalls'

The retrieval of compat syscall numbers were moved into inline function
defined in asm-generic header but the asm-generic header is not being
used by s390.

[heiko.carstens@de.ibm.com]: even though the build error will trigger
only in the next merge window it makes sense to include the generic
header file already now.

Fixes: ("seccomp: Get compat syscalls from asm-generic header")
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Signed-off-by: NSudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

4f375903

s390/pci: add extra padding to function measurement block · 9d89d9e6

由 Sebastian Ott 提交于 3月 31, 2016

Newer machines might use a different (larger) format for function
measurement blocks. To ensure that we comply with the alignment
requirement on these machines and prevent memory corruption (when
firmware writes more data than we expect) add 16 padding bytes
at the end of the fmb.

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

9d89d9e6

s390: wire up preadv2/pwritev2 syscalls · 3358999a

由 Heiko Carstens 提交于 3月 21, 2016

Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

3358999a

s390/pci: PCI function group 0 is valid for clp_query_pci_fn · aa624886

由 Pierre Morel 提交于 3月 16, 2016

The PCI function group 0 is a valid function group,
it is wrong to reject it.

Let's accept PCI function group 0.
Signed-off-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Acked-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

aa624886

29 3月, 2016 1 次提交

s390/crypto: provide correct file mode at device register. · 74b2375e

由 Harald Freudenberger 提交于 3月 17, 2016

When the prng device driver calls misc_register() there is the possibility
to also provide the recommented file permissions. This fix now gives
useful values (0644) where previously just the default was used (resulting
in 0600 for the device file).
Signed-off-by: NHarald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

74b2375e

26 3月, 2016 1 次提交

arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections · be7635e7

由 Alexander Potapenko 提交于 3月 25, 2016

KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>.  Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
Signed-off-by: NAlexander Potapenko <glider@google.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be7635e7

23 3月, 2016 1 次提交

s390/extable: use generic search and sort routines · c352e8b6

由 Ard Biesheuvel 提交于 3月 22, 2016

Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c352e8b6

18 3月, 2016 1 次提交

param: convert some "on"/"off" users to strtobool · 4cc7ecb7

由 Kees Cook 提交于 3月 17, 2016

This changes several users of manual "on"/"off" parsing to use
strtobool.

Some side-effects:
- these uses will now parse y/n/1/0 meaningfully too
- the early_param uses will now bubble up parse errors
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Perches <joe@perches.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4cc7ecb7

17 3月, 2016 5 次提交

s390/mm: handle PTE-mapped tail pages in fast gup · fc897c95

由 Gerald Schaefer 提交于 3月 17, 2016

With the THP refcounting rework it is possible to see THP compound tail
pages mapped with PTEs during a THP split. This needs to be considered
when using page_cache_get_speculative(), which will always fail on tail
pages because ->_count is always zero. commit 7aef4172 "mm: handle
PTE-mapped tail pages in gerneric fast gup implementaiton" fixed it for
the generic fast gup code by using compound_head(page) instead of page,
but not for s390.

This patch is a 1:1 adaption of commit 7aef4172 for the s390 fast gup
code. Without this fix, gup will fall back to the slow path or fail
in the unlikely scenario that we hit a THP under splitting in-between
the page table split and the compound page split.

Cc: stable@vger.kernel.org # v4.5
Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

fc897c95

s390: add DEBUG_RODATA support · 91d37211

由 Heiko Carstens 提交于 3月 17, 2016

git commit d2aa1aca ("mm/init: Add 'rodata=off' boot cmdline
parameter to disable read-only kernel mappings") adds a bogus warning
to the console which states that s390 does not support kernel memory
protection.

This however is not true. We do support that since a couple of years
however in a different way than the author of the above named patch
expected.

To get rid of the misleading message implement the mark_rodata_ro
function and emit a message which states the amount of memory which
was write protected already earlier.

This is the same what parisc currently does.

We currently do not support the kernel parameter "rodata=off" which
would allow to write to the rodata section again. However since we
have this feature since years without any problems there is no reason
to add support for this.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

91d37211

s390: disable postinit-readonly for now · df9ceff9

由 Kees Cook 提交于 3月 17, 2016

This is a temporary fix to let lkdtm run again on s390, though it'll
still fail the ro_after_init tests. Until rodata and ro_after_init
sections can be split on s390, disable special handling of ro_after_init.
Signed-off-by: NKees Cook <keescook@chromium.org>
Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

df9ceff9

s390/cpum_sf: Fix cpu hotplug notifier transitions · 1e3c1dd1

由 Anna-Maria Gleixner 提交于 3月 11, 2016

The cpumf_pmu_notfier() hotplug callback lacks handling of the
CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE failes, the PMC
of the CPU is not setup again. Furthermore the CPU_ONLINE_FROZEN case
will never be processed because of masking the switch expression with
CPU_TASKS_FROZEN.

Add handling for CPU_DOWN_FAILED transition to setup the PMC of the
CPU. Remove CPU_ONLINE_FROZEN case.
Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1e3c1dd1

s390/cpum_cf: Fix missing cpu hotplug notifier transition · a9878842

由 Anna-Maria Gleixner 提交于 3月 11, 2016

The cpumf_pmu_notfier() hotplug callback lacks handling of the
CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE failes, the PMC
of the CPU is not setup again.

Add handling for CPU_DOWN_FAILED transition to setup the PMC of the
CPU.
Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

a9878842

16 3月, 2016 1 次提交

s390: query dynamic DEBUG_PAGEALLOC setting · 10917b83

由 Christian Borntraeger 提交于 3月 15, 2016

We can use debug_pagealloc_enabled() to check if we can map the identity
mapping with 1MB/2GB pages as well as to print the current setting in
dump_stack.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

10917b83

14 3月, 2016 2 次提交

s390/pci: enforce fmb page boundary rule · 80c544de

由 Sebastian Ott 提交于 3月 14, 2016

The function measurement block must not cross a page boundary. Ensure
that by raising the alignment requirement to the smallest power of 2
larger than the size of the fmb.

Fixes: d0b08853 ("s390/pci: performance statistics and debug infrastructure")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

80c544de

ipv4: Update parameters for csum_tcpudp_magic to their original types · 01cfbad7

由 Alexander Duyck 提交于 3月 11, 2016

This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs.  For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value.  The length is usually populated based
on skb->len which is an unsigned integer.

This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits.  As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.

With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.

I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.

I also updated a few spots in terms of whitespace and type differences for
the addresses.  Most of this was just to make sure all of the definitions
were in sync going forward.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01cfbad7

10 3月, 2016 3 次提交

s390: fix floating pointer register corruption (again) · e370e476

由 Martin Schwidefsky 提交于 3月 10, 2016

There is a tricky interaction between the machine check handler
and the critical sections of load_fpu_regs and save_fpu_regs
functions. If the machine check interrupts one of the two
functions the critical section cleanup will complete the function
before the machine check handler s390_do_machine_check is called.
Trouble is that the machine check handler needs to validate the
floating point registers *before* and not *after* the completion
of load_fpu_regs/save_fpu_regs.

The simplest solution is to rewind the PSW to the start of the
load_fpu_regs/save_fpu_regs and retry the function after the
return from the machine check handler.
Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org> # 4.3+
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e370e476

s390/cpumf: add missing lpp magic initialization · 8f100bb1

由 Heiko Carstens 提交于 3月 10, 2016

Add the missing lpp magic initialization for cpu 0. Without this all
samples on cpu 0 do not have the most significant bit set in the
program parameter field, which we use to distinguish between guest and
host samples if the pid is also 0.

We did initialize the lpp magic in the absolute zero lowcore but
forgot that when switching to the allocated lowcore on cpu 0 only.
Reported-by: NShu Juan Zhang <zhshuj@cn.ibm.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: e22cf8ca ("s390/cpumf: rework program parameter setting to detect guest samples")
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

8f100bb1

s390/mm: four page table levels vs. fork · 3446c13b

由 Martin Schwidefsky 提交于 2月 15, 2016

The fork of a process with four page table levels is broken since
git commit 6252d702 "[S390] dynamic page tables."

All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index is added to the mm->pgd pointer without a pgd_deref
in between.

The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit,
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
set the new mm structure to zero, in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.

This fixes CVE-2016-2143.
Reported-by: NMarcin Kościelnicki <koriakin@0x04.net>
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

3446c13b

09 3月, 2016 2 次提交

PCI: Include pci/hotplug Kconfig directly from pci/Kconfig · e7e127e3

由 Bjorn Helgaas 提交于 3月 08, 2016

Include pci/hotplug/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/hotplug/Kconfig.

Note that this effectively adds pci/hotplug/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/hotplug/Kconfig:

  alpha
  arm
  avr32
  frv
  m68k
  microblaze
  mn10300
  sparc
  unicore32

Inspired-by-patch-from: Bogicevic Sasa <brutallesale@gmail.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

e7e127e3

PCI: Include pci/pcie/Kconfig directly from pci/Kconfig · 5f8fc432

由 Bogicevic Sasa 提交于 2月 03, 2016

Include pci/pcie/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/pcie/Kconfig.

Note that this effectively adds pci/pcie/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/pcie/Kconfig:

  alpha
  avr32
  blackfin
  frv
  m32r
  m68k
  microblaze
  mn10300
  parisc
  sparc
  unicore32
  xtensa

[bhelgaas: changelog, source pci/pcie/Kconfig at top of pci/Kconfig, whitespace]
Signed-off-by: NSasa Bogicevic <brutallesale@gmail.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

5f8fc432

08 3月, 2016 12 次提交

s390: Fix misspellings in comments · 7eb792bf

由 Adam Buchbinder 提交于 3月 04, 2016

Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

7eb792bf

s390/mm: split arch/s390/mm/pgtable.c · 1e133ab2

由 Martin Schwidefsky 提交于 3月 08, 2016

The pgtable.c file is quite big, before it grows any larger split it
into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related
header definitions into the new gmap.h header and all of the pgste
helpers from pgtable.h to pgtable.c.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1e133ab2

s390/mm: uninline pmdp_xxx functions from pgtable.h · 227be799

由 Martin Schwidefsky 提交于 3月 08, 2016

The pmdp_xxx function are smaller than their ptep_xxx counterparts
but to keep things symmetrical unline them as well.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

227be799

s390/mm: uninline ptep_xxx functions from pgtable.h · ebde765c

由 Martin Schwidefsky 提交于 3月 08, 2016

The code in the various ptep_xxx functions has grown quite large,
consolidate them to four out-of-line functions:
ptep_xchg_direct to exchange a pte with another with immediate flushing
ptep_xchg_lazy to exchange a pte with another in a batched update
ptep_modify_prot_start to begin a protection flags update
ptep_modify_prot_commit to commit a protection flags update
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ebde765c

KVM: s390: allocate only one DMA page per VM · c54f0d6a

由 David Hildenbrand 提交于 12月 02, 2015

We can fit the 2k for the STFLE interpretation and the crypto
control block into one DMA page. As we now only have to allocate
one DMA page, we can clean up the code a bit.

As a nice side effect, this also fixes a problem with crycbd alignment in
case special allocation debug options are enabled, debugged by Sascha
Silbe.
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

c54f0d6a

KVM: s390: enable STFLE interpretation only if enabled for the guest · 80bc79dc

由 David Hildenbrand 提交于 12月 02, 2015

Not setting the facility list designation disables STFLE interpretation,
this is what we want if the guest was told to not have it.
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

80bc79dc

KVM: s390: wake up when the VCPU cpu timer expires · b3c17f10

由 David Hildenbrand 提交于 2月 22, 2016

When the VCPU cpu timer expires, we have to wake up just like when the ckc
triggers. For now, setting up a cpu timer in the guest and going into
enabled wait will never lead to a wakeup. This patch fixes this problem.
Just as for the ckc, we have to take care of waking up too early. We
have to recalculate the sleep time and go back to sleep.

Please note that the timer callback calls kvm_s390_get_cpu_timer() from
interrupt context. As the timer is canceled when leaving handle_wait(),
and we don't do any VCPU cpu timer writes/updates in that function, we can
be sure that we will never try to read the VCPU cpu timer from the same cpu
that is currentyl updating the timer (deadlock).
Reported-by: NSascha Silbe <silbe@linux.vnet.ibm.com>
Tested-by: NSascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

b3c17f10

KVM: s390: step the VCPU timer while in enabled wait · 5ebda316

由 David Hildenbrand 提交于 2月 22, 2016

The cpu timer is a mean to measure task execution time. We want
to account everything for a VCPU for which it is responsible. Therefore,
if the VCPU wants to sleep, it shall be accounted for it.

We can easily get this done by not disabling cpu timer accounting when
scheduled out while sleeping because of enabled wait.
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

5ebda316

KVM: s390: protect VCPU cpu timer with a seqcount · 9c23a131

由 David Hildenbrand 提交于 2月 17, 2016

For now, only the owning VCPU thread (that has loaded the VCPU) can get a
consistent cpu timer value when calculating the delta. However, other
threads might also be interested in a more recent, consistent value. Of
special interest will be the timer callback of a VCPU that executes without
having the VCPU loaded and could run in parallel with the VCPU thread.

The cpu timer has a nice property: it is only updated by the owning VCPU
thread. And speaking about accounting, a consistent value can only be
calculated by looking at cputm_start and the cpu timer itself in
one shot, otherwise the result might be wrong.

As we only have one writing thread at a time (owning VCPU thread), we can
use a seqcount instead of a seqlock and retry if the VCPU refreshed its
cpu timer. This avoids any heavy locking and only introduces a counter
update/check plus a handful of smp_wmb().

The owning VCPU thread should never have to retry on reads, and also for
other threads this might be a very rare scenario.

Please note that we have to use the raw_* variants for locking the seqcount
as lockdep will produce false warnings otherwise. The rq->lock held during
vcpu_load/put is also acquired from hardirq context. Lockdep cannot know
that we avoid potential deadlocks by disabling preemption and thereby
disable concurrent write locking attempts (via vcpu_put/load).
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

9c23a131

KVM: s390: step VCPU cpu timer during kvm_run ioctl · db0758b2

由 David Hildenbrand 提交于 2月 15, 2016

Architecturally we should only provide steal time if we are scheduled
away, and not if the host interprets a guest exit. We have to step
the guest CPU timer in these cases.

In the first shot, we will step the VCPU timer only during the kvm_run
ioctl. Therefore all time spent e.g. in interception handlers or on irq
delivery will be accounted for that VCPU.

We have to take care of a few special cases:
- Other VCPUs can test for pending irqs. We can only report a consistent
  value for the VCPU thread itself when adding the delta.
- We have to take care of STP sync, therefore we have to extend
  kvm_clock_sync() and disable preemption accordingly
- During any call to disable/enable/start/stop we could get premeempted
  and therefore get start/stop calls. Therefore we have to make sure we
  don't get into an inconsistent state.

Whenever a VCPU is scheduled out, sleeping, in user space or just about
to enter the SIE, the guest cpu timer isn't stepped.

Please note that all primitives are prepared to be called from both
environments (cpu timer accounting enabled or not), although not completely
used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called
while cpu timer accounting is enabled).
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

db0758b2

KVM: s390: abstract access to the VCPU cpu timer · 4287f247

由 David Hildenbrand 提交于 2月 15, 2016

We want to manually step the cpu timer in certain scenarios in the future.
Let's abstract any access to the cpu timer, so we can hide the complexity
internally.
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

4287f247

KVM: s390: store cpu id in vcpu->cpu when scheduled in · 01a745ac

由 David Hildenbrand 提交于 2月 12, 2016

By storing the cpu id, we have a way to verify if the current cpu is
owning a VCPU.
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

01a745ac

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功