提交 · 4c7c44837be77e2689c577abef155c4b5d873c82 · openeuler / raspberrypi-kernel

04 4月, 2017 3 次提交

x86/mm: Define virtual memory map for 5-level paging · 4c7c4483

由 Kirill A. Shutemov 提交于 3月 30, 2017

The first part of memory map (up to %esp fixup) simply scales existing
map for 4-level paging by factor of 9 -- number of bits addressed by
the additional page table level.

The rest of the map is unchanged.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-4-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

4c7c4483

x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert · 361b4b58

由 Kirill A. Shutemov 提交于 3月 30, 2017

We don't need the assert anymore, as:

  17be0aec ("x86/asm/entry/64: Implement better check for canonical addresses")

made canonical address checks generic wrt. address width.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-3-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

361b4b58

x86/boot: Detect 5-level paging support · 3677d4c6

由 Kirill A. Shutemov 提交于 3月 30, 2017

In this initial implementation we force-require 5-level paging support
from the hardware, when compiled with CONFIG_X86_5LEVEL=y. (The kernel
will panic during boot on CPUs that don't support 5-level paging.)

We will implement boot-time switch between 4- and 5-level paging later.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-2-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

3677d4c6

03 4月, 2017 2 次提交

x86/mm/numa: Remove numa_nodemask_from_meminfo() · 474aeffd

由 Wei Yang 提交于 3月 14, 2017

numa_nodemask_from_meminfo() generates a nodemask of nodes which have
memory according to a meminfo descriptor.

The two callsites of that function both set bits in copies of the
numa_nodes_parsed nodemask. In both cases, the information in supplied
numa_meminfo is a subset of numa_nodes_parsed. So setting those bits
again is not really necessary.

Here are the three call paths which show that the supplied numa_meminfo
argument describes memory regions in nodes which are already in
numa_nodes_parsed:

    x86_numa_init()
        numa_init()
            Case 1:
            acpi_numa_init()
	    acpi_parse_memory_affinity()
                    numa_add_memblk()
                    node_set(numa_nodes_parsed)
                acpi_parse_slit()
		 acpi_numa_slit_init()
		  numa_set_distance()
		   numa_alloc_distance()
                    numa_nodemask_from_meminfo()

            Case 2:
            amd_numa_init()
                numa_add_memblk()
                node_set(numa_nodes_parsed)

            Case 3
            dummy_numa_init()
                node_set(numa_nodes_parsed)
                numa_add_memblk()

            numa_register_memblks()
                numa_nodemask_from_meminfo()

Thus, in all three cases, the respective bit in numa_nodes_parsed is
set, which means it is not necessary to set it again in a copy of
numa_nodes_parsed.

So remove that function.
Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20170314030801.13656-2-richard.weiyang@gmail.com
[ Heavily massage commit message. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

474aeffd

x86/mm/numa: Improve alloc_node_data() error path message · 43dac8f6

由 Wei Yang 提交于 3月 14, 2017

alloc_node_data() tries to allocate from the local node first and, if
that attempt fails, falls back to any node. Improve the error message to
issue the initial node for ease during debugging.

Fix a typo in the comments, while at it.
Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
Link: http://lkml.kernel.org/r/20170314030801.13656-1-richard.weiyang@gmail.com
[ Masssage commit message. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

43dac8f6

01 4月, 2017 2 次提交

kasan: do not sanitize kexec purgatory · 13a6798e

由 Mike Galbraith 提交于 3月 31, 2017

Fixes this:

  kexec: Undefined symbol: __asan_load8_noabort
  kexec-bzImage64: Loading purgatory failed

Link: http://lkml.kernel.org/r/1489672155.4458.7.camel@gmx.deSigned-off-by: NMike Galbraith <efault@gmx.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

13a6798e

mm: fix section name for .data..ro_after_init · 906f2a51

由 Kees Cook 提交于 3月 31, 2017

A section name for .data..ro_after_init was added by both:

    commit d07a980c ("s390: add proper __ro_after_init support")

and

    commit d7c19b06 ("mm: kmemleak: scan .data.ro_after_init")

The latter adds incorrect wrapping around the existing s390 section, and
came later.  I'd prefer the s390 naming, so this moves the s390-specific
name up to the asm-generic/sections.h and renames the section as used by
kmemleak (and in the future, kernel/extable.c).

Link: http://lkml.kernel.org/r/20170327192213.GA129375@beastSigned-off-by: NKees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390 parts]
Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

906f2a51

31 3月, 2017 9 次提交

x86/mm: Make in_compat_syscall() work during exec · ada26481

由 Dmitry Safonov 提交于 3月 31, 2017

The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and
for 32bit mm->mmap_compat_base.

On execve the registers of the task invoking exec() are copied to the child
pt_regs. So child->pt_regs->orig_ax contains the execve syscall number of the
parent.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32	  child->thread.status & TS_COMPAT
   x32	  child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64	  !ia32 && !x32 

child->thread.status is corretly set up in set_personality_*(), but the
syscall number in child->pt_regs.orig_ax is left unmodified.

Therefore the parent/child combinations work or fail in the following way:

Parent Child Child->thread_status  child->pt_regs.orig_ax  in_compat()  Works
ia64    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       Y
ia64    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 0     true        Y
ia64     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       N
ia32    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       Y
ia32    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 0     true        Y
ia32     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       N
 x32    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 1     true        N
 x32    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 1     true        Y
 x32     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 1     true        Y

Make set_personality_*() store the syscall number incl. __X32_SYSCALL_BIT
which corresponds to the newly started ELF executable in the childs
pt_regs, i.e. pretend that the exec was invoked from a task with the same
executable format.

So both thread.status and pt_regs.orig_ax correspond to the new ELF format
and in_compat_syscall() returns the correct result.

[ tglx: Rewrote changelog ]

Fixes: commit 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Reported-by: NAdam Borowski <kilobyte@angband.pl>
Suggested-by: NH. Peter Anvin <hpa@zytor.com>
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
Cc: 0x7f454c46@gmail.com
Cc: linux-mm@kvack.org
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170331111137.28170-1-dsafonov@virtuozzo.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

ada26481

x86/boot: Include missing header file · 6b1cc946

由 Zhengyi Shen 提交于 3月 29, 2017

Sparse complains about missing forward declarations:

arch/x86/boot/compressed/error.c:8:6:
	warning: symbol 'warn' was not declared. Should it be static?
arch/x86/boot/compressed/error.c:15:6:
	warning: symbol 'error' was not declared. Should it be static?

Include the missing header file.
Signed-off-by: NZhengyi Shen <shenzhengyi@gmail.com>
Acked-by: NKess Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

6b1cc946

x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs · 29f72ce3

由 Yazen Ghannam 提交于 3月 30, 2017

MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name.
However, MCA bank 3 is defined on Fam17h systems and can be accessed
using legacy MSRs. Without a name we get a stack trace on Fam17h systems
when trying to register sysfs files for bank 3 on kernels that don't
recognize Scalable MCA.

Call MCA bank 3 "decode_unit" since this is what it represents on
Fam17h. This will allow kernels without SMCA support to see this bank on
Fam17h+ and prevent the stack trace. This will not affect older systems
since this bank is reserved on them, i.e. it'll be ignored.

Tested on AMD Fam15h and Fam17h systems.

  WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal
  kobject: (ffff88085bb256c0): attempted to be registered with empty name!
  ...
  Call Trace:
   kobject_add_internal
   kobject_add
   kobject_create_and_add
   threshold_create_device
   threshold_init_device
Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1490102285-3659-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

29f72ce3

x86/boot/32: Flip the logic in test_wp_bit() · 952a6c2c

由 Borislav Petkov 提交于 3月 30, 2017

... to have a natural "likely()" in the code flow and thus have the
success case with a branch 99.999% of the times non-taken and function
return code following it instead of jumping to it each time.

This puts the panic() call at the end of the function - it is going to
be practically unreachable anyway.

The C code is a bit more readable too.

No functionality change.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20170330080101.ywsf5rg6ilzu4itk@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

952a6c2c

ARC: fix build warnings with !CONFIG_KPROBES · 4c6fabda

由 Vineet Gupta 提交于 3月 30, 2017

|   CC      lib/nmi_backtrace.o
| In file included from ../include/linux/kprobes.h:43:0,
|                  from ../lib/nmi_backtrace.c:17:
| ../arch/arc/include/asm/kprobes.h:57:13: warning: 'trap_is_kprobe' defined but not used [-Wunused-function]
|  static void trap_is_kprobe(unsigned long address, struct pt_regs *regs)
|              ^~~~~~~~~~~~~~

The warning started with 7d134b2c ("kprobes: move kprobe declarations
to asm-generic/kprobes.h") which started including <asm/kprobes.h>
unconditionally into <linux/kprobes.h> exposing a stub function for
!CONFIG_KPROBES to rest of world. Fix that by making the stub a macro
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

4c6fabda

ARCv2: SLC: Make sure busy bit is set properly on SLC flushing · c70c4733

由 Alexey Brodkin 提交于 3月 29, 2017

As reported in STAR 9001165532, an SLC control reg read (for checking
busy state) right after SLC invalidate command may incorrectly return
NOT busy causing software to NOT spin-wait while operation is underway.
(and for some reason this only happens if L1 cache is also disabled - as
required by IOC programming model)

Suggested workaround is to do an additional Control Reg read, which
ensures the 2nd read gets the right status.

Cc: stable@vger.kernel.org  #4.10
Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
[vgupta: reworte changelog a bit]
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

c70c4733

arm64: drop non-existing vdso-offsets.h from .gitignore · 9b3403ae

由 Masahiro Yamada 提交于 3月 31, 2017

Since commit a66649da ("arm64: fix vdso-offsets.h dependency"),
include/generated/vdso-offsets.h is directly generated without
arch/arm64/kernel/vdso/vdso-offsets.h.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

9b3403ae

arm64: remove redundant header file in current.h · 34d04f25

由 Shaokun Zhang 提交于 3月 30, 2017

Commint 9d84fb27 ("arm64: restore get_current() optimisation") has
removed read_sysreg() and asm/sysreg.h is redundant.

This patch removes asm/sysreg.h header file.
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

34d04f25

arm64: fix NULL dereference in have_cpu_die() · 335d2c2d

由 Mark Salter 提交于 3月 24, 2017

Commit 5c492c3f ("arm64: smp: Add function to determine if cpus are
stuck in the kernel") added a helper function to determine if die() is
supported in cpu_ops. This function assumes a cpu will have a valid
cpu_ops entry, but that may not be the case for cpu0 is spin-table or
parking protocol is used to boot secondary cpus. In that case, there
is a NULL dereference if have_cpu_die() is called by cpu0. So add a
check for a valid cpu_ops before dereferencing it.

Fixes: 5c492c3f ("arm64: smp: Add function to determine if cpus are stuck in the kernel")
Signed-off-by: NMark Salter <msalter@redhat.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

335d2c2d

30 3月, 2017 6 次提交

x86/build: Mostly disable '-maccumulate-outgoing-args' · 3f135e57

由 Josh Poimboeuf 提交于 3月 16, 2017

The GCC '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, and with most recent versions of GCC, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text	   data	    bss	     dec	    hex	filename
  10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
   9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after

      text	   data	    bss	     dec	    hex	filename
  10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
  10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170316193133.zrj6gug53766m6nn@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>

3f135e57

x86/boot/32: Rewrite test_wp_bit() · 4af17110

由 Andy Lutomirski 提交于 3月 29, 2017

This code seems to be very old and has gotten only minor updates.
It's overcomplicated and has a bunch of comments that are, at best,
of purely historical interest.  Nowadays we have a shiny function
probe_kernel_write() that does more or less exactly what we need.
Use it.

I switched the page that we test from swapper_pg_dir to
empty_zero_page because writing zero to empty_zero_page is more
obviously safe than writing to the paging structures.  (It's
extremely unlikely that any of this would cause problems in practice
because the write will fail on any supported CPU.)
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0b9e64ab0236de30e7572213cea77bf95ae2e990.1490831211.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4af17110

x86/dump_pagetables: Add support for 5-level paging · fdd3d8ce

由 Kirill A. Shutemov 提交于 3月 28, 2017

Simple extension to support one more page table level.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170328104806.41711-1-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

fdd3d8ce

parisc: Avoid stalled CPU warnings after system shutdown · 476e75a4

由 Helge Deller 提交于 3月 29, 2017

Commit 73580dac ("parisc: Fix system shutdown halt") introduced an endless
loop for systems which don't provide a software power off function.  But the
soft lockup detector will detect this and report stalled CPUs after some time.
Avoid those unwanted warnings by disabling the soft lockup detector.

Fixes: 73580dac ("parisc: Fix system shutdown halt")
Signed-off-by: NHelge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+

476e75a4

parisc: Clean up fixup routines for get_user()/put_user() · d19f5e41

由 Helge Deller 提交于 3月 25, 2017

Al Viro noticed that userspace accesses via get_user()/put_user() can be
simplified a lot with regard to usage of the exception handling.

This patch implements a fixup routine for get_user() and put_user() in such
that the exception handler will automatically load -EFAULT into the register
%r8 (the error value) in case on a fault on userspace.  Additionally the fixup
routine will zero the target register on fault in case of a get_user() call.
The target register is extracted out of the faulting assembly instruction.

This patch brings a few benefits over the old implementation:
1. Exception handling gets much cleaner, easier and smaller in size.
2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped.
3. No need to hardcode %r9 as target register for get_user() any longer. This
   helps the compiler register allocator and thus creates less assembler
   statements.
4. No dependency on the exception_data contents any longer.
5. Nested faults will be handled cleanly.
Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: NHelge Deller <deller@gmx.de>

d19f5e41

parisc: Fix access fault handling in pa_memcpy() · 554bfece

由 Helge Deller 提交于 3月 29, 2017

pa_memcpy() is the major memcpy implementation in the parisc kernel which is
used to do any kind of userspace/kernel memory copies.

Al Viro noticed various bugs in the implementation of pa_mempcy(), most notably
that in case of faults it may report back to have copied more bytes than it
actually did.

Fixing those bugs is quite hard in the C-implementation, because the compiler
is messing around with the registers and we are not guaranteed that specific
variables are always in the same processor registers. This makes proper fault
handling complicated.

This patch implements pa_memcpy() in assembler. That way we have correct fault
handling and adding a 64-bit copy routine was quite easy.

Runtime tested with 32- and 64bit kernels.
Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
Signed-off-by: NHelge Deller <deller@gmx.de>

554bfece

29 3月, 2017 7 次提交

sparc/ptrace: Preserve previous registers for short regset write · d3805c54

由 Dave Martin 提交于 3月 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d3805c54

mips/ptrace: Preserve previous registers for short regset write · d614fd58

由 Dave Martin 提交于 3月 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d614fd58

metag/ptrace: Reject partial NT_METAG_RPIPE writes · 7195ee31

由 Dave Martin 提交于 3月 27, 2017

It's not clear what behaviour is sensible when doing partial write of
NT_METAG_RPIPE, so just don't bother.

This patch assumes that userspace will never rely on a partial SETREGSET
in this case, since it's not clear what should happen anyway.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Acked-by: NJames Hogan <james.hogan@imgtec.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7195ee31

metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS · 5fe81fe9

由 Dave Martin 提交于 3月 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill TXSTATUS, a well-defined default value is used, based on the
task's current value.
Suggested-by: NJames Hogan <james.hogan@imgtec.com>
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5fe81fe9

metag/ptrace: Preserve previous registers for short regset write · a78ce80d

由 Dave Martin 提交于 3月 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Acked-by: NJames Hogan <james.hogan@imgtec.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a78ce80d

h8300/ptrace: Fix incorrect register transfer count · 502585c7

由 Dave Martin 提交于 3月 27, 2017

regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun
if CONFIG_CPU_H8S is set, since this adds an extra entry to
register_offset[] but not to user_regs_struct.

So, iterate over user_regs_struct based on its actual size, not based on
the length of register_offset[].
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

502585c7

c6x/ptrace: Remove useless PTRACE_SETREGSET implementation · fb411b83

由 Dave Martin 提交于 3月 27, 2017

gpr_set won't work correctly and can never have been tested, and the
correct behaviour is not clear due to the endianness-dependent task
layout.

So, just remove it.  The core code will now return -EOPNOTSUPPORT when
trying to set NT_PRSTATUS on this architecture until/unless a correct
implementation is supplied.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fb411b83

28 3月, 2017 2 次提交

KVM: x86: cleanup the page tracking SRCU instance · 2beb6dad

由 Paolo Bonzini 提交于 3月 27, 2017

SRCU uses a delayed work item.  Skip cleaning it up, and
the result is use-after-free in the work item callbacks.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Suggested-by: NDmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 0eb05bf2Reviewed-by: NXiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2beb6dad

KVM: nVMX: fix nested EPT detection · 7ad658b6

由 Ladi Prosek 提交于 3月 23, 2017

The nested_ept_enabled flag introduced in commit 7ca29de2 was not
computed correctly. We are interested only in L1's EPT state, not the
the combined L0+L1 value.

In particular, if L0 uses EPT but L1 does not, nested_ept_enabled must
be false to make sure that PDPSTRs are loaded based on CR3 as usual,
because the special case described in 26.3.2.4 Loading Page-Directory-
Pointer-Table Entries does not apply.

Fixes: 7ca29de2 ("KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT")
Cc: qemu-stable@nongnu.org
Reported-by: NWanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NLadi Prosek <lprosek@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7ad658b6

27 3月, 2017 6 次提交

x86: Convert the rest of the code to support p4d_t · f2a6a705

由 Kirill A. Shutemov 提交于 3月 17, 2017

This patch converts x86 to use proper folding of a new (fifth) page table level
with <asm-generic/pgtable-nop4d.h>.

That's a bit of a kitchen sink patch, but I don't see how to split it further
without hurting bisectability.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-7-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f2a6a705

x86/xen: Change __xen_pgd_walk() and xen_cleanmfnmap() to support p4d · 907cd439

由 Xiong Zhang 提交于 3月 17, 2017

Split these helpers into a couple of per-level functions and add support for
an additional page table level.
Signed-off-by: NXiong Zhang <xiong.y.zhang@intel.com>
[ Split off into separate patch ]
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-6-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

907cd439

x86/kasan: Prepare clear_pgds() to switch to <asm-generic/pgtable-nop4d.h> · d691a3cf

由 Kirill A. Shutemov 提交于 3月 17, 2017

With folded p4d, pgd_clear() is a nop. Change clear_pgds() to use
p4d_clear() instead.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-5-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d691a3cf

x86/mm/pat: Add 5-level paging support · 45478336

由 Kirill A. Shutemov 提交于 3月 17, 2017

Straight-forward extension of existing code to support additional page
table level.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-4-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

45478336

x86/efi: Add 5-level paging support · e981316f

由 Kirill A. Shutemov 提交于 3月 17, 2017

Allocate additional page table level and ajdust efi_sync_low_kernel_mappings()
to work with additional page table level.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-3-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e981316f

x86/kexec: Add 5-level paging support · 7f689041

由 Kirill A. Shutemov 提交于 3月 17, 2017

Handle additional page table level in the kexec code.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-2-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7f689041

25 3月, 2017 1 次提交

ARC: vdk: Fix support of UIO · ae9955ae

由 Alexey Brodkin 提交于 3月 23, 2017

MotherBoard section has its "ranges" set to 0xE000_0000-0xF000_0000.
But UIO node maps 4 different areas in different memory locations
and all outside MB's ranges.

That obviously breaks UIO mappings in runtime.

Cc: Ruud Derwig <rderwig@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

ae9955ae

24 3月, 2017 2 次提交

x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization · a46f60d7

由 Baoquan He 提交于 3月 24, 2017

Currently KASLR is enabled on three regions: the direct mapping of physical
memory, vamlloc and vmemmap. However the EFI region is also mistakenly
included for VA space randomization because of misusing EFI_VA_START macro
and assuming EFI_VA_START < EFI_VA_END.

(This breaks kexec and possibly other things that rely on stable addresses.)

The EFI region is reserved for EFI runtime services virtual mapping which
should not be included in KASLR ranges. In Documentation/x86/x86_64/mm.txt,
we can see:

  ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space

EFI uses the space from -4G to -64G thus EFI_VA_START > EFI_VA_END,
Here EFI_VA_START = -4G, and EFI_VA_END = -64G.

Changing EFI_VA_START to EFI_VA_END in mm/kaslr.c fixes this problem.
Signed-off-by: NBaoquan He <bhe@redhat.com>
Reviewed-by: NBhupesh Sharma <bhsharma@redhat.com>
Acked-by: NDave Young <dyoung@redhat.com>
Acked-by: NThomas Garnier <thgarnie@google.com>
Cc: <stable@vger.kernel.org> #4.8+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1490331592-31860-1-git-send-email-bhe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a46f60d7

KVM: VMX: Fix enable VPID conditions · 08d839c4

由 Wanpeng Li 提交于 3月 23, 2017

This can be reproduced by running L2 on L1, and disable VPID on L0
if w/o commit "KVM: nVMX: Fix nested VPID vmx exec control", the L2
crash as below:

KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306c3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000

Reference SDM 30.3 INVVPID:

Protected Mode Exceptions
- #UD
  - If not in VMX operation.
  - If the logical processor does not support VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=0).
  - If the logical processor supports VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=1) but does
    not support the INVVPID instruction (IA32_VMX_EPT_VPID_CAP[32]=0).

So we should check both VPID enable bit in vmx exec control and INVVPID support bit
in vmx capability MSRs to enable VPID. This patch adds the guarantee to not enable
VPID if either INVVPID or single-context/all-context invalidation is not exposed in
vmx capability MSRs.
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

08d839c4