提交 · dfa9a942fd7951c8f333cf3f377dde51ebd21685 · openeuler / raspberrypi-kernel

15 7月, 2016 5 次提交

x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct · dfa9a942

由 Andy Lutomirski 提交于 7月 14, 2016

struct thread_info is a legacy mess.  To prepare for its partial removal,
move the uaccess control fields out -- they're straightforward.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

dfa9a942

x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm · 46aea387

由 Andy Lutomirski 提交于 7月 14, 2016

If we get a vmalloc fault while current->active_mm->pgd doesn't
match CR3, we'll crash without this change.  I've seen this failure
mode on heavily instrumented kernels with virtually mapped stacks.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4650d7674185f165ed8fdf9ac4c5c35c5c179ba8.1468527351.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

46aea387

x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables() · d92fc69c

由 Andy Lutomirski 提交于 7月 14, 2016

kernel_unmap_pages_in_pgd() is dangerous: if a PGD entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.

Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions.  This leaves a couple of
other helpers unused, so delete them, too.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk>
Acked-by: NBorislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/77ff20fdde3b75cd393be5559ad8218870520248.1468527351.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

d92fc69c

x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated · 360cb4d1

由 Andy Lutomirski 提交于 7月 14, 2016

This avoids pointless races in which another CPU or task might see a
partially populated global PGD entry.  These races should normally
be harmless, but, if another CPU propagates the entry via
vmalloc_fault() and then populate_pgd() fails (due to memory allocation
failure, for example), this prevents a use-after-free of the PGD
entry.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bf99df27eac6835f687005364bd1fbd89130946c.1468527351.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

360cb4d1

x86/mm/hotplug: Don't remove PGD entries in remove_pagetable() · af2cf278

由 Ingo Molnar 提交于 7月 14, 2016

So when memory hotplug removes a piece of physical memory from pagetable
mappings, it also frees the underlying PGD entry.

This complicates PGD management, so don't do this. We can keep the
PGD mapped and the PUD table all clear - it's only a single 4K page
per 512 GB of memory hotplugged.
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/064ff6c7275734537f969e876f6cd0baa954d2cc.1468527351.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

af2cf278

13 7月, 2016 1 次提交

x86/mm: Use pte_none() to test for empty PTE · dcb32d99

由 Dave Hansen 提交于 7月 07, 2016

The page table manipulation code seems to have grown a couple of
sites that are looking for empty PTEs.  Just in case one of these
entries got a stray bit set, use pte_none() instead of checking
for a zero pte_val().

The use pte_same() makes me a bit nervous.  If we were doing a
pte_same() check against two cleared entries and one of them had
a stray bit set, it might fail the pte_same() check.  But, I
don't think we ever _do_ pte_same() for cleared entries.  It is
almost entirely used for checking for races in fault-in paths.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001915.813703D9@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

dcb32d99

08 7月, 2016 1 次提交

x86/mm/pat, /dev/mem: Remove superfluous error message · 39380b80

由 Jiri Kosina 提交于 7月 08, 2016

Currently it's possible for broken (or malicious) userspace to flood a
kernel log indefinitely with messages a-la

	Program dmidecode tried to access /dev/mem between f0000->100000

because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on
dumps this information each and every time devmem_is_allowed() fails.

Reportedly userspace that is able to trigger contignuous flow of these
messages exists.

It would be possible to rate limit this message, but that'd have a
questionable value; the administrator wouldn't get information about all
the failing accessess, so then the information would be both superfluous
and incomplete at the same time :)

Returning EPERM (which is what is actually happening) is enough indication
for userspace what has happened; no need to log this particular error as
some sort of special condition.
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pmSigned-off-by: NIngo Molnar <mingo@kernel.org>

39380b80

25 6月, 2016 1 次提交

x86: get rid of superfluous __GFP_REPEAT · a3a9a59d

由 Michal Hocko 提交于 6月 24, 2016

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
flag is for more than order-0.  This means that this flag has never been
actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-3-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a3a9a59d

20 5月, 2016 3 次提交

x86/mm: Switch from TASK_SIZE to TASK_SIZE_MAX in the page fault code · dc4fac84

由 Andy Lutomirski 提交于 5月 10, 2016

x86's page fault handlers had two TASK_SIZE uses that should have
been TASK_SIZE_MAX.  I don't think that either one had a visible
effect, but this makes the code clearer and should save a few bytes
of text.

(And I eventually want to eradicate TASK_SIZE.  This will help.)
Reported-by: NCyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ruslan Kabatsayev <b7.10110111@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1242fb23b0d05c3069dbf5758ac55d26bc114bef.1462914565.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

dc4fac84

x86: mm: use hugetlb_bad_size() · 2b18e532

由 Vaishali Thakkar 提交于 5月 19, 2016

Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.
Signed-off-by: NVaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2b18e532

include/linux/nodemask.h: create next_node_in() helper · 0edaf86c

由 Andrew Morton 提交于 5月 19, 2016

Lots of code does

	node = next_node(node, XXX);
	if (node == MAX_NUMNODES)
		node = first_node(XXX);

so create next_node_in() to do this and use it in various places.

[mhocko@suse.com: use next_node_in() helper]
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@kernel.org>
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Hui Zhu <zhuhui@xiaomi.com>
Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0edaf86c

10 5月, 2016 1 次提交

x86/boot: Add missing file header comments · cb18ef0d

由 Kees Cook 提交于 5月 09, 2016

There were some files with missing header comments. Since they are
included from both compressed and regular kernels, make note of that.
Also corrects a typo in the mem_avoid comments.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

cb18ef0d

07 5月, 2016 1 次提交

x86/boot: Split out kernel_ident_mapping_init() · cf4fb15b

由 Yinghai Lu 提交于 5月 06, 2016

In order to support on-demand page table creation when moving the
kernel for KASLR, we need to use kernel_ident_mapping_init() in the
decompression code.

This splits it out into its own file for use outside of init_64.c.
Additionally, checking for __pa/__va defines is added since they
need to be overridden in the decompression code.

[kees: rewrote changelog]
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-3-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

cf4fb15b

29 4月, 2016 1 次提交

x86/segments/64: When loadsegment(fs, ...) fails, clear the base · 45e876f7

由 Andy Lutomirski 提交于 4月 26, 2016

On AMD CPUs, a failed loadsegment currently may not clear the FS
base.  Fix it.

While we're at it, prevent loadsegment(gs, xyz) from even compiling
on 64-bit kernels.  It shouldn't be used.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a084c1b93b7b1408b58d3fd0b5d6e47da8e7d7cf.1461698311.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

45e876f7

28 4月, 2016 4 次提交

x86/mm, sched/core: Turn off IRQs in switch_mm() · 078194f8

由 Andy Lutomirski 提交于 4月 26, 2016

Potential races between switch_mm() and TLB-flush or LDT-flush IPIs
could be very messy.  AFAICT the code is currently okay, whether by
accident or by careful design, but enabling PCID will make it
considerably more complicated and will no longer be obviously safe.

Fix it with a big hammer: run switch_mm() with IRQs off.

To avoid a performance hit in the scheduler, we take advantage of
our knowledge that the scheduler already has IRQs disabled when it
calls switch_mm().
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

078194f8

x86/mm, sched/core: Uninline switch_mm() · 69c0319a

由 Andy Lutomirski 提交于 4月 26, 2016

It's fairly large and it has quite a few callers.  This may also
help untangle some headers down the road.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

69c0319a

x86/mm: Build arch/x86/mm/tlb.c even on !SMP · e1074888

由 Andy Lutomirski 提交于 4月 26, 2016

Currently all of the functions that live in tlb.c are inlined on
!SMP builds.  One can debate whether this is a good idea (in many
respects the code in tlb.c is better than the inlined UP code).

Regardless, I want to add code that needs to be built on UP and SMP
kernels and relates to tlb flushing, so arrange for tlb.c to be
compiled unconditionally.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e1074888

x86/mm/pat: Document the (currently) EFI-only code path · 7fc8442f

由 Matt Fleming 提交于 4月 25, 2016

It's not at all obvious that populate_pgd() and friends are only
executed when mapping EFI virtual memory regions or that no other
pageattr callers pass a ->pgd value.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-4-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>

7fc8442f

27 4月, 2016 1 次提交

Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging" · e16d8a6c

由 Andy Lutomirski 提交于 4月 26, 2016

This reverts commit 320d25b6.

This change was problematic for a couple of reasons:

1. It missed a some entry points (Xen things and 64-bit native).

2. The entry it changed can be executed more than once.  This isn't
   really a problem, but it conflated per-cpu state setup and global
   state setup.

3. It broke 64-bit non-NX.  64-bit non-NX worked the other way around from
   32-bit -- __supported_pte_mask had NX set initially and was *cleared*
   in x86_configure_nx.  With the patch applied, it never got cleared.
Reported-and-tested-by: NMeelis Roos <mroos@linux.ee>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/59bd15f7f4b56b633a611b7f70876c6d2ad01a98.1461685884.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e16d8a6c

22 4月, 2016 1 次提交

x86/KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET · e8581e3d

由 Baoquan He 提交于 4月 20, 2016

Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.

[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, rewrote help]
Signed-off-by: NBaoquan He <bhe@redhat.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-3-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e8581e3d

13 4月, 2016 7 次提交

x86/extable: Add a comment about early exception handlers · 60a0e203

由 Andy Lutomirski 提交于 4月 04, 2016

Borislav asked for a comment explaining why all exception handlers are
allowed early.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/5f1dcd6919f4a5923959a8065cb2c04d9dac1412.1459784772.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

60a0e203

x86/msr: Carry on after a non-"safe" MSR access fails · fbd70437

由 Andy Lutomirski 提交于 4月 02, 2016

This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ONCE() and, for RDMSR, a return value of zero.

To be clear, this type of failure should *not* happen.  This patch
exists to minimize the chance of nasty undebuggable failures
happening when a CONFIG_PARAVIRT=y bug in the non-"safe" MSR helpers
gets fixed.
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/26567b216aae70e795938f4b567eace5a0eb90ba.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

fbd70437

x86/traps: Enable all exception handler callbacks early · ae7ef45e

由 Andy Lutomirski 提交于 4月 02, 2016

Now that early_fixup_exception() has pt_regs, we can just call
fixup_exception() from it.  This will make fancy exception handlers
work early.
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/20fc047d926150cb08cb9b9f2923519b07ec1a15.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

ae7ef45e

x86/head: Move early exception panic code into early_fixup_exception() · 0e861fbb

由 Andy Lutomirski 提交于 4月 02, 2016

This removes a bunch of assembly and adds some C code instead.  It
changes the actual printouts on both 32-bit and 64-bit kernels, but
they still seem okay.
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4085070316fc3ab29538d3fcfe282648d1d4ee2e.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

0e861fbb

x86/head: Move the early NMI fixup into C · 0d0efc07

由 Andy Lutomirski 提交于 4月 02, 2016

C is nicer than asm.
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/dd068269f8d59fe44e9e43a50d0efd67da65c2b5.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

0d0efc07

x86/head: Pass a real pt_regs and trapnr to early_fixup_exception() · 7bbcdb1c

由 Andy Lutomirski 提交于 4月 02, 2016

early_fixup_exception() is limited by the fact that it doesn't have a
real struct pt_regs.  Change both the 32-bit and 64-bit asm and the
C code to pass and accept a real pt_regs.
Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/e3fb680fcfd5e23e38237e8328b64a25cc121d37.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

7bbcdb1c

x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386 · 1886297c

由 Toshi Kani 提交于 4月 11, 2016

The following BUG_ON() crash was reported on QEMU/i386:

  kernel BUG at arch/x86/mm/physaddr.c:79!
  Call Trace:
  phys_mem_access_prot_allowed
  mmap_mem
  ? mmap_region
  mmap_region
  do_mmap
  vm_mmap_pgoff
  SyS_mmap_pgoff
  do_int80_syscall_32
  entry_INT80_32

after commit:

  edfe63ec ("x86/mtrr: Fix Xorg crashes in Qemu sessions")

PAT is now set to disabled state when MTRRs are disabled.
Thus, reactivating the __pa(high_memory) check in
phys_mem_access_prot_allowed().

When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(),
which in turn calls slow_virt_to_phys() for 'high_memory'.
Because 'high_memory' is set to (the max direct mapped virt
addr + 1), it is not a valid virtual address.  Hence,
slow_virt_to_phys() returns 0 and hit the BUG_ON.  Using
__pa_nodebug() instead of __pa() will fix this BUG_ON.

However, this code block, originally written for Pentiums and
earlier, is no longer adequate since a 32-bit Xen guest has
MTRRs disabled and supports ZONE_HIGHMEM.  In this setup,
this code sets UC attribute for accessing RAM in high memory
range.

Delete this code block as it has been unused for a long time.
Reported-by: Nkernel test robot <ying.huang@linux.intel.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1460403360-25441-1-git-send-email-toshi.kani@hpe.com
Link: https://lkml.org/lkml/2016/4/1/608Signed-off-by: NIngo Molnar <mingo@kernel.org>

1886297c

02 4月, 2016 2 次提交

mm/rmap: batched invalidations should use existing api · 858eaaa7

由 Nadav Amit 提交于 4月 01, 2016

The recently introduced batched invalidations mechanism uses its own
mechanism for shootdown.  However, it does wrong accounting of
interrupts (e.g., inc_irq_stat is called for local invalidations),
trace-points (e.g., TLB_REMOTE_SHOOTDOWN for local invalidations) and
may break some platforms as it bypasses the invalidation mechanisms of
Xen and SGI UV.

This patch reuses the existing TLB flushing mechnaisms instead.  We use
NULL as mm to indicate a global invalidation is required.

Fixes 72b252ae ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: NNadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

858eaaa7

x86/mm: TLB_REMOTE_SEND_IPI should count pages · 18c98243

由 Nadav Amit 提交于 4月 01, 2016

TLB_REMOTE_SEND_IPI was recently introduced, but it counts bytes instead
of pages.  In addition, it does not report correctly the case in which
flush_tlb_page flushes a page.  Fix it to be consistent with other TLB
counters.

Fixes: 5b74283a ("x86, mm: trace when an IPI is about to be sent")
Signed-off-by: NNadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

18c98243

31 3月, 2016 4 次提交

x86/cpufeature: Remove cpu_has_pse · 16bf9226

由 Borislav Petkov 提交于 3月 29, 2016

Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-11-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

16bf9226

x86/cpufeature: Remove cpu_has_pge · c109bf95

由 Borislav Petkov 提交于 3月 29, 2016

Use static_cpu_has() in __flush_tlb_all() due to the time-sensitivity of
this one.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-10-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

c109bf95

x86/cpufeature: Remove cpu_has_clflush · 906bf7fd

由 Borislav Petkov 提交于 3月 29, 2016

Use the fast variant in the DRM code.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dri-devel@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1459266123-21878-7-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

906bf7fd

x86/cpufeature: Remove cpu_has_gbpages · b8291adc

由 Borislav Petkov 提交于 3月 29, 2016

Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-6-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

b8291adc

29 3月, 2016 4 次提交

x86/xen, pat: Remove PAT table init code from Xen · 88ba2811

由 Toshi Kani 提交于 3月 23, 2016

Xen supports PAT without MTRRs for its guests.  In order to
enable WC attribute, it was necessary for xen_start_kernel()
to call pat_init_cache_modes() to update PAT table before
starting guest kernel.

Now that the kernel initializes PAT table to the BIOS handoff
state when MTRR is disabled, this Xen-specific PAT init code
is no longer necessary.  Delete it from xen_start_kernel().

Also change __init_cache_modes() to a static function since
PAT table should not be tweaked by other modules.
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NJuergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

88ba2811

x86/mm/pat: Replace cpu_has_pat with boot_cpu_has() · d63dcf49

由 Toshi Kani 提交于 3月 23, 2016

Borislav Petkov suggested:

 > Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast
 > paths static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX
 > ugliness.

Replace the use of cpu_has_pat on init paths with boot_cpu_has().
Suggested-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-4-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d63dcf49

x86/mm/pat: Add pat_disable() interface · 224bb1e5

由 Toshi Kani 提交于 3月 23, 2016

In preparation for fixing a regression caused by:

  9cd25aac ("x86/mm/pat: Emulate PAT when it is disabled")

... PAT needs to provide an interface that prevents the OS from
initializing the PAT MSR.

PAT MSR initialization must be done on all CPUs using the specific
sequence of operations defined in the Intel SDM.  This requires MTRRs
to be enabled since pat_init() is called as part of MTRR init
from mtrr_rendezvous_handler().

Make pat_disable() as the interface that prevents the OS from
initializing the PAT MSR.  MTRR will call this interface when it
cannot provide the SDM-defined sequence to initialize PAT.

This also assures that pat_disable() called from pat_bsp_init()
will set the PAT table properly when CPU does not support PAT.
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-3-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

224bb1e5

x86/mm/pat: Add support of non-default PAT MSR setting · 02f037d6

由 Toshi Kani 提交于 3月 23, 2016

In preparation for fixing a regression caused by:

  9cd25aac ("x86/mm/pat: Emulate PAT when it is disabled")'

... PAT needs to support a case that PAT MSR is initialized with a
non-default value.

When pat_init() is called and PAT is disabled, it initializes the
PAT table with the BIOS default value. Xen, however, sets PAT MSR
with a non-default value to enable WC. This causes inconsistency
between the PAT table and PAT MSR when PAT is set to disable on Xen.

Change pat_init() to handle the PAT disable cases properly.  Add
init_cache_modes() to handle two cases when PAT is set to disable.

 1. CPU supports PAT: Set PAT table to be consistent with PAT MSR.
 2. CPU does not support PAT: Set PAT table to be consistent with
    PWT and PCD bits in a PTE.

Note, __init_cache_modes(), renamed from pat_init_cache_modes(),
will be changed to a static function in a later patch.
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

02f037d6

23 3月, 2016 2 次提交

x86/extable: use generic search and sort routines · 29934b0f

由 Ard Biesheuvel 提交于 3月 22, 2016

Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29934b0f

kernel: add kcov code coverage · 5c9a8750

由 Dmitry Vyukov 提交于 3月 22, 2016

kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g.  scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side.  The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage can be just a dozen of basic blocks (e.g.  an invalid
input).  In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M).  Cost of
kcov depends only on number of executed basic blocks/edges.  On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5c9a8750

18 3月, 2016 1 次提交

mm: introduce page reference manipulation functions · fe896d18

由 Joonsoo Kim 提交于 3月 17, 2016

The success of CMA allocation largely depends on the success of
migration and key factor of it is page reference count.  Until now, page
reference is manipulated by direct calling atomic functions so we cannot
follow up who and where manipulate it.  Then, it is hard to find actual
reason of CMA allocation failure.  CMA allocation should be guaranteed
to succeed so finding offending place is really important.

In this patch, call sites where page reference is manipulated are
converted to introduced wrapper function.  This is preparation step to
add tracepoint to each page reference manipulation function.  With this
facility, we can easily find reason of CMA allocation failure.  There is
no functional change in this patch.

In addition, this patch also converts reference read sites.  It will
help a second step that renames page._count to something else and
prevents later attempt to direct access to it (Suggested by Andrew).
Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: NMichal Nazarewicz <mina86@mina86.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fe896d18