1. 30 June 2017, 2 commits
  2. 24 June 2017, 1 commit
    • x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap · 4a06370b
      Michal Hocko authored
      Since the following commit in 2008:
      
        cc503c1b ("x86: PIE executable randomization")
      
      we added a heuristic to treat applications with RLIMIT_STACK configured
      to unlimited as legacy. This means:
      
       a) set the mmap_base to 1/3 of address space + randomization and
       b) mmap from bottom to top.
      
      This makes some sense as it allows the stack to grow really large. On the
      other hand it reduces the address space usable for default mmaps
      (without address hint) quite a lot.
      
      We have received a bug report that the SAP HANA workload has run into this
      limitation.
      
      We could argue that the user just got what they asked for when setting
      up the unlimited stack, but realistically a stack growing up to 1/6 of
      TASK_SIZE (allowed by mmap_base) is pretty much unlimited in real
      life. This would give mmap 20TB of additional address space, which is
      quite nice, especially since that address space is much more likely to
      be used than the reserved stack.
      
      Digging into the history, the original implementation of the randomization:
      
        8817210d ("[PATCH] x86_64: Flexmap for 32bit and randomized mappings for 64bit")
      
      didn't have this restriction.
      
      So let's try and remove this assumption - hopefully nothing breaks.
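      A minimal sketch of the heuristic being dropped, simplified from
      arch/x86/mm/mmap.c (not the exact hunk of this patch):

        static int mmap_is_legacy(void)
        {
                if (current->personality & ADDR_COMPAT_LAYOUT)
                        return 1;

                /* The check removed by this patch: unlimited stack == legacy layout. */
                if (rlimit(RLIMIT_STACK) == RLIM_INFINITY)
                        return 1;

                return sysctl_legacy_va_layout;
        }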
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: hughd@google.com
      Cc: linux-mm@kvack.org
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/tip-86b110d2ae6365ce91cabd37588bc8611770421a@git.kernel.org
      [ So I've applied this to tip:x86/mm with a wider Cc: list - if anyone objects to this change please holler. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4a06370b
  3. 22 June 2017, 1 commit
    • x86/ldt: Simplify the LDT switching logic · 73534258
      Andy Lutomirski authored
      Originally, Linux reloaded the LDT whenever the prev mm or the next
      mm had an LDT. It was changed in 2002 in:
      
        0bbed3beb4f2 ("[PATCH] Thread-Local Storage (TLS) support")
      
      (commit from the historical tree), like this:
      
      -		/* load_LDT, if either the previous or next thread
      -		 * has a non-default LDT.
      +		/*
      +		 * load the LDT, if the LDT is different:
      		 */
      -		if (next->context.size+prev->context.size)
      +		if (unlikely(prev->context.ldt != next->context.ldt))
      			load_LDT(&next->context);
      
      The current code is unlikely to avoid any LDT reloads, since different
      mms won't share an LDT.
      
      When we redo lazy mode to stop flush IPIs without switching to
      init_mm, though, the current logic would become incorrect: it will
      be possible to have real_prev == next but nonetheless have a stale
      LDT descriptor.
      
      Simplify the code to update LDTR if either the previous or the next
      mm has an LDT, i.e. effectively restore the historical logic.
      While we're at it, clean up the code by moving all the ifdeffery to
      a header where it belongs.
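      A rough sketch of the restored condition (the helper name and exact
      placement are illustrative, not necessarily what the patch uses):

        static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
        {
        #ifdef CONFIG_MODIFY_LDT_SYSCALL
                /* Reload if either mm has a non-default LDT. */
                if (unlikely((unsigned long)prev->context.ldt |
                             (unsigned long)next->context.ldt))
                        load_mm_ldt(next);
        #endif
        }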
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/2a859ac01245f9594c58f9d0a8b2ed8a7cd2507e.1498022414.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      73534258
  4. 19 June 2017, 1 commit
    • mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins authored
      The stack guard page is a useful feature to reduce the risk of stack smashing
      into a different mapping. We have been using a single-page gap, which
      is sufficient to prevent having the stack adjacent to a different mapping.
      But this seems to be insufficient in light of the stack usage in
      userspace. E.g. glibc uses alloca() allocations as large as 64kB in many
      commonly used functions. Others use constructs like gid_t buffer[NGROUPS_MAX],
      which is 256kB, or stack strings with MAX_ARG_STRLEN.
      
      This is especially dangerous for suid binaries with the default unlimited
      stack size limit, because those applications can be tricked into consuming
      a large portion of the stack and a single glibc call could then jump over
      the guard page. These attacks are not theoretical, unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
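      The core of the new scheme looks roughly like this sketch of
      include/linux/mm.h (stack_guard_gap defaults to 256 pages, i.e. 1MB with
      4k pages, and is tunable via the stack_guard_gap= option):

        extern unsigned long stack_guard_gap;

        static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
        {
                unsigned long vm_start = vma->vm_start;

                if (vma->vm_flags & VM_GROWSDOWN) {
                        vm_start -= stack_guard_gap;
                        if (vm_start > vma->vm_start)   /* clamp on underflow */
                                vm_start = 0;
                }
                return vm_start;
        }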
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1be7107f
  5. 13 June 2017, 8 commits
  6. 05 June 2017, 7 commits
    • x86/mm: Be more consistent wrt PAGE_SHIFT vs PAGE_SIZE in tlb flush code · be4ffc0d
      Andy Lutomirski authored
      Nadav pointed out that some code used PAGE_SIZE and other code used
      PAGE_SHIFT.  Use PAGE_SHIFT instead of multiplying or dividing by
      PAGE_SIZE.
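      Illustrative fragment of the preferred style (not a specific hunk):

        /* nr = (end - start) / PAGE_SIZE;  becomes: */
        unsigned long nr = (end - start) >> PAGE_SHIFT;

        /* bytes = nr * PAGE_SIZE;  becomes: */
        unsigned long bytes = nr << PAGE_SHIFT;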
      Requested-by: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      be4ffc0d
    • x86/mm: Rework lazy TLB to track the actual loaded mm · 3d28ebce
      Andy Lutomirski authored
      Lazy TLB state is currently managed in a rather baroque manner.
      AFAICT, there are three possible states:
      
       - Non-lazy.  This means that we're running a user thread or a
         kernel thread that has called use_mm().  current->mm ==
         current->active_mm == cpu_tlbstate.active_mm and
         cpu_tlbstate.state == TLBSTATE_OK.
      
       - Lazy with user mm.  We're running a kernel thread without an mm
         and we're borrowing an mm_struct.  We have current->mm == NULL,
         current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
         != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
         in mm_cpumask(current->active_mm).  CR3 points to
         current->active_mm->pgd.  The TLB is up to date.
      
       - Lazy with init_mm.  This happens when we call leave_mm().  We
         have current->mm == NULL, current->active_mm ==
         cpu_tlbstate.active_mm, but that mm is only relevant insofar as
         the scheduler is tracking it for refcounting.  cpu_tlbstate.state
         != TLBSTATE_OK.  The current cpu is clear in
         mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
         i.e. init_mm->pgd.
      
      This patch simplifies the situation.  Other than perf, x86 stops
      caring about current->active_mm at all.  We have
      cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
      TLB is always up to date for that mm.  leave_mm() just switches us
      to init_mm.  There are no longer any special cases for mm_cpumask,
      and switch_mm() switches mms without worrying about laziness.
      
      After this patch, cpu_tlbstate.state serves only to tell the TLB
      flush code whether it may switch to init_mm instead of doing a
      normal flush.
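      The resulting per-CPU state boils down to something like this sketch
      (field list abridged; see struct tlb_state in asm/tlbflush.h):

        struct tlb_state {
                /* The mm that CR3 currently points to; the TLB is valid for it. */
                struct mm_struct *loaded_mm;

                /* Tells the flush code whether it may switch to init_mm instead. */
                int state;
        };
        DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);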
      
      This makes fairly extensive changes to xen_exit_mmap(), which used
      to look a bit like black magic.
      
      Perf is unchanged.  With or without this change, perf may behave a bit
      erratically if it tries to read user memory in kernel thread context.
      We should build on this patch to teach perf to never look at user
      memory when cpu_tlbstate.loaded_mm != current->mm.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3d28ebce
    • x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code · ce4a4e56
      Andy Lutomirski authored
      The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
      Aside from that, it's fallen quite a bit behind the SMP code:
      
       - flush_tlb_mm_range() didn't flush individual pages if the range
         was small.
      
       - The lazy TLB code was much weaker.  This usually wouldn't matter,
         but, if a kernel thread flushed its lazy "active_mm" more than
         once (due to reclaim or similar), it wouldn't be unlazied and
         would instead pointlessly flush repeatedly.
      
       - Tracepoints were missing.
      
      Aside from that, simply having the UP code around was a maintenance
      burden, since it means that any change to the TLB flush code had to
      make sure not to break it.
      
      Simplify everything by deleting the UP code.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ce4a4e56
    • x86/mm: Use new merged flush logic in arch_tlbbatch_flush() · 3f79e4c7
      Andy Lutomirski authored
      Now there's only one copy of the local tlb flush logic for
      non-kernel pages on SMP kernels.
      
      The only functional change is that arch_tlbbatch_flush() will now
      leave_mm() on the local CPU if that CPU is in the batch and is in
      TLBSTATE_LAZY mode.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3f79e4c7
    • x86/mm: Refactor flush_tlb_mm_range() to merge local and remote cases · 454bbad9
      Andy Lutomirski authored
      The local flush path is very similar to the remote flush path.
      Merge them.
      
      This is intended to make no difference to behavior whatsoever.  It
      removes some code and will make future changes to the flushing
      mechanics simpler.
      
      This patch does remove one small optimization: flush_tlb_mm_range()
      now has an unconditional smp_mb() instead of using MOV to CR3 or
      INVLPG as a full barrier when applicable.  I think this is okay for
      a few reasons.  First, smp_mb() is quite cheap compared to the cost
      of a TLB flush.  Second, this rearrangement makes a bigger
      optimization available: with some work on the SMP function call
      code, we could do the local and remote flushes in parallel.  Third,
      I'm planning a rework of the TLB flush algorithm that will require
      an atomic operation at the beginning of each flush, and that
      operation will replace the smp_mb().
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      454bbad9
    • x86/mm: Change the leave_mm() condition for local TLB flushes · 59f537c1
      Andy Lutomirski authored
      On a remote TLB flush, we leave_mm() if we're TLBSTATE_LAZY.  For a
      local flush_tlb_mm_range(), we leave_mm() if !current->mm.  These
      are approximately the same condition -- the scheduler sets lazy TLB
      mode when switching to a thread with no mm.
      
      I'm about to merge the local and remote flush code, but for ease of
      verifying and bisecting the patch, I want the local and remote flush
      behavior to match first.  This patch changes the local code to match
      the remote code.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59f537c1
    • x86/mm: Pass flush_tlb_info to flush_tlb_others() etc · a2055abe
      Andy Lutomirski authored
      Rather than passing all the contents of flush_tlb_info to
      flush_tlb_others(), pass a pointer to the structure directly. For
      consistency, this also removes the unnecessary cpu parameter from
      uv_flush_tlb_others() to make its signature match the other
      *flush_tlb_others() functions.
      
      This serves two purposes:
      
       - It will dramatically simplify future patches that change struct
         flush_tlb_info, which I'm planning to do.
      
       - struct flush_tlb_info is an adequate description of what to do
         for a local flush, too, so by reusing it we can remove duplicated
         code between local and remote flushes in a future patch.
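      The interface change is roughly (sketch, field list abridged):

        struct flush_tlb_info {
                struct mm_struct *mm;
                unsigned long start;
                unsigned long end;
        };

        /* before: native_flush_tlb_others(cpumask, mm, start, end) */
        void native_flush_tlb_others(const struct cpumask *cpumask,
                                     const struct flush_tlb_info *info);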
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      [ Fix build warning. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a2055abe
  7. 01 June 2017, 1 commit
    • Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT" · c08d5174
      Ingo Molnar authored
      This reverts commit cbed27cd.
      
      As Andy Lutomirski observed:
      
       "I think this patch is bogus. pat_enabled() sure looks like it's
        supposed to return true if PAT is *enabled*, and these days PAT is
        'enabled' even if there's no HW PAT support."
      Reported-by: Bernhard Held <berny156@gmx.de>
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: stable@vger.kernel.org # v4.2+
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c08d5174
  8. 27 May 2017, 1 commit
    • x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range() · a53276e2
      Steven Rostedt (VMware) authored
      With function tracing starting in early bootup and its trampoline pages
      being made read-only, the following bug triggered:
      
      kernel BUG at arch/x86/mm/pageattr.c:189!
      invalid opcode: 0000 [#1] SMP
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc2-test+ #3
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      task: ffffffffb4222500 task.stack: ffffffffb4200000
      RIP: 0010:change_page_attr_set_clr+0x269/0x302
      RSP: 0000:ffffffffb4203c88 EFLAGS: 00010046
      RAX: 0000000000000046 RBX: 0000000000000000 RCX: 00000001b6000000
      RDX: ffffffffb4203d40 RSI: 0000000000000000 RDI: ffffffffb4240d60
      RBP: ffffffffb4203d18 R08: 00000001b6000000 R09: 0000000000000001
      R10: ffffffffb4203aa8 R11: 0000000000000003 R12: ffffffffc029b000
      R13: ffffffffb4203d40 R14: 0000000000000001 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff9a639ea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff9a636b384000 CR3: 00000001ea21d000 CR4: 00000000000406b0
      Call Trace:
       change_page_attr_clear+0x1f/0x21
       set_memory_ro+0x1e/0x20
       arch_ftrace_update_trampoline+0x207/0x21c
       ? ftrace_caller+0x64/0x64
       ? 0xffffffffc029b000
       ftrace_startup+0xf4/0x198
       register_ftrace_function+0x26/0x3c
       function_trace_init+0x5e/0x73
       tracer_init+0x1e/0x23
       tracing_set_tracer+0x127/0x15a
       register_tracer+0x19b/0x1bc
       init_function_trace+0x90/0x92
       early_trace_init+0x236/0x2b3
       start_kernel+0x200/0x3f5
       x86_64_start_reservations+0x29/0x2b
       x86_64_start_kernel+0x17c/0x18f
       secondary_startup_64+0x9f/0x9f
       ? secondary_startup_64+0x9f/0x9f
      
      Interrupts should not be enabled this early in the boot process. It is
      also fine to leave interrupts disabled during this time, as there's only one
      CPU running, and on_each_cpu() means to only run on the current CPU.
      
      If early_boot_irqs_disabled is set, it is safe to run cpu_flush_range() with
      interrupts disabled. Don't trigger a BUG_ON() in that case.
      
      Link: http://lkml.kernel.org/r/20170526093717.0be3b849@gandalf.local.home
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      a53276e2
  9. 24 May 2017, 4 commits
    • mm, x86/mm: Make the batched unmap TLB flush API more generic · e73ad5ff
      Andy Lutomirski authored
      try_to_unmap_flush() used to open-code a rather x86-centric flush
      sequence: local_flush_tlb() + flush_tlb_others().  Rearrange the
      code so that the arch (only x86 for now) provides
      arch_tlbbatch_add_mm() and arch_tlbbatch_flush() and the core code
      calls those functions instead.
      
      I'll want this for x86 because, to enable address space ids, I can't
      support the flush_tlb_others() mode used by the existing
      try_to_unmap_flush() implementation with good performance.  I can
      support the new API fairly easily, though.
      
      I imagine that other architectures may be in a similar position.
      Architectures with strong remote flush primitives (arm64?) may have
      even worse performance problems with flush_tlb_others() the way that
      try_to_unmap_flush() uses it.
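      The new arch hooks look roughly like this (sketch of the interface,
      not the exact declarations):

        /* Record a deferred flush for @mm in @batch. */
        void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
                                  struct mm_struct *mm);

        /* Execute all the flushes accumulated in @batch. */
        void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);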
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/19f25a8581f9fb77876b7ff3b001f89835e34ea3.1495492063.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e73ad5ff
    • x86/mm: Reduce indentation in flush_tlb_func() · b3b90e5a
      Andy Lutomirski authored
      The leave_mm() case can just exit the function early so we don't
      need to indent the entire remainder of the function.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/97901ddcc9821d7bc7b296d2918d1179f08aaf22.1495492063.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b3b90e5a
    • x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range() · ca6c99c0
      Andy Lutomirski authored
      flush_tlb_page() was very similar to flush_tlb_mm_range() except that
      it had a couple of issues:
      
       - It was missing an smp_mb() in the case where
         current->active_mm != mm.  (This is a longstanding bug reported by Nadav Amit)
      
       - It was missing tracepoints and vm counter updates.
      
      The only reason that I can see for keeping it as a separate
      function is that it could avoid a few branches that
      flush_tlb_mm_range() needs to decide to flush just one page.  This
      hardly seems worthwhile.  If we decide we want to get rid of those
      branches again, a better way would be to introduce an
      __flush_tlb_mm_range() helper and make both flush_tlb_page() and
      flush_tlb_mm_range() use it.
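      Conceptually, flush_tlb_page() becomes a thin wrapper, something like
      the sketch below (the exact flags argument may differ):

        static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
        {
                flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, VM_NONE);
        }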
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.1495492063.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ca6c99c0
    • x86/PAT: Fix Xorg regression on CPUs that don't support PAT · cbed27cd
      Mikulas Patocka authored
      In the file arch/x86/mm/pat.c, there's a '__pat_enabled' variable. The
      variable is set to 1 by default and the function pat_init() sets
      __pat_enabled to 0 if the CPU doesn't support PAT.
      
      However, on AMD K6-3 CPUs, the processor initialization code never calls
      pat_init() and so __pat_enabled stays 1 and the function pat_enabled()
      returns true, even though the K6-3 CPU doesn't support PAT.
      
      The result of this bug is that a kernel warning is produced when attempting to
      start the Xserver and the Xserver doesn't start (fork() returns ENOMEM).
      Another symptom of this bug is that the framebuffer driver doesn't set the
      K6-3 MTRR registers:
      
        x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 3891 at arch/x86/mm/pat.c:1020 untrack_pfn+0x5c/0x9f
        ...
        x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
      
      To fix the bug, change pat_enabled() so that it returns true only if PAT
      initialization was actually done.
      
      Also, I changed boot_cpu_has(X86_FEATURE_PAT) to
      this_cpu_has(X86_FEATURE_PAT) in pat_ap_init(), so that we check the PAT
      feature on the processor that is being initialized.
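      A sketch of the idea (the flag name here is illustrative, not
      necessarily the one the patch introduces):

        static bool pat_initialized;    /* set once pat_init() has actually run */

        bool pat_enabled(void)
        {
                return __pat_enabled && pat_initialized;
        }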
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: stable@vger.kernel.org # v4.2+
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1704181501450.26399@file01.intranet.prod.int.rdu2.redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cbed27cd
  10. 09 May 2017, 2 commits
  11. 08 May 2017, 1 commit
    • x86/mm: Add support for gbpages to kernel_ident_mapping_init() · 66aad4fd
      Xunlei Pang authored
      Kernel identity mappings on x86-64 kernels are created in two
      ways: by the early x86 boot code, or by kernel_ident_mapping_init().
      
      Native kernels (which is the dominant use case) use the former,
      but the kexec and hibernation code uses kernel_ident_mapping_init().
      
      There's a subtle difference between these two ways of creating identity
      mappings: the current kernel_ident_mapping_init() code always creates
      identity mappings using 2MB pages (PMD level), while the native kernel
      boot path also utilizes gbpages where available.
      
      This difference is suboptimal both for performance and for memory
      usage: kernel_ident_mapping_init() needs to allocate pages for the
      page tables when creating the new identity mappings.
      
      This patch adds 1GB page (PUD level) support to kernel_ident_mapping_init()
      to address these concerns.
      
      The primary advantage would be better TLB coverage/performance,
      because we'd utilize 1GB TLBs instead of 2MB ones.
      
      It is also useful for machines with a large amount of memory, since it
      saves page-table allocations (around 4MB/TB with 2MB pages) when setting
      up identity mappings for all of memory; with 1GB pages this drops to
      about 8KB/TB.
      
      ( Note that this change alone does not activate gbpages in kexec,
        we are doing that in a separate patch. )
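      Conceptually, the PUD-level addition looks like this (simplified sketch
      of the loop in kernel_ident_mapping_init(); details such as offset
      handling are omitted):

        if (info->direct_gbpages) {
                pud_t pudval;

                if (pud_present(*pud))
                        continue;

                /* info->page_flag is assumed to carry the large-page bits */
                addr &= PUD_MASK;
                pudval = __pud(addr | info->page_flag);
                set_pud(pud, pudval);
                continue;
        }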
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      66aad4fd
  12. 05 May 2017, 1 commit
    • x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() · fc5f9d5f
      Baoquan He authored
      Jeff Moyer reported that on his system with two memory regions 0~64G and
      1T~1T+192G, and the kernel option "memmap=192G!1024G" added, enabling KASLR
      makes the system hang intermittently during boot, while adding 'nokaslr'
      avoids the hang.
      
      The back trace is:
      
       Oops: 0000 [#1] SMP
      
       RIP: memcpy_erms()
       [ .... ]
       Call Trace:
        pmem_rw_page()
        bdev_read_page()
        do_mpage_readpage()
        mpage_readpages()
        blkdev_readpages()
        __do_page_cache_readahead()
        force_page_cache_readahead()
        page_cache_sync_readahead()
        generic_file_read_iter()
        blkdev_read_iter()
        __vfs_read()
        vfs_read()
        SyS_read()
        entry_SYSCALL_64_fastpath()
      
      This crash happens because the for-loop count calculation in sync_global_pgds()
      is not correct. When a mapping area crosses PGD entries, we should
      calculate the starting address of the region that the next PGD covers and
      use that as the next loop value, rather than just adding PGDIR_SIZE. The old
      code only works if the mapping area is an exact multiple of PGDIR_SIZE;
      otherwise the last part of the region can be skipped and never gets
      synchronized to the other processes from the kernel PGD init_mm.pgd.
      
      In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
      PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, which
      makes this area map inside one PGD entry. With KASLR enabled,
      the area can cross two PGD entries, and then the second PGD entry is not
      synced to the other processes. That is why we saw an empty PGD.
      
      Fix it.
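      The fix is essentially to step to the next PGD boundary instead of
      blindly adding PGDIR_SIZE (sketch; sync_one_pgd() is a hypothetical
      helper standing in for the loop body):

        /* before: can skip the last PGD entry of an unaligned region */
        for (addr = start; addr <= end; addr += PGDIR_SIZE)
                sync_one_pgd(addr);

        /* after: advance to the start of the region the next PGD entry covers */
        for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE))
                sync_one_pgd(addr);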
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jinbum Park <jinb.park7@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fc5f9d5f
  13. 26 April 2017, 4 commits
    • x86/mm: Fix flush_tlb_page() on Xen · dbd68d8e
      Andy Lutomirski authored
      flush_tlb_page() passes a bogus range to flush_tlb_others() and
      expects the latter to fix it up.  native_flush_tlb_others() has the
      fixup but Xen's version doesn't.  Move the fixup to
      flush_tlb_others().
      
      AFAICS the only real effect is that, without this fix, Xen would
      flush everything instead of just the one page on remote vCPUs
      when flush_tlb_page() was called.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: e7b52ffd ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
      Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dbd68d8e
    • x86/mm: Make flush_tlb_mm_range() more predictable · ce27374f
      Andy Lutomirski authored
      I'm about to rewrite the function almost completely, but first I
      want to get a functional change out of the way.  Currently, if
      flush_tlb_mm_range() does not flush the local TLB at all, it will
      never do individual page flushes on remote CPUs.  This seems to be
      an accident, and preserving it will be awkward.  Let's change it
      first so that any regressions in the rewrite will be easier to
      bisect and so that the rewrite can attempt to change no visible
      behavior at all.
      
      The fix is simple: we can simply avoid short-circuiting the
      calculation of base_pages_to_flush.
      
      As a side effect, this also eliminates a potential corner case: if
      tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
      could have ended up flushing the entire address space one page at a
      time.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ce27374f
    • x86/mm: Remove flush_tlb() and flush_tlb_current_task() · 29961b59
      Andy Lutomirski authored
      I was trying to figure out how flush_tlb_current_task() could possibly
      work correctly if current->mm != current->active_mm, but I realized I
      could spare myself the effort: it has no callers except the unused
      flush_tlb() macro.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      29961b59
    • x86/mm/64: Fix crash in remove_pagetable() · e6ab9c4d
      Kirill A. Shutemov authored
      remove_pagetable() does its page walk using p*d_page_vaddr() plus a cast.
      That is not the canonical approach -- we usually use p*d_offset() for that.
      
      It works fine as long as all page table levels are present. We broke that
      invariant by introducing the folded p4d page table level.

      As a result, remove_pagetable() interprets a PMD as a PUD, which leads to a
      crash:
      
      	BUG: unable to handle kernel paging request at ffff880300000000
      	IP: memchr_inv+0x60/0x110
      	PGD 317d067
      	P4D 317d067
      	PUD 3180067
      	PMD 33f102067
      	PTE 8000000300000060
      
      Let's fix this by using p*d_offset() instead of p*d_page_vaddr() for
      page walk.
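      In practice the walk changes along these lines (illustrative):

        pud_t *pud;

        /* old style: cast the next level out of the page's virtual address */
        pud = (pud_t *)p4d_page_vaddr(*p4d);

        /* canonical style: the helper handles folded levels correctly */
        pud = pud_offset(p4d, 0);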
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: f2a6a705 ("x86: Convert the rest of the code to support p4d_t")
      Link: http://lkml.kernel.org/r/20170425092557.21852-1-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e6ab9c4d
  14. 23 April 2017, 1 commit
    • Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d
      Ingo Molnar authored
      This reverts commit 2947ba05.
      
      Dan Williams reported dax-pmem kernel warnings with the following signature:
      
         WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
         percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
      
      ... and bisected it to this commit, which suggests possible memory corruption
      caused by the x86 fast-GUP conversion.
      
      He also pointed out:
      
       "
        This is similar to the backtrace when we were not properly handling
        pud faults and was fixed with this commit: 220ced16 "mm: fix
        get_user_pages() vs device-dax pud mappings"
      
        I've found some missing _devmap checks in the generic
        get_user_pages_fast() path, but this does not fix the regression
        [...]
       "
      
      So given that there are known bugs, and a pretty robust looking bisection
      points to this commit suggesting that there are unknown bugs in the conversion
      as well, revert it for the time being - we'll re-try in v4.13.
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dann.frazier@canonical.com
      Cc: dave.hansen@intel.com
      Cc: steve.capper@linaro.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6dd29b3d
  15. 13 April 2017, 2 commits
  16. 12 April 2017, 1 commit
    • x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space · 5ed386ec
      Joerg Roedel authored
      When this function fails it just sends a SIGSEGV signal to
      user-space using force_sig(). This signal is missing
      essential information about the cause, e.g. the trap_nr or
      an error code.
      
      Fix this by propagating the error to the only caller of
      mpx_handle_bd_fault(), do_bounds(), which sends the correct
      SIGSEGV signal to the process.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: fe3d197f ('x86, mpx: On-demand kernel allocation of bounds tables')
      Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5ed386ec
  17. 08 April 2017, 1 commit
  18. 04 April 2017, 1 commit
    • Annotate hardware config module parameters in arch/x86/mm/ · 3c2e2e68
      David Howells authored
      When the kernel is running in secure boot mode, we lock down the kernel to
      prevent userspace from modifying the running kernel image.  Whilst this
      includes prohibiting access to things like /dev/mem, it must also prevent
      access by means of configuring driver modules in such a way as to cause a
      device to access or modify the kernel image.
      
      To this end, annotate module_param* statements that refer to hardware
      configuration and indicate for future reference what type of parameter they
      specify.  The parameter parser in the core sees this information and can
      skip such parameters with an error message if the kernel is locked down.
      The module initialisation then runs as normal, but just sees whatever the
      default values for those parameters are.
      
      Note that we do still need to do the module initialisation because some
      drivers have viable defaults set in case parameters aren't specified and
      some drivers support automatic configuration (e.g. PNP or PCI) in addition
      to manually coded parameters.
      
      This patch annotates drivers in arch/x86/mm/.
      
      [Note: With respect to testmmiotrace, an additional patch will be added
       separately that makes the module refuse to load if the kernel is locked
       down.]
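      For example, an I/O memory parameter ends up annotated roughly like this
      (sketch using the new module_param_hw() helper; the parameter shown is
      illustrative):

        static unsigned long mmio_address;
        module_param_hw(mmio_address, ulong, iomem, 0);
        MODULE_PARM_DESC(mmio_address, "Start address of an I/O memory region to trace.");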
      Suggested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      cc: Ingo Molnar <mingo@kernel.org>
      cc: Thomas Gleixner <tglx@linutronix.de>
      cc: "H. Peter Anvin" <hpa@zytor.com>
      cc: x86@kernel.org
      cc: linux-kernel@vger.kernel.org
      cc: nouveau@lists.freedesktop.org
      3c2e2e68