1. 05 Mar 2016 (1 commit)
    • x86/mm/kmmio: Fix mmiotrace for hugepages · cfa52c0c
      Committed by Karol Herbst
      Because Linux might use pages bigger than 4K to handle those mmio
      ioremaps, the kmmio code shouldn't rely on the page id as it currently does.
      
      Using the memory address instead of the page id lets us look up how big the
      page is and what its base address is, so that we won't get a page fault
      within the same page twice anymore.
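      A minimal sketch of the address-based lookup, assuming the x86 helpers
      lookup_address(), page_level_size() and page_level_mask() (these names are
      not taken from this changelog, so treat the details as illustrative):

        /* Find the base and size of the (possibly huge) page backing addr,
         * so that repeated faults inside one 2M/1G mapping resolve to the
         * same kmmio entry. A sketch, not the verbatim patch. */
        static unsigned long kmmio_page_base(unsigned long addr, unsigned long *size)
        {
                unsigned int level;
                pte_t *pte = lookup_address(addr, &level); /* walk kernel page tables */

                if (!pte)
                        return 0;

                *size = page_level_size(level);       /* 4K, 2M or 1G */
                return addr & page_level_mask(level); /* base of the mapping */
        }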
      Tested-by: Pierre Moreau <pierre.morrow@free.fr>
      Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-x86_64@vger.kernel.org
      Cc: nouveau@lists.freedesktop.org
      Cc: pq@iki.fi
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1456966991-6861-1-git-send-email-nouveau@karolherbst.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 25 Feb 2016 (1 commit)
    • x86/mm: Avoid premature success when changing page attributes · 405e1133
      Committed by Jan Beulich
      set_memory_nx() (and set_memory_x()) currently differ in behavior from
      all other set_memory_*() functions when encountering a virtual address
      space hole within the kernel address range: They stop processing at the
      hole, but nevertheless report success (making the caller believe the
      operation was carried out on the entire range). While observed to be a
      problem - triggering the CONFIG_DEBUG_WX warning - only with
      out-of-tree code, I suspect (but didn't check) that on x86-64 the
      CONFIG_DEBUG_PAGEALLOC logic in free_init_pages() would, when called
      from free_initmem(), have the same effect on the set_memory_nx() called
      from mark_rodata_ro().
      
      This unexpected behavior is a result of change_page_attr_set_clr()
      special casing changes to only the NX bit, in that it passes "false" as
      the "checkalias" argument to __change_page_attr_set_clr(). Since this
      flag becomes the "primary" argument of both __change_page_attr() and
      __cpa_process_fault(), the latter would so far return success without
      adjusting cpa->numpages. Success to the higher level callers, however,
      means that whatever cpa->numpages currently holds is the count of
      successfully processed pages. The cases when __change_page_attr() calls
      __cpa_process_fault(), otoh, don't generally mean the entire range got
      processed (as can be seen from one of the two success return paths in
      __cpa_process_fault() already adjusting ->numpages).
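      A hedged sketch of the shape of the fix, reconstructed from this
      description (the real patch may differ in detail): make the non-primary
      fault path account for the page it skips instead of implying the whole
      remainder was processed:

        static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
                                       int primary)
        {
                if (!primary) {
                        /* Alias walk hit a hole: consume exactly one page so
                         * the caller's range accounting stays honest. */
                        cpa->numpages = 1;
                        return 0;
                }
                /* ... primary-walk fault handling as before ... */
                return -EFAULT;
        }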
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/56BB0AD402000078000D05BF@prv-mh.provo.novell.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 20 Feb 2016 (1 commit)
  4. 09 Feb 2016 (2 commits)
    • x86/kasan: Write protect kasan zero shadow · 063fb3e5
      Committed by Andrey Ryabinin
      After kasan_init() has executed, no one is allowed to write to
      kasan_zero_page, so write-protect it.
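      One plausible way to do that, assuming the kasan_zero_pte table from the
      x86 KASAN init code (a sketch, not necessarily the verbatim patch):

        /* Remap the zero shadow read-only once kasan_init() has populated it. */
        for (i = 0; i < PTRS_PER_PTE; i++)
                set_pte(&kasan_zero_pte[i],
                        pfn_pte(PFN_DOWN(__pa(kasan_zero_page)), PAGE_KERNEL_RO));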
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1452516679-32040-3-git-send-email-aryabinin@virtuozzo.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/kasan: Clear kasan_zero_page after TLB flush · 69e0210f
      Committed by Andrey Ryabinin
      Currently we clear kasan_zero_page before __flush_tlb_all(). This
      works with the current implementation of native_flush_tlb[_global]()
      because it doesn't do any writes to the KASAN shadow memory.
      But any subtle change made in native_flush_tlb*() could break this.
      Also, the current code doesn't seem to work for paravirt guests (lguest).
      
      Only after the TLB flush can we be sure that kasan_zero_page is not
      used as early shadow anymore (instrumented code will not write to it).
      So it should be cleared only after the TLB flush.
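      A sketch of the resulting order in kasan_init() (the init_level4_pgt
      reload is assumed from the x86 KASAN setup code):

        load_cr3(init_level4_pgt);
        __flush_tlb_all();

        /* Only now is kasan_zero_page guaranteed not to back any live
         * shadow mapping, so clearing it cannot race with instrumented
         * code writing through the early shadow. */
        memset(kasan_zero_page, 0, PAGE_SIZE);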
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 08 Feb 2016 (3 commits)
    • x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug() · 5f7ee246
      Committed by Ingo Molnar
      numa_clear_kernel_node_hotplug() uses memblock_set_node() without
      checking for failures.
      
      memblock_set_node() is a complex function that might need to extend
      the memblock array - and that extension can fail - so check for this
      possibility.
      
      It's not supposed to happen (because realistically if we have so
      little memory that this fails then we likely won't be able to
      boot anyway), but do the check nevertheless.
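      The check can be as simple as not discarding the return value (a sketch;
      how to react to a failure is a judgment call, and the arguments shown
      here are illustrative):

        int ret;

        /* memblock_set_node() may need to grow the memblock array, and
         * that allocation can fail - don't throw the return value away. */
        ret = memblock_set_node(start, size, &memblock.reserved, nid);
        WARN_ON_ONCE(ret);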
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: y14sg1 <y14sg1@comcast.net>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/numa: Clean up numa_clear_kernel_node_hotplug() · c1a0bf34
      Committed by Ingo Molnar
      So we fixed an overflow bug in numa_clear_kernel_node_hotplug():
      
        2b54ab3c66d4 ("x86/mm/numa: Fix memory corruption on 32-bit NUMA kernels")
      
      ... and the bug was indirectly caused by poor coding style,
      such as using start/end local variables unnecessarily, which
      lost the phys_addr_t type.
      
      So make the code more readable and try to fully comment all
      the thinking behind the logic.
      
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: y14sg1 <y14sg1@comcast.net>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels · 59fd1214
      Committed by Ingo Molnar
      The following commit:
      
        a0acda91 ("acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable")
      
      Introduced numa_clear_kernel_node_hotplug(), which is executed during
      early bootup and which also marks all currently reserved memblock
      regions as hot-memory-unswappable.
      
      y14sg1 <y14sg1@comcast.net> reported that when running 32-bit NUMA kernels,
      the grsecurity/PAX kernel patch flagged a size overflow in this function:
      
        PAX: size overflow detected in function x86_numa_init arch/x86/mm/numa.c:691 [...]
      
      ... the reason for the overflow is that memblock_clear_hotplug() takes physical
      addresses as arguments, while the start/end variables used by
      numa_clear_kernel_node_hotplug() are 'unsigned long', which is 32-bit
      on PAE kernels even though physical addresses there are 64-bit.
      
      So on 32-bit PAE kernels that have physical memory above the 4GB boundary,
      we truncate a 64-bit physical address range to 32 bits and pass it to
      memblock_clear_hotplug(), which at minimum prevents the original memory-hotplug
      bugfix from working, but might have other side effects as well.
      
      The fix is to use the proper type to handle physical addresses, phys_addr_t.
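      The truncation itself is easy to demonstrate (values illustrative):

        /* On a PAE kernel, 'unsigned long' is 32-bit but physical
         * addresses are 64-bit wide: */
        u64 phys = 0x180000000ULL;    /* 6GB, above the 4GB boundary */
        unsigned long bad = phys;     /* truncated to 0x80000000     */
        phys_addr_t good = phys;      /* keeps 0x180000000 under PAE */

        /* memblock_clear_hotplug() takes phys_addr_t arguments, so passing
         * the truncated value silently operates on the wrong range. */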
      Reported-by: y14sg1 <y14sg1@comcast.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 29 Jan 2016 (1 commit)
    • x86/mm/pat: Avoid truncation when converting cpa->numpages to address · 74256377
      Committed by Matt Fleming
      There are a couple of nasty truncation bugs lurking in the pageattr
      code that can be triggered when mapping EFI regions, e.g. when we pass
      a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting
      left by PAGE_SHIFT will truncate the resultant address to 32-bits.
      
      Viorel-Cătălin managed to trigger this bug on his Dell machine that
      provides a ~5GB EFI region which requires 1236992 pages to be mapped.
      When calling populate_pud() the end of the region gets calculated
      incorrectly in the following buggy expression,
      
        end = start + (cpa->numpages << PAGE_SHIFT);
      
      And only 188416 pages are mapped. Next, populate_pud() gets invoked
      for a second time because of the loop in __change_page_attr_set_clr(),
      only this time no pages get mapped because shifting the remaining
      number of pages (1048576) by PAGE_SHIFT is zero. At which point the
      loop in __change_page_attr_set_clr() spins forever because we fail to
      make progress.
      
      Hitting this bug depends very much on the virtual address we pick to
      map the large region at and how many pages we map on the initial run
      through the loop. This explains why this issue was only recently hit
      with the introduction of commit
      
        a5caa209 ("x86/efi: Fix boot crash by mapping EFI memmap
         entries bottom-up at runtime, instead of top-down")
      
      It's interesting to note that safe uses of cpa->numpages do exist in
      the pageattr code. If instead of shifting ->numpages we multiply by
      PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
      so the result is unsigned long.
      
      To avoid surprises when users try to convert very large cpa->numpages
      values to addresses, change the data type from 'int' to 'unsigned
      long', thereby making it suitable for shifting by PAGE_SHIFT without
      any type casting.
      
      The alternative would be to make liberal use of casting, but that is
      far more likely to cause problems in the future when someone adds more
      code and fails to cast properly; this bug was difficult enough to
      track down in the first place.
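      The numbers in the report line up: 1236992 << 12 = 5,066,719,232, which
      wraps modulo 2^32 to 771,751,936 bytes, i.e. exactly the 188,416 pages
      observed; the 1,048,576-page remainder then wraps to 0. A sketch of the
      difference:

        unsigned int numpages_int = 1236992;  /* the old 'int'-sized field */
        unsigned long numpages    = 1236992;  /* the fixed field           */

        /* buggy: the shift is done in 32 bits and wraps */
        end = start + (numpages_int << PAGE_SHIFT); /* start + 771751936   */
        /* fixed: the shift is done in 64 bits */
        end = start + (numpages << PAGE_SHIFT);     /* start + 5066719232  */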
      Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
      Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 20 Jan 2016 (2 commits)
  8. 19 Jan 2016 (1 commit)
    • x86/mm: Streamline and restore probe_memory_block_size() · 43c75f93
      Committed by Seth Jennings
      The cumulative effect of the following two commits:
      
        bdee237c ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems")
        982792c7 ("x86, mm: probe memory block size for generic x86 64bit")
      
      ... is some pretty convoluted code.
      
      The first commit also removed code for the UV case without stated reason,
      which might lead to unexpected change in behavior.
      
      This commit has no other (intended) functional change; just seeks to simplify
      and make the code more understandable, beyond restoring the UV behavior.
      
      The whole section dealing with the "tail size" doesn't seem to be
      reachable, since both the >= 64GB and < 64GB cases return, so it
      was removed.
      Signed-off-by: Seth Jennings <sjennings@variantweb.net>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1448902063-18885-1-git-send-email-sjennings@variantweb.net
      [ Rewrote the title and changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 16 Jan 2016 (5 commits)
  10. 15 Jan 2016 (1 commit)
    • x86: mm: support ARCH_MMAP_RND_BITS · 9e08f57d
      Committed by Daniel Cashman
      x86: arch_mmap_rnd() uses hard-coded values, 8 for 32-bit and 28 for
      64-bit, to generate the random offset for the mmap base address.  This
      value represents a compromise between increased ASLR effectiveness and
      avoiding address-space fragmentation.  Replace it with a Kconfig option,
      which is sensibly bounded, so that platform developers may choose where
      to place this compromise.  Keep default values as new minimums.
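      A sketch of what the x86 side looks like with Kconfig-driven values
      (simplified; mmap_rnd_bits/mmap_rnd_compat_bits follow the design
      described above, and the exact code may differ):

        unsigned long arch_mmap_rnd(void)
        {
                unsigned long rnd;

                /* Bounds come from ARCH_MMAP_RND_BITS{,_MIN,_MAX} in Kconfig;
                 * the old hard-coded 8/28 become the minimum defaults. */
                if (mmap_is_ia32())
                        rnd = (unsigned long)get_random_int() % (1UL << mmap_rnd_compat_bits);
                else
                        rnd = (unsigned long)get_random_int() % (1UL << mmap_rnd_bits);

                return rnd << PAGE_SHIFT;
        }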
      Signed-off-by: Daniel Cashman <dcashman@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 12 Jan 2016 (2 commits)
  12. 11 Jan 2016 (1 commit)
    • x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · 71b3c126
      Committed by Andy Lutomirski
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
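      The crux of the ordering, as a sketch (the comments paraphrase the
      reasoning above; this is not the verbatim patch):

        /* Tell remote CPUs that this PGD is in use, so they send us
         * TLB-flush IPIs for it... */
        cpumask_set_cpu(cpu, mm_cpumask(next));

        /* ...and make sure that store is visible before we start filling
         * the local TLB. load_cr3() is a serializing instruction, so it
         * doubles as the required full barrier here. */
        load_cr3(next->pgd);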
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 09 Jan 2016 (1 commit)
  14. 05 Jan 2016 (2 commits)
    • x86/mm/pat: Change free_memtype() to support shrinking case · 2039e6ac
      Committed by Toshi Kani
      Using mremap() to shrink the map size of a VM_PFNMAP range causes
      the following error message, and leaves the pfn range allocated.
      
       x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff]
      
      This is because rbt_memtype_erase(), called from free_memtype()
      with spin_lock held, only supports freeing a whole memtype node in
      memtype_rbroot.  Therefore, this patch changes rbt_memtype_erase()
      to support a request that shrinks the size of a memtype node for
      mremap().
      
      memtype_rb_exact_match() is renamed to memtype_rb_match(), and
      is enhanced to support EXACT_MATCH and END_MATCH in @match_type.
      Since the memtype_rbroot tree allows overlapping ranges,
      rbt_memtype_erase() checks with EXACT_MATCH first, i.e. free
      a whole node for the munmap case.  If no such entry is found,
      it then checks with END_MATCH, i.e. shrink the size of a node
      from the end for the mremap case.
      
      In the mremap case, rbt_memtype_erase() proceeds in two steps:
      1) remove the node, and then 2) insert the updated node.  This
      allows proper update of the augmented value, subtree_max_end, in
      the tree.
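      A sketch of that two-step erase for the shrink case (the helper names
      follow the changelog, the bodies are simplified and illustrative):

        if (match_type == END_MATCH) {                /* mremap: shrink       */
                rb_erase(&data->rb, &memtype_rbroot); /* 1) remove the node   */
                data->end = start;                    /* trim from the end    */
                memtype_rb_insert(&memtype_rbroot, data); /* 2) re-insert,
                                          which recomputes subtree_max_end */
        }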
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: stsp@list.ru
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/mm/pat: Add untrack_pfn_moved for mremap · d9fe4fab
      Committed by Toshi Kani
      mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following
      WARN_ON_ONCE() message in untrack_pfn().
      
        WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0()
        Call Trace:
        [<ffffffff817729ea>] dump_stack+0x45/0x57
        [<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0
        [<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20
        [<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0
        [<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860
        [<ffffffff811d3725>] unmap_vmas+0x55/0xb0
        [<ffffffff811d916c>] unmap_region+0xac/0x120
        [<ffffffff811db86a>] do_munmap+0x28a/0x460
        [<ffffffff811dec33>] move_vma+0x1b3/0x2e0
        [<ffffffff811df113>] SyS_mremap+0x3b3/0x510
        [<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      MREMAP_FIXED moves a pfnmap from old vma to new vma.  untrack_pfn() is
      called with the old vma after its pfnmap page table has been removed,
      which causes follow_phys() to fail.  The new vma has a new pfnmap to
      the same pfn & cache type with VM_PAT set.  Therefore, we only need to
      clear VM_PAT from the old vma in this case.
      
      Add untrack_pfn_moved(), which clears VM_PAT from a given old vma.
      move_vma() is changed to call this function with the old vma when
      VM_PFNMAP is set.  move_vma() then calls do_munmap(), and untrack_pfn()
      is a no-op since VM_PAT is cleared.
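      Per that description, the new helper can be essentially a one-liner
      (a sketch):

        /* The pfnmap and its PAT tracking now belong to the new vma, so
         * just drop VM_PAT from the old one; the later do_munmap() ->
         * untrack_pfn() then has nothing to do. */
        void untrack_pfn_moved(struct vm_area_struct *vma)
        {
                vma->vm_flags &= ~VM_PAT;
        }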
      Reported-by: Stas Sergeev <stsp@list.ru>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  15. 29 Dec 2015 (1 commit)
  16. 19 Dec 2015 (1 commit)
  17. 16 Dec 2015 (1 commit)
    • Fix user-visible spelling error · 173ae9ba
      Committed by Linus Torvalds
      Pavel Machek reports a warning about W+X pages found in the "Persisent"
      kmap area.  After grepping for it (using the correct spelling), and not
      finding it, I noticed how the debug printk was just misspelled.  Fix it.
      
      The actual mapping bug that Pavel reported is still open.  It's
      apparently a separate issue from the known EFI page tables, looks like
      it's related to the HIGHMEM mappings.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 06 Dec 2015 (2 commits)
    • x86/mm: Introduce max_possible_pfn · 8dd33030
      Committed by Igor Mammedov
      max_possible_pfn will be used for tracking max possible
      PFN for memory that isn't present in E820 table and
      could be hotplugged later.
      
      By default max_possible_pfn is initialized from max_pfn, but
      later it can be updated with the highest PFN of the hotpluggable
      memory ranges declared in the ACPI SRAT table, if any are
      present.
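      A sketch of the intended flow (the SRAT-side variable names here are
      illustrative, not quoted from the patch):

        /* at boot, before any hotplug information is known: */
        max_possible_pfn = max_pfn;

        /* while parsing a hotpluggable range from the ACPI SRAT: */
        max_possible_pfn = max(max_possible_pfn, PFN_UP(range_end));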
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akataria@vmware.com
      Cc: fujita.tomonori@lab.ntt.co.jp
      Cc: konrad.wilk@oracle.com
      Cc: pbonzini@redhat.com
      Cc: revers@redhat.com
      Cc: riel@redhat.com
      Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Fix instruction decoder condition · 8e8efe03
      Committed by Dave Hansen
      MPX decodes instructions in order to tell which bounds register
      was violated.  Part of this decoding involves looking at the "REX
      prefix" which is a special instrucion prefix used to retrofit
      support for new registers in to old instructions.
      
      The X86_REX_*() macros are defined to return actual bit values:
      
      	#define X86_REX_R(rex) ((rex) & 4)
      
      *not* boolean values.  However, the MPX code was checking for
      them like they were booleans.  This might have led to us
      mis-decoding the "REX prefix" and giving false information out to
      userspace about bounds violations.  X86_REX_B() actually is bit 1,
      so this is really only broken for the X86_REX_X() case.
      
      Fix the conditionals up to tolerate the non-boolean values.
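      Concretely (the macro is quoted above; the surrounding lines are
      illustrative):

        /* X86_REX_R() yields 0 or 4, X86_REX_X() yields 0 or 2 - never 1 */
        if (X86_REX_X(rex_prefix) == 1)  /* buggy: 2 != 1, never taken   */
                regno += 8;
        if (X86_REX_X(rex_prefix))       /* fixed: any non-zero bit wins */
                regno += 8;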
      
      Fixes: fcc7ffd6 "x86, mpx: Decode MPX instruction to get bound violation information"
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: x86@kernel.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20151201003113.D800C1E0@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  19. 04 Dec 2015 (1 commit)
    • x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only · 071ac0c4
      Committed by Borislav Petkov
      The file should be created with S_IRUSR and not with S_IWUSR,
      because writing to it doesn't make any sense. We don't have a
      ->write method anyway, but let's have the permissions correct
      too.
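      That is, a sketch of the creation call (the fops name is illustrative):

        /* read-only for root; there is no ->write method to reach anyway */
        pe = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
                                 &ptdump_fops);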
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1448885579-32506-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  20. 26 Nov 2015 (1 commit)
  21. 23 Nov 2015 (1 commit)
    • x86/mm: Turn CONFIG_X86_PTDUMP into a module · 8609d1b5
      Committed by Kees Cook
      Being able to examine page tables is handy, so make this a
      module that can be loaded as needed.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vladimir Murzin <vladimir.murzin@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20151120010755.GA9060@www.outflux.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  22. 12 Nov 2015 (2 commits)
    • x86/mpx: Fix 32-bit address space calculation · f3119b83
      Committed by Dave Hansen
      I received a bug report that running 32-bit MPX binaries on
      64-bit kernels was broken.  I traced it down to this little code
      snippet.  We were switching our "number of bounds directory
      entries" calculation correctly.  But, we didn't switch the other
      side of the calculation: the virtual space size.
      
      This meant that we were calculating an absurd size for
      bd_entry_virt_space() on 32-bit because we used the 64-bit
      virt_space.
      
      This was _also_ broken for 32-bit kernels running on 64-bit
      hardware since boot_cpu_data.x86_virt_bits=48 even when running
      in 32-bit mode.
      
      Correct that and properly handle all 3 possible cases:
      
       1. 32-bit binary on 64-bit kernel
       2. 64-bit binary on 64-bit kernel
       3. 32-bit binary on 32-bit kernel
      
      This manifested in having bounds tables not properly unmapped.
      It "leaked" memory but had no functional impact otherwise.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels · 46561c39
      Committed by Dave Hansen
      When you call get_user(foo, bar), you effectively do a
      
      	copy_from_user(&foo, bar, sizeof(*bar));
      
      Note that the sizeof() is implicit.
      
      When we reach out to userspace to try to zap an entire "bounds
      table" we need to go read a "bounds directory entry" in order to
      locate the table's address.  The size of a "directory entry"
      depends on the binary being run and is always the size of a
      pointer.
      
      But, when we have a 64-bit kernel and a 32-bit application, the
      directory entry is still only 32 bits long, and we fetch it with
      a 64-bit pointer, which makes get_user() do a 64-bit fetch.
      Reading 4 extra bytes isn't harmful, unless we are at the end of
      the table and run off it.  It might also cause the zero page to
      get faulted in unnecessarily even if you are not at the end.
      
      Fix it up by doing a special 32-bit get_user() via a cast when
      we have 32-bit userspace.
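      A sketch of the sized fetch (this mirrors the description above; the
      variable names are illustrative):

        if (is_64bit_mm(mm)) {
                ret = get_user(bd_entry, bd_entry_ptr);
        } else {
                /* get_user() sizes the fetch from the pointer type, so cast
                 * down to 32 bits rather than reading 8 bytes from a 4-byte
                 * directory entry. */
                u32 bd_entry_32;

                ret = get_user(bd_entry_32, (u32 __user *)bd_entry_ptr);
                bd_entry = bd_entry_32;
        }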
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  23. 10 Nov 2015 (1 commit)
  24. 07 Nov 2015 (1 commit)
  25. 06 Nov 2015 (1 commit)
  26. 25 Oct 2015 (1 commit)
  27. 21 Oct 2015 (1 commit)
    • x86/microcode: Merge the early microcode loader · fe055896
      Committed by Borislav Petkov
      Merge the early loader functionality into the driver proper. The
      diff is huge, but logically it is simply moving code from the
      _early.c files into the main driver.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  28. 10 Oct 2015 (1 commit)