1. 27 Mar 2017, 1 commit
  2. 18 Mar 2017, 2 commits
  3. 16 Mar 2017, 1 commit
    • x86/mm: Adapt MODULES_END based on fixmap section size · f06bdd40
      Committed by Thomas Garnier
      This patch aligns MODULES_END to the beginning of the fixmap section.
      It optimizes the space available for both sections. The address is
      pre-computed based on the number of pages required by the fixmap
      section.
      
      It will allow GDT remapping in the fixmap section. The current
      MODULES_END static address does not provide enough space for the kernel
      to support a large number of processors.
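
      As a rough illustration of the idea (a minimal userspace sketch, not the
      kernel's actual macros; FIXMAP_PAGES and the addresses are made-up
      stand-ins), the module area's end can be derived from the fixmap size
      instead of being a fixed constant:

        #include <stdio.h>

        #define PAGE_SHIFT    12
        #define FIXADDR_TOP   0xffffffffff600000UL /* hypothetical top of fixmap */
        #define FIXMAP_PAGES  1024UL               /* grows with e.g. NR_CPUS    */

        /* The fixmap occupies FIXMAP_PAGES pages just below FIXADDR_TOP ... */
        #define FIXADDR_SIZE  (FIXMAP_PAGES << PAGE_SHIFT)
        #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)

        /* ... and the module area now ends exactly where the fixmap begins,
         * instead of at a static address that may be too small. */
        #define MODULES_END   FIXADDR_START

        int main(void)
        {
            printf("fixmap: %#lx-%#lx, MODULES_END = %#lx\n",
                   FIXADDR_START, FIXADDR_TOP, (unsigned long)MODULES_END);
            return 0;
        }
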
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-1-thgarnie@google.com
      [ Small build fix. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 14 Mar 2017, 5 commits
  5. 13 Mar 2017, 3 commits
    • x86/mm: Introduce mmap_compat_base() for 32-bit mmap() · 1b028f78
      Committed by Dmitry Safonov
      mmap() uses a base address from which it starts to look for free space
      for an allocation.
      
      The base address is stored in mm->mmap_base, which is calculated during
      exec(). The address depends on the task size, the rlimit set for the
      stack and ASLR randomization; the number of random bits differs for
      64-bit and 32-bit applications.
      
      Because the base address is fixed, an mmap() from a compat (32-bit)
      syscall issued by a 64-bit task returns an address based on the 64-bit
      base address, which does not fit into the 32-bit address space (4GB).
      The returned pointer is truncated to 32 bits, which results in an
      invalid address.
      
      To solve this, store a separate compat address base plus a compat
      legacy address base in mm_struct. These bases are calculated at exec()
      time and can be used later to handle 32-bit compat mmap() calls issued
      by 64-bit applications.
      
      As a consequence of this change, 32-bit applications issuing a 64-bit
      syscall (after doing a long jump) will now get a 64-bit mapping. Before
      this change, 32-bit applications always got a 32-bit mapping.
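
      A minimal sketch of the idea (stand-in types and a stub helper; the
      shape follows this changelog, the kernel's actual code may differ):
      the base is chosen by the bitness of the syscall, not of the task:

        #include <stdbool.h>

        /* Stand-in for the kernel's mm_struct with the new compat bases. */
        struct mm_struct {
                unsigned long mmap_base, mmap_legacy_base;
                unsigned long mmap_compat_base, mmap_compat_legacy_base;
        };

        /* Stub: in the kernel this reports whether the *current syscall*
         * entered through the 32-bit (compat) path. */
        static bool in_compat_syscall(void) { return false; }

        /* Pick the base matching the syscall's bitness, not the task's. */
        static unsigned long get_mmap_base(struct mm_struct *mm, bool legacy)
        {
                if (in_compat_syscall())
                        return legacy ? mm->mmap_compat_legacy_base
                                      : mm->mmap_compat_base;
                return legacy ? mm->mmap_legacy_base : mm->mmap_base;
        }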
      
      [ tglx: Massaged changelog and added a comment ]
      Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170306141721.9188-4-dsafonov@virtuozzo.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/mm: Add task_size parameter to mmap_base() · 8f3e474f
      Committed by Dmitry Safonov
      To correctly handle 32-bit and 64-bit mmap() syscalls in 64-bit
      applications, separate address bases are required at which to place a
      mapping.
      
      The task size can be used as an indicator to select the proper
      parameters for mmap_base().
      
      This requires the following changes:
      
       - Add a task_size argument to mmap_base() and base the calculation on it.
       - Provide mmap_legacy_base() as a separate function.
       - Use the new functions in arch_pick_mmap_layout(); a sketch follows below.
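
      A sketch of what such a parameterized helper might look like
      (illustrative constants and clamps, not the exact kernel code):

        #define SIZE_128M          (128UL << 20)
        #define PAGE_ALIGN_DOWN(x) ((x) & ~0xfffUL)

        /* Top-down layout: place the base below the stack gap, shifted down
         * by the random offset; task_size selects 32-bit vs 64-bit geometry. */
        static unsigned long mmap_base(unsigned long rnd, unsigned long task_size,
                                       unsigned long stack_rlimit)
        {
                unsigned long gap = stack_rlimit;

                /* Leave at least 128M for the stack, at most 5/6 of task size. */
                if (gap < SIZE_128M)
                        gap = SIZE_128M;
                else if (gap > (task_size / 6) * 5)
                        gap = (task_size / 6) * 5;

                return PAGE_ALIGN_DOWN(task_size - gap - rnd);
        }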
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170306141721.9188-3-dsafonov@virtuozzo.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/mm: Introduce arch_rnd() to compute 32/64 mmap random base · 6a0b41d1
      Committed by Dmitry Safonov
      The compat (32-bit) mmap() syscall issued by a 64-bit task results in a
      mapping above 4GB. That's outside the compat mode address space and
      prevents CRIU from restoring 32-bit processes from a 64-bit application.
      
      As a first step to address this, split the address base randomization
      calculation out of arch_mmap_rnd() into a helper function, which can be
      used independently of mmap_ia32()-based decisions.
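
      A sketch of the split-out helper (random() stands in for the kernel's
      entropy source; the shape follows this changelog):

        #include <stdlib.h>

        #define PAGE_SHIFT 12

        /* Mask the random value down to rndbits bits and shift into page
         * units; the caller picks rndbits per syscall ABI (32 vs 64 bit). */
        static unsigned long arch_rnd(unsigned int rndbits)
        {
                return ((unsigned long)random() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
        }

      arch_mmap_rnd() can then become a thin wrapper that passes the bit
      width appropriate to the ABI, e.g. 32-bit vs 64-bit random-bit counts.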
      
      [ tglx: Massaged changelog ]
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170306141721.9188-2-dsafonov@virtuozzo.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 10 Mar 2017, 2 commits
  7. 02 Mar 2017, 6 commits
  8. 28 Feb 2017, 1 commit
  9. 25 Feb 2017, 3 commits
  10. 23 Feb 2017, 2 commits
  11. 17 Feb 2017, 2 commits
    • x86/mm/ptdump: Add address marker for KASAN shadow region · 025205f8
      Committed by Andrey Ryabinin
      Annotate the KASAN shadow with address markers in page table
      dump output:
      
      $ cat /sys/kernel/debug/kernel_page_tables
      ...
      
      ---[ Vmemmap ]---
      0xffffea0000000000-0xffffea0003000000          48M     RW         PSE     GLB NX pmd
      0xffffea0003000000-0xffffea0004000000          16M                               pmd
      0xffffea0004000000-0xffffea0005000000          16M     RW         PSE     GLB NX pmd
      0xffffea0005000000-0xffffea0040000000         944M                               pmd
      0xffffea0040000000-0xffffea8000000000         511G                               pud
      0xffffea8000000000-0xffffec0000000000        1536G                               pgd
      ---[ KASAN shadow ]---
      0xffffec0000000000-0xffffed0000000000           1T     ro                 GLB NX pte
      0xffffed0000000000-0xffffed0018000000         384M     RW         PSE     GLB NX pmd
      0xffffed0018000000-0xffffed0020000000         128M                               pmd
      0xffffed0020000000-0xffffed0028200000         130M     RW         PSE     GLB NX pmd
      0xffffed0028200000-0xffffed0040000000         382M                               pmd
      0xffffed0040000000-0xffffed8000000000         511G                               pud
      0xffffed8000000000-0xfffff50000000000        7680G                               pgd
      0xfffff50000000000-0xfffffbfff0000000     7339776M     ro                 GLB NX pte
      0xfffffbfff0000000-0xfffffbfff0200000           2M                               pmd
      0xfffffbfff0200000-0xfffffbfff0a00000           8M     RW         PSE     GLB NX pmd
      0xfffffbfff0a00000-0xfffffbffffe00000         244M                               pmd
      0xfffffbffffe00000-0xfffffc0000000000           2M     ro                 GLB NX pte
      ---[ KASAN shadow end ]---
      0xfffffc0000000000-0xffffff0000000000           3T                               pgd
      ---[ ESPfix Area ]---
      ...
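
      The markers in such a dump come from a sorted table of (start address,
      name) pairs that the walker consults as it crosses region boundaries;
      a minimal sketch (addresses are illustrative, taken from the dump above):

        struct addr_marker {
                unsigned long start_address;
                const char *name;
        };

        /* When the walk crosses start_address, "---[ name ]---" is printed. */
        static const struct addr_marker address_markers[] = {
                { 0xffffea0000000000UL, "Vmemmap" },
                { 0xffffec0000000000UL, "KASAN shadow" },     /* KASAN_SHADOW_START */
                { 0xfffffc0000000000UL, "KASAN shadow end" }, /* KASAN_SHADOW_END   */
                { 0xffffff0000000000UL, "ESPfix Area" },
        };
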
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Link: http://lkml.kernel.org/r/20170214100839.17186-2-aryabinin@virtuozzo.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/mm/ptdump: Optimize check for W+X mappings for CONFIG_KASAN=y · 243b72aa
      Committed by Andrey Ryabinin
      Enabling both DEBUG_WX=y and KASAN=y options significantly increases
      boot time (dozens of seconds at least).
      KASAN fills the kernel page tables with repeated values in order to map
      several TBs of virtual memory to the single kasan_zero_page:
      
          kasan_zero_pud ->
              kasan_zero_pmd->
                  kasan_zero_pte->
                      kasan_zero_page
      
      So the page table walker used to find W+X mappings checks the same
      kasan_zero_p?d page table entries many times over. With this patch, the
      PUD-level walker skips a PUD if it has the same value as the previous
      one. The skipping is done only when searching for W+X mappings, so this
      optimization won't affect the page table dump via debugfs.
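
      A sketch of the skip test (simplified; the real check compares PUD
      entry values during the walk):

        #include <stdbool.h>

        /* Skip a PUD whose value matches the previous one, but only during
         * the W+X scan (checkwx), so the debugfs dump is unaffected. */
        static inline bool pud_already_checked(unsigned long prev_pud,
                                               unsigned long pud, bool checkwx)
        {
                return checkwx && pud == prev_pud;
        }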
      
      This drops the time spent in the W+X check from ~30 seconds to a
      reasonable 0.1 seconds:
      
      Before:
      [    4.579991] Freeing unused kernel memory: 1000K
      [   35.257523] x86/mm: Checked W+X mappings: passed, no W+X pages found.
      
      After:
      [    5.138756] Freeing unused kernel memory: 1000K
      [    5.266496] x86/mm: Checked W+X mappings: passed, no W+X pages found.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Link: http://lkml.kernel.org/r/20170214100839.17186-1-aryabinin@virtuozzo.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 10 Feb 2017, 1 commit
  13. 05 Feb 2017, 1 commit
  14. 30 Jan 2017, 1 commit
    • x86/mm/cpa: Avoid wbinvd() for PREEMPT · 459fbe00
      Committed by John Ogness
      Although wbinvd() is faster than flushing many individual pages, it
      blocks the memory bus for "long" periods of time (>100us), directly
      causing unusually large latencies on all CPUs, regardless of any CPU
      isolation features that may be active. This is an unprivileged
      operation, as it is exposed to user space via the graphics subsystem.
      
      For 1024 pages, flushing those pages individually can take up to 2200us,
      but the task remains fully preemptible during that time.
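
      A sketch of the resulting policy (stub helpers; the 1024-page threshold
      mirrors the changelog's numbers, the rest is illustrative):

        #include <stdbool.h>
        #include <stddef.h>

        static void wbinvd(void)          { /* flush whole cache, blocks bus */ }
        static void clflush_page(void *p) { /* flush one page, preemptible   */ (void)p; }

        static void cpa_flush(void **pages, size_t numpages, bool cache)
        {
        #ifdef CONFIG_PREEMPT
                bool do_wbinvd = false; /* a >100us bus stall defeats PREEMPT */
        #else
                bool do_wbinvd = cache && numpages >= 1024; /* ~4MB of 4K pages */
        #endif
                if (do_wbinvd) {
                        wbinvd();
                } else {
                        for (size_t i = 0; i < numpages; i++)
                                clflush_page(pages[i]); /* may be preempted here */
                }
        }
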
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  15. 14 Jan 2017, 1 commit
  16. 25 Dec 2016, 1 commit
  17. 17 Dec 2016, 1 commit
  18. 15 Dec 2016, 2 commits
  19. 21 Nov 2016, 1 commit
    • x86/traps: Ignore high word of regs->cs in early_fixup_exception() · fc0e81b2
      Committed by Andy Lutomirski
      On the 80486 DX, it seems that some exceptions may leave garbage in
      the high bits of CS.  This causes sporadic failures in which
      early_fixup_exception() refuses to fix up an exception.
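
      The fix amounts to comparing only the low 16 bits of the saved CS; a
      self-contained sketch (the selector value and struct are illustrative):

        #include <stdio.h>

        #define __KERNEL_CS 0x10 /* illustrative kernel code-segment selector */

        struct pt_regs { unsigned long cs; /* ... other saved registers ... */ };

        /* Only the low word of the CS slot is architecturally meaningful;
         * a 486 may leave garbage in the high bits. */
        static int cs_is_kernel(const struct pt_regs *regs)
        {
                return (unsigned short)regs->cs == __KERNEL_CS;
        }

        int main(void)
        {
                struct pt_regs r = { .cs = 0xdead0010UL }; /* garbage high bits */
                printf("kernel CS? %d\n", cs_is_kernel(&r)); /* prints 1 */
                return 0;
        }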
      
      As far as I can tell, this has been buggy for a long time, but the
      problem seems to have been exacerbated by commits:
      
        1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
        e1bfc11c ("x86/init: Fix cr4_init_shadow() on CR4-less machines")
      
      This appears to have been broken for as long as we've had early
      exception handling.
      
      [ Note to stable maintainers: This patch is needed all the way back to 3.4,
        but it will only apply to 4.6 and up, as it depends on commit:
      
          0e861fbb ("x86/head: Move early exception panic code into early_fixup_exception()")
      
        If you want to backport to kernels before 4.6, please don't backport the
        prerequisites (there was a big chain of them that rewrote a lot of the
        early exception machinery); instead, ask me and I can send you a one-liner
        that will apply. ]
      Reported-by: Matthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 4c5023a3 ("x86-32: Handle exception table entries during early boot")
      Link: http://lkml.kernel.org/r/cb32c69920e58a1a58e7b5cad975038a69c0ce7d.1479609510.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  20. 10 Nov 2016, 1 commit
  21. 28 Oct 2016, 1 commit
  22. 26 Oct 2016, 1 commit
    • x86/io: add interface to reserve io memtype for a resource range. (v1.1) · 8ef42276
      Committed by Dave Airlie
      A recent change to the mm code in:
      87744ab3 mm: fix cache mode tracking in vm_insert_mixed()
      
      started enforcing a check of the memory type against the registered
      list for mixed pfn insertion mappings. It happens that the drm drivers
      for a number of gpus relied on this being broken. Currently the drivers
      only inserted VRAM mappings into the tracking table when they came from
      the kernel, and userspace mappings never landed in the table. This led
      to a regression where all the mappings now end up as UC instead of WC.
      
      I've considered a number of solutions, but since this needs to go in as
      a fix rather than through -next, and some of the solutions would have
      introduced overhead that hadn't been there before, I didn't consider
      them viable at this stage. These mainly concerned hooking into the TTM
      io reserve APIs, but those APIs have a bunch of fast paths I didn't
      want to unwind to add this.
      
      The solution I've decided on is to add a new API like the arch_phys_wc
      APIs (those would have worked, but wc_del didn't take a range), and use
      it from the drivers to add a WC-compatible mapping to the table for all
      VRAM on those GPUs. This means we can then create userspace mappings
      that won't get degraded to UC.
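
      A hedged usage sketch of how a driver might call the new pair
      (example_vram_init/fini and the vram_base/vram_size parameters are
      hypothetical; the reserve/free entry points take a (start, size) range
      per this changelog):

        #include <linux/io.h>

        /* Reserve a WC memtype for the whole VRAM aperture at init so that
         * later userspace mixed-pfn mappings match the tracked type. */
        static int example_vram_init(resource_size_t vram_base,
                                     resource_size_t vram_size)
        {
                int ret = arch_io_reserve_memtype_wc(vram_base, vram_size);
                if (ret)
                        return ret; /* range conflicts with an existing memtype */
                /* ... map the BAR, set up TTM, etc. ... */
                return 0;
        }

        static void example_vram_fini(resource_size_t vram_base,
                                      resource_size_t vram_size)
        {
                /* Release the range; unlike arch_phys_wc_del(), this takes
                 * the same (start, size) range as the reserve call. */
                arch_io_free_memtype_wc(vram_base, vram_size);
        }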
      
      v1.1: use CONFIG_X86_PAT + add some comments in io.h
      
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: x86@kernel.org
      Cc: mcgrof@suse.com
      Cc: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Dave Airlie <airlied@redhat.com>