1. 15 May 2019 (6 commits)
  2. 27 Nov 2018 (1 commit)
  3. 10 Sep 2018 (1 commit)
  4. 20 Jul 2018 (1 commit)
    • x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e
      Committed by Pavel Tatashin
      Currently, the notsc kernel parameter disables the use of the TSC by
      sched_clock(). However, this parameter does not prevent the kernel from
      accessing the TSC in other places.
      
      The only rationale for booting with notsc is to avoid timing discrepancies
      on multi-socket systems where the TSCs are not properly synchronized, and
      thus to exclude the TSC from timekeeping. But that prevents using the TSC
      for sched_clock() as well, which is not necessary, as the core
      sched_clock() implementation can handle non-synchronized, TSC-based sched
      clocks just fine.
      
      However, there is another method to solve the above problem: booting with
      the tsc=unstable parameter. This parameter allows sched_clock() to use the
      TSC and merely excludes it from timekeeping.
      
      So there is no real reason to keep notsc, but for compatibility reasons the
      parameter has to stay. Make it behave like 'tsc=unstable' instead.
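      
      A minimal illustration (boot command lines; "linux ..." stands in for an
      arbitrary bootloader entry): after this change the following two lines
      behave identically, keeping the TSC usable for sched_clock() while
      excluding it from timekeeping.
      
          linux ... notsc
          linux ... tsc=unstable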
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
  5. 07 Jul 2018 (1 commit)
    • x86/numa_emulation: Introduce uniform split capability · cc9aec03
      Committed by Dan Williams
      The current NUMA emulation capabilities for splitting System RAM by a
      fixed size or by a set number of nodes may result in some nodes being
      larger than others. The implementation prioritizes establishing a
      minimum usable memory size over satisfying the requested number of NUMA
      nodes.
      
      Introduce a uniform split capability that evenly partitions each
      physical NUMA node into N emulated nodes. For example, numa=fake=3U
      creates 6 emulated nodes in total on a system that has 2 physical nodes.
      
      This capability is useful for debugging and evaluating platform
      memory-side-cache capabilities as described by the ACPI HMAT (see
      Section 5.2.27.5, Memory Side Cache Information Structure, in ACPI 6.2a).
      
      Compare numa=fake=6, which results in only 5 nodes being created, against
      numa=fake=3U, which takes the 2 physical nodes and evenly divides them.
      
      numa=fake=6
      available: 5 nodes (0-4)
      node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
      node 0 size: 2648 MB
      node 0 free: 2443 MB
      node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
      node 1 size: 2672 MB
      node 1 free: 2442 MB
      node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
      node 2 size: 5291 MB
      node 2 free: 5278 MB
      node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
      node 3 size: 2677 MB
      node 3 free: 2665 MB
      node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
      node 4 size: 2676 MB
      node 4 free: 2663 MB
      node distances:
      node   0   1   2   3   4
        0:  10  20  10  20  20
        1:  20  10  20  10  10
        2:  10  20  10  20  20
        3:  20  10  20  10  10
        4:  20  10  20  10  10
      
      numa=fake=3U
      available: 6 nodes (0-5)
      node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
      node 0 size: 2900 MB
      node 0 free: 2637 MB
      node 1 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
      node 1 size: 3023 MB
      node 1 free: 3012 MB
      node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
      node 2 size: 2015 MB
      node 2 free: 2004 MB
      node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
      node 3 size: 2704 MB
      node 3 free: 2522 MB
      node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
      node 4 size: 2709 MB
      node 4 free: 2698 MB
      node 5 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
      node 5 size: 2612 MB
      node 5 free: 2601 MB
      node distances:
      node   0   1   2   3   4   5
        0:  10  10  10  20  20  20
        1:  10  10  10  20  20  20
        2:  10  10  10  20  20  20
        3:  20  20  20  10  10  10
        4:  20  20  20  10  10  10
        5:  20  20  20  10  10  10
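      
      For reference, the general forms of the option are as follows (summarized
      from the x86 boot options documentation of this era; treat the exact
      wording as approximate):
      
          numa=fake=<N>         split System RAM into N roughly equal nodes
          numa=fake=<size>[MG]  split System RAM into nodes of the given size
          numa=fake=<N>U        evenly split each physical node into N nodes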
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/153089328617.27680.14930758266174305832.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 03 Jul 2018 (1 commit)
    • x86/intel_rdt: Make CPU information accessible for pseudo-locked regions · 33dc3e41
      Committed by Reinette Chatre
      When a resource group enters pseudo-locksetup mode it reflects that the
      platform supports cache pseudo-locking and that the resource group is
      unused, ready to be used for a pseudo-locked region. Until it is set up as
      a pseudo-locked region the resource group is "locked down" such that no
      new tasks or cpus can be assigned to it. This is accomplished in a
      user-visible way by making the cpus, cpus_list, and tasks resctrl files
      inaccessible (the user cannot read from or write to these files).
      
      When the resource group changes to pseudo-locked mode it represents a
      cache pseudo-locked region. While it is not appropriate to make any
      changes to the cpus assigned to this region, it is useful to make it easy
      for the user to see which cpus are associated with the pseudo-locked
      region.
      
      Modify the permissions of the cpus/cpus_list file when the resource group
      changes to pseudo-locked mode to support reading (not writing). The
      information presented to the user when reading the file is the set of
      cpus associated with the pseudo-locked region.
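      
      As an illustration (the resctrl mount point /sys/fs/resctrl is standard;
      the group name "p0" and the exact shell output are made up for the
      example):
      
          # cat /sys/fs/resctrl/p0/cpus_list      <- readable once pseudo-locked
          1
          # echo 2 > /sys/fs/resctrl/p0/cpus_list
          sh: write error: Permission denied      <- writes remain disallowed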
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: fenghua.yu@intel.com
      Cc: tony.luck@intel.com
      Cc: vikas.shivappa@linux.intel.com
      Cc: gavin.hindman@intel.com
      Cc: jithu.joseph@intel.com
      Cc: dave.hansen@intel.com
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/12756b7963b6abc1bffe8fb560b87b75da827bd1.1530421961.git.reinette.chatre@intel.com
  7. 24 Jun 2018 (1 commit)
  8. 23 Jun 2018 (2 commits)
  9. 28 May 2018 (3 commits)
  10. 19 May 2018 (1 commit)
  11. 03 Apr 2018 (1 commit)
  12. 27 Mar 2018 (1 commit)
  13. 01 Mar 2018 (1 commit)
  14. 23 Feb 2018 (1 commit)
  15. 16 Feb 2018 (1 commit)
  16. 25 Jan 2018 (1 commit)
  17. 19 Jan 2018 (1 commit)
  18. 18 Jan 2018 (2 commits)
  19. 07 Jan 2018 (1 commit)
  20. 05 Jan 2018 (3 commits)
    • x86/kaslr: Fix the vaddr_end mess · 1dddd251
      Committed by Thomas Gleixner
      vaddr_end for KASLR is only documented in the KASLR code itself and is
      adjusted depending on config options. So it's not surprising that a change
      of the memory layout causes KASLR to have the wrong vaddr_end. This can
      map arbitrary stuff into other areas, causing hard-to-understand problems.
      
      Remove the whole ifdef magic and define the start of the cpu_entry_area to
      be the end of the KASLR vaddr range.
      
      Add documentation to that effect.
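      
      A sketch of the resulting definition in arch/x86/mm/kaslr.c (paraphrased
      from the change, not the verbatim diff):
      
          /*
           * KASLR randomizes within [vaddr_start, vaddr_end); the
           * cpu_entry_area now starts exactly where the KASLR range ends,
           * with no config-dependent ifdeffery.
           */
          static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;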
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
    • x86/mm: Map cpu_entry_area at the same place on 4/5 level · f2078904
      Committed by Thomas Gleixner
      There is no reason for 4- and 5-level pagetables to have a different
      layout. It just makes determining vaddr_end for KASLR harder than
      necessary.
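      
      For reference, both layouts end up with the cpu_entry_area in the same
      place (per Documentation/x86/x86_64/mm.txt of this era; quoted from
      memory, so treat as approximate):
      
          fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping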
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
    • x86/mm: Set MODULES_END to 0xffffffffff000000 · f5a40711
      Committed by Andrey Ryabinin
      Since commit f06bdd40 ("x86/mm: Adapt MODULES_END based on fixmap section
      size"), kasan_mem_to_shadow(MODULES_END) may not be aligned to a page
      boundary.
      
      So passing a page-unaligned address to kasan_populate_zero_shadow() has
      two possible effects:
      
      1) It may leave a one-page hole in an area that is supposed to be
        populated. After commit 21506525 ("x86/kasan/64: Teach KASAN about the
        cpu_entry_area") that hole happens to be in the shadow covering the
        fixmap area and leads to a crash:
      
       BUG: unable to handle kernel paging request at fffffbffffe8ee04
       RIP: 0010:check_memory_region+0x5c/0x190
      
       Call Trace:
        <NMI>
        memcpy+0x1f/0x50
        ghes_copy_tofrom_phys+0xab/0x180
        ghes_read_estatus+0xfb/0x280
        ghes_notify_nmi+0x2b2/0x410
        nmi_handle+0x115/0x2c0
        default_do_nmi+0x57/0x110
        do_nmi+0xf8/0x150
        end_repeat_nmi+0x1a/0x1e
      
      Note, the crash likely disappeared after commit 92a0f81d, which changed
      the kasan_populate_zero_shadow() call back to the way it was before
      commit 21506525.
      
      2) An attempt to load a module near MODULES_END will fail, because
         __vmalloc_node_range(), called from kasan_module_alloc(), will hit the
         WARN_ON(!pte_none(*pte)) in vmap_pte_range() and bail out with an
         error.
      
      To fix this we need to make kasan_mem_to_shadow(MODULES_END) page
      aligned, which means that MODULES_END should be 8*PAGE_SIZE aligned.
      
      The whole point of commit f06bdd40 was to move MODULES_END down when
      NR_CPUS is big enough that the cpu_entry_area takes a lot of space.
      But since 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      the cpu_entry_area is no longer in the fixmap, so we can just set
      MODULES_END to a fixed, 8*PAGE_SIZE-aligned address.
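      
      The alignment arithmetic, spelled out: generic KASAN maps 8 bytes of
      memory to 1 shadow byte, i.e. shadow = (addr >> 3) + KASAN_SHADOW_OFFSET,
      so an 8*PAGE_SIZE-aligned MODULES_END yields a PAGE_SIZE-aligned shadow
      address. A sketch of the resulting definition (the value is from the
      subject line, the exact diff may differ):
      
          /* 8*PAGE_SIZE aligned, so kasan_mem_to_shadow(MODULES_END)
             lands on a page boundary */
          #define MODULES_END    _AC(0xffffffffff000000, UL)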
      
      Fixes: f06bdd40 ("x86/mm: Adapt MODULES_END based on fixmap section size")
      Reported-by: Jakub Kicinski <kubakici@wp.pl>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com
  21. 24 Dec 2017 (2 commits)
    • x86/pti: Put the LDT in its own PGD if PTI is on · f55f0501
      Committed by Andy Lutomirski
      With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
      The LDT is per process, i.e. per mm.
      
      An earlier approach mapped the LDT on context switch into a fixmap area,
      but that's a big overhead and exhausted the fixmap space when NR_CPUS got
      big.
      
      Take advantage of the fact that there is an address space hole which
      provides a completely unused pgd. Use this pgd to manage per-mm LDT
      mappings.
      
      This has a downside: the LDT isn't (currently) randomized, and an attack
      that can write the LDT is instant root due to call gates (thanks, AMD, for
      leaving call gates in AMD64 but designing them wrong so they're only
      useful for exploits). This can be mitigated by making the LDT read-only
      or by randomizing the mapping, either of which is straightforward on top
      of this patch.
      
      This will significantly slow down LDT users, but that shouldn't matter for
      important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
      old libc implementations.
      
      [ tglx: Cleaned it up. ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/64: Make a full PGD-entry size hole in the memory map · 9f449772
      Committed by Andy Lutomirski
      Shrink the vmalloc space from 16384 TiB to 12800 TiB to enlarge the hole
      starting at 0xff90000000000000 to be a full PGD entry.
      
      A subsequent patch will use this hole for the pagetable isolation LDT
      alias.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  22. 23 Dec 2017 (3 commits)
    • x86/cpu_entry_area: Move it out of the fixmap · 92a0f81d
      Committed by Thomas Gleixner
      Put the cpu_entry_area into a separate P4D entry. The fixmap gets too
      big, and 0-day already hit a case where the fixmap PTEs were cleared by
      cleanup_highmap().
      
      Aside from that, the fixmap API is a pain, as it's all backwards.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation · e8ffe96e
      Committed by Peter Zijlstra
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/64: Improve the memory map documentation · 5a7ccf47
      Committed by Andy Lutomirski
      The old docs had the vsyscall range wrong and were missing the fixmap.
      Fix both.
      
      There used to be 8 MB reserved for future vsyscalls, but that's long gone.
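      
      For reference, the corrected entry reads roughly as follows (from
      Documentation/x86/x86_64/mm.txt of this era; quoted from memory, so
      treat as approximate):
      
          ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI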
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  23. 21 Nov 2017 (1 commit)
  24. 07 Nov 2017 (1 commit)
  25. 20 Oct 2017 (1 commit)
    • x86/kasan: Use the same shadow offset for 4- and 5-level paging · 12a8cc7f
      Committed by Andrey Ryabinin
      We are going to support boot-time switching between 4- and 5-level
      paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET
      for different paging modes: the constant is passed to gcc to generate
      code and cannot be changed at runtime.
      
      This patch changes the KASAN code to use 0xdffffc0000000000 as the shadow
      offset for both 4- and 5-level paging.
      
      For 5-level paging it means that the shadow memory region is no longer
      aligned to a PGD boundary and we have to handle the unaligned parts of
      the region properly.
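      
      As a sanity check on the chosen constant (generic KASAN maps 8 bytes of
      memory to 1 shadow byte, i.e. shadow = (addr >> 3) + KASAN_SHADOW_OFFSET):
      
          shadow(0xffff800000000000)                  # base of the kernel half
            = (0xffff800000000000 >> 3) + 0xdffffc0000000000
            = 0x1ffff00000000000 + 0xdffffc0000000000
            = 0xffffec0000000000                      # the documented 4-level
                                                      # kasan shadow start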
      
      In addition, we have to exclude the paravirt code from KASAN
      instrumentation, as we now use set_pgd() before KASAN is fully ready.
      [kirill.shutemov@linux.intel.com: cleanup, changelog message]
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  26. 14 Oct 2017 (1 commit)