1. 09 Jul 2021, 1 commit
  2. 06 May 2021, 1 commit
    • x86/mm: track linear mapping split events · 575299ea
      Authored by Saravanan D
      To help with debugging the sluggishness caused by TLB misses/reloads, we
      introduce monotonic hugepage [direct mapped] split event counts since
      system state SYSTEM_RUNNING, to be displayed as part of /proc/vmstat on
      x86 servers.
      
      The lifetime split event information will be displayed at the bottom of
      /proc/vmstat
        ....
        swap_ra 0
        swap_ra_hit 0
        direct_map_level2_splits 94
        direct_map_level3_splits 4
        nr_unstable 0
        ....
      
      One of the many lasting sources of direct hugepage splits is kernel
      tracing (kprobes, tracepoints).
      
      Note that the kernel's code segment [512 MB] points to the same physical
      addresses that have been already mapped in the kernel's direct mapping
      range.
      
      Source: Documentation/x86/x86_64/mm.rst
      
      When we enable kernel tracing, the kernel has to modify
      attributes/permissions of the text segment hugepages that are direct
      mapped causing them to split.
      
      The kernel's direct-mapped hugepages do not coalesce back after a split
      and remain in place for the remainder of the system's lifetime.
      
      An instance of direct map splits when dynamic kernel tracing is turned on:
      ....
      cat /proc/vmstat | grep -i direct_map_level
      direct_map_level2_splits 784
      direct_map_level3_splits 12

      bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[pid, comm] = count(); }'

      cat /proc/vmstat | grep -i direct_map_level
      direct_map_level2_splits 789
      direct_map_level3_splits 12
      ....
      
      Link: https://lkml.kernel.org/r/20210218235744.1040634-1-saravanand@fb.com
      Signed-off-by: Saravanan D <saravanand@fb.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      575299ea
  3. 18 Mar 2021, 1 commit
    • x86: Fix various typos in comments · d9f6e12f
      Authored by Ingo Molnar
      Fix ~144 single-word typos in arch/x86/ code comments.
      
      Doing this in a single commit should reduce the churn.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
      d9f6e12f
  4. 16 Dec 2020, 2 commits
    • arch, mm: make kernel_page_present() always available · 32a0de88
      Authored by Mike Rapoport
      For architectures that enable ARCH_HAS_SET_MEMORY having the ability to
      verify that a page is mapped in the kernel direct map can be useful
      regardless of hibernation.
      
      Add a RISC-V implementation of kernel_page_present(), update its forward
      declarations and stubs to be part of the set_memory API, and remove the
      ugly ifdefery in include/linux/mm.h around the current declarations of
      kernel_page_present().
      
      Link: https://lkml.kernel.org/r/20201109192128.960-5-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32a0de88
    • arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC · 5d6ad668
      Authored by Mike Rapoport
      The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must
      never fail.  With this assumption it wouldn't be safe to allow general
      usage of this function.
      
      Moreover, some architectures that implement __kernel_map_pages() have this
      function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
      pages when page allocation debugging is disabled at runtime.
      
      As all the users of __kernel_map_pages() were converted to use
      debug_pagealloc_map_pages() it is safe to make it available only when
      DEBUG_PAGEALLOC is set.
      
      Link: https://lkml.kernel.org/r/20201109192128.960-4-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d6ad668
  5. 18 Sep 2020, 1 commit
  6. 30 Jun 2020, 1 commit
    • x86/mm/pat: Mark an intentional data race · cb38f820
      Authored by Qian Cai
      cpa_4k_install can be accessed concurrently, as noticed by KCSAN:
      
      read to 0xffffffffaa59a000 of 8 bytes by interrupt on cpu 7:
      cpa_inc_4k_install arch/x86/mm/pat/set_memory.c:131 [inline]
      __change_page_attr+0x10cf/0x1840 arch/x86/mm/pat/set_memory.c:1514
      __change_page_attr_set_clr+0xce/0x490 arch/x86/mm/pat/set_memory.c:1636
      __set_pages_np+0xc4/0xf0 arch/x86/mm/pat/set_memory.c:2148
      __kernel_map_pages+0xb0/0xc8 arch/x86/mm/pat/set_memory.c:2178
      kernel_map_pages include/linux/mm.h:2719 [inline] <snip>
      
      write to 0xffffffffaa59a000 of 8 bytes by task 1 on cpu 6:
      cpa_inc_4k_install arch/x86/mm/pat/set_memory.c:131 [inline]
      __change_page_attr+0x10ea/0x1840 arch/x86/mm/pat/set_memory.c:1514
      __change_page_attr_set_clr+0xce/0x490 arch/x86/mm/pat/set_memory.c:1636
      __set_pages_p+0xc4/0xf0 arch/x86/mm/pat/set_memory.c:2129
      __kernel_map_pages+0x2e/0xc8 arch/x86/mm/pat/set_memory.c:2176
      kernel_map_pages include/linux/mm.h:2719 [inline] <snip>
      
      Both accesses are due to the same "cpa_4k_install++" in
      cpa_inc_4k_install. A data race here could potentially be undesirable:
      depending on compiler optimizations or how x86 executes a non-LOCK'd
      increment, it may lose increments, corrupt the counter, etc. Since this
      counter only seems to be used for printing stats, the data race itself
      is unlikely to harm the system, though. Thus, mark this intentional
      data race using the data_race() macro.
      Suggested-by: Marco Elver <elver@google.com>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      cb38f820
  7. 01 May 2020, 1 commit
  8. 26 Apr 2020, 1 commit
  9. 23 Apr 2020, 1 commit
  10. 11 Apr 2020, 1 commit
  11. 27 Mar 2020, 1 commit
  12. 20 Jan 2020, 1 commit
  13. 10 Dec 2019, 4 commits
  14. 12 Nov 2019, 1 commit
  15. 03 Sep 2019, 4 commits
  16. 30 Aug 2019, 1 commit
    • x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text · 7af01450
      Authored by Thomas Gleixner
      ftrace does not use text_poke() for enabling trace functionality. It uses
      its own mechanism and flips the whole kernel text to RW and back to RO.
      
      The CPA rework removed a loop based check of 4k pages which tried to
      preserve a large page by checking each 4k page whether the change would
      actually cover all pages in the large page.
      
      This resulted in endless loops for nothing, as testing showed that it
      actually never preserved anything. Of course, the testing failed to
      include ftrace, which is the one and only case that benefited from the
      4k loop.
      
      As a consequence enabling function tracing or ftrace based kprobes results
      in a full 4k split of the kernel text, which affects iTLB performance.
      
      The kernel RO protection is the only valid case where this can actually
      preserve large pages.
      
      All other static protections (RO data, data NX, PCI, BIOS) are truly
      static.  So a conflict with those protections which results in a split
      should only ever happen when a change of memory next to a protected region
      is attempted. But these conflicts are rightfully splitting the large page
      to preserve the protected regions. In fact a change to the protected
      regions itself is a bug and is warned about.
      
      Add an exception to the static protection check for kernel text RO when
      the region to be changed spans a full large page, which allows the large
      mapping to be preserved. This also prevents the syslog from being
      spammed with CPA violations when ftrace is used.
      
      The exception needs to be removed once ftrace has switched over to
      text_poke(), which avoids the whole issue.
      
      Fixes: 585948f4 ("x86/mm/cpa: Avoid the 4k pages check completely")
      Reported-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Song Liu <songliubraving@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de
      7af01450
  17. 21 May 2019, 1 commit
  18. 30 Apr 2019, 2 commits
    • mm/hibernation: Make hibernation handle unmapped pages · d6332692
      Authored by Rick Edgecombe
      Make hibernate handle unmapped pages on the direct map when
      CONFIG_ARCH_HAS_SET_ALIAS=y is set. These functions allow pages to be
      set to invalid configurations, so hibernate should now check whether
      pages have valid mappings and handle them if they are unmapped when
      doing a hibernate save operation.
      
      Previously this checking was already done when CONFIG_DEBUG_PAGEALLOC=y
      was configured. It does not appear to have a large impact on hibernation
      performance: the speed of the save operation before this change was
      measured at 819.02 MB/s, and after at 813.32 MB/s.
      
      Before:
      [    4.670938] PM: Wrote 171996 kbytes in 0.21 seconds (819.02 MB/s)
      
      After:
      [    4.504714] PM: Wrote 178932 kbytes in 0.22 seconds (813.32 MB/s)
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190426001143.4983-16-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d6332692
    • x86/mm/cpa: Add set_direct_map_*() functions · d253ca0c
      Authored by Rick Edgecombe
      Add two new functions set_direct_map_default_noflush() and
      set_direct_map_invalid_noflush() for setting the direct map alias for the
      page to its default valid permissions and to an invalid state that cannot
      be cached in a TLB, respectively. These functions do not flush the TLB.
      
      Note, __kernel_map_pages() does something similar but flushes the TLB and
      doesn't reset the permission bits to default on all architectures.
      
      Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether
      these have an actual implementation or a default empty one.
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190426001143.4983-15-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d253ca0c
  19. 07 Mar 2019, 1 commit
    • x86/mm: Remove unused variable 'old_pte' · 24c41220
      Authored by Qian Cai
      The commit 3a19109e ("x86/mm: Fix try_preserve_large_page() to
      handle large PAT bit") fixed try_preserve_large_page() by using the
      corresponding pud/pmd prot/pfn interfaces, but left a variable unused
      because it no longer used pte_pfn().
      
      Later, the commit 8679de09 ("x86/mm/cpa: Split, rename and clean up
      try_preserve_large_page()") renamed try_preserve_large_page() to
      __should_split_large_page(), but the unused variable remained.
      
      arch/x86/mm/pageattr.c: In function '__should_split_large_page':
      arch/x86/mm/pageattr.c:741:17: warning: variable 'old_pte' set but not
      used [-Wunused-but-set-variable]
      
      Fixes: 3a19109e ("x86/mm: Fix try_preserve_large_page() to handle large PAT bit")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@linux.intel.com
      Cc: luto@kernel.org
      Cc: peterz@infradead.org
      Cc: toshi.kani@hpe.com
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20190301152924.94762-1-cai@lca.pw
      24c41220
  20. 08 Feb 2019, 1 commit
  21. 18 Dec 2018, 9 commits
  22. 03 Dec 2018, 1 commit
    • x86: Fix various typos in comments · a97673a1
      Authored by Ingo Molnar
      Go over arch/x86/ and fix common typos in comments,
      and a typo in an actual function argument name.
      
      No change in functionality intended.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a97673a1
  23. 30 Nov 2018, 1 commit
    • x86/mm/pageattr: Introduce helper function to unmap EFI boot services · 7e0dabd3
      Authored by Sai Praneeth Prakhya
      Ideally, after the kernel assumes control of the platform, firmware
      shouldn't access EFI boot services code/data regions. But it has been
      observed that this is not true on many x86 platforms. Hence, during
      boot, the kernel reserves the EFI boot services code/data regions [1]
      and maps [2] them into efi_pgd so that the call to
      set_virtual_address_map() doesn't fail. After returning from
      set_virtual_address_map(), the kernel frees the reserved regions [3]
      but they still remain mapped. Hence, introduce
      kernel_unmap_pages_in_pgd(), which will later be used to unmap the EFI
      boot services code/data regions.
      
      While at it, modify kernel_map_pages_in_pgd() by:
      
      1. Adding the __init modifier because it's only ever used during boot.
      2. Adding a warning if it's used after SMP is initialized, because it
         uses __flush_tlb_all(), which flushes mappings only on the current CPU.
      
      Unmapping the EFI boot services code/data regions will result in
      clearing the PAGE_PRESENT bit, and it shouldn't affect L1TF cases
      because that is already handled by protnone_mask() in
      arch/x86/include/asm/pgtable-invert.h.
      
      [1] efi_reserve_boot_services()
      [2] efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd()
      [3] efi_free_boot_services()
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Snowberg <eric.snowberg@oracle.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jon Hunter <jonathanh@nvidia.com>
      Cc: Julien Thierry <julien.thierry@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: YiFei Zhu <zhuyifei1999@gmail.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181129171230.18699-5-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7e0dabd3
  24. 31 Oct 2018, 1 commit