1. 02 December 2022, 1 commit
    • powerpc/code-patching: Use temporary mm for Radix MMU · c28c15b6
      Committed by Christopher M. Riedl
      x86 supports the notion of a temporary mm which restricts access to
      temporary PTEs to a single CPU. A temporary mm is useful for situations
      where a CPU needs to perform sensitive operations (such as patching a
      STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
      said mappings to other CPUs. Another benefit is that other CPU TLBs do
      not need to be flushed when the temporary mm is torn down.
      
      Mappings in the temporary mm can be set in the userspace portion of the
      address-space.
      
      Interrupts must be disabled while the temporary mm is in use. HW
      breakpoints, which may have been set by userspace as watchpoints on
      addresses now within the temporary mm, are saved and disabled when
      loading the temporary mm. The HW breakpoints are restored when unloading
      the temporary mm. All HW breakpoints are indiscriminately disabled while
      the temporary mm is in use - this may include breakpoints set by perf.
      
      Use the `poking_init` init hook to prepare a temporary mm and patching
      address. Initialize the temporary mm using mm_alloc(). Choose a
      randomized patching address inside the temporary mm userspace address
      space. The patching address is randomized between PAGE_SIZE and
      DEFAULT_MAP_WINDOW-PAGE_SIZE.
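      
      A minimal sketch of that init flow, assuming the variable names
      patching_mm and patching_addr (the actual poking_init() in this patch
      may differ in detail):
      
      	#include <linux/mm.h>
      	#include <linux/random.h>
      	#include <linux/sched/mm.h>
      
      	static struct mm_struct *patching_mm;
      	static unsigned long patching_addr;
      
      	void __init poking_init(void)
      	{
      		unsigned long pages;
      
      		/* Bare mm, never run as a real user context. */
      		patching_mm = mm_alloc();
      		BUG_ON(!patching_mm);
      
      		/* Pick a page-aligned address in
      		 * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE). */
      		pages = (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE) >> PAGE_SHIFT;
      		patching_addr = PAGE_SIZE +
      			((get_random_long() % pages) << PAGE_SHIFT);
      	}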
      
      Bits of entropy with 64K page size on BOOK3S_64:
      
      	bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
      
      	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
      	bits of entropy = log2(128TB / 64K)
      	bits of entropy = 31
      
      The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU
      operates - by default the space above DEFAULT_MAP_WINDOW is not
      available. Currently the Hash MMU does not use a temporary mm so
      technically this upper limit isn't necessary; however, a larger
      randomization range does not further "harden" this overall approach and
      future work may introduce patching with a temporary mm on Hash as well.
      
      Randomization occurs only once during initialization for each CPU as it
      comes online.
      
      The patching page is mapped with PAGE_KERNEL to set EAA[0] for the PTE,
      which makes the access ignore the AMR (so there is no need to
      unlock/lock KUAP), per PowerISA v3.0b Figure 35 on Radix.
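      
      As an illustration of that mapping step (the helper below and the
      patching_mm/patching_addr names are assumptions for the sketch, not
      the literal patch contents):
      
      	/* Insert the PTE for the page to patch at the randomized
      	 * userspace address.  PAGE_KERNEL sets EAA[0], so the AMR
      	 * (KUAP) does not apply to the access on Radix. */
      	static int map_patch_page(struct page *page)
      	{
      		spinlock_t *ptl;
      		pte_t *ptep;
      
      		ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
      		if (!ptep)
      			return -ENOMEM;
      
      		set_pte_at(patching_mm, patching_addr, ptep,
      			   mk_pte(page, PAGE_KERNEL));
      		pte_unmap_unlock(ptep, ptl);
      		return 0;
      	}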
      
      Based on x86 implementation:
      
      commit 4fc19708
      ("x86/alternatives: Initialize temporary mm for patching")
      
      and:
      
      commit b3fd8e83
      ("x86/alternatives: Use temporary mm for text poking")
      
      From: Benjamin Gray <bgray@linux.ibm.com>
      
      Synchronisation is done according to ISA 3.1B Book 3 Chapter 13
      "Synchronization Requirements for Context Alterations". Switching the mm
      is a change to the PID, which requires a CSI before and after the change,
      and a hwsync between the change and the last prior instruction that
      performs address translation for an associated storage access.
      
      Instruction fetch is an associated storage access, but the instruction
      address mappings are not being changed, so it should not matter which
      context they use. We must still perform a hwsync to guard arbitrary
      prior code that may have accessed a userspace address.
      
      TLB invalidation is local and VA specific. Local because only this core
      used the patching mm, and VA specific because we only care that the
      writable mapping is purged. Leaving the other mappings intact is more
      efficient, especially when performing many code patches in a row (e.g.,
      as ftrace would).
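      
      Putting the above together, a sketch of the sequence (the ISA
      mnemonics are as named above; the direct PID write, orig_mm and the
      local_flush_tlb_page_psize() call are assumptions about the series,
      not the literal patch):
      
      	/* hwsync: order any prior storage access that may have
      	 * translated a userspace address before the PID changes. */
      	asm volatile("sync" ::: "memory");	/* hwsync */
      	isync();				/* CSI before the change */
      	mtspr(SPRN_PID, patching_mm->context.id);
      	isync();				/* CSI after the change */
      
      	/* ... write the instruction through the temporary mapping ... */
      
      	/* Local and VA-specific: only this CPU used the patching mm,
      	 * and only the writable mapping needs to be purged. */
      	local_flush_tlb_page_psize(patching_mm, patching_addr,
      				   mmu_virtual_psize);
      
      	isync();				/* CSI before switching back */
      	mtspr(SPRN_PID, orig_mm->context.id);
      	isync();				/* CSI after switching back */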
      Signed-off-by: Christopher M. Riedl <cmr@bluescreens.de>
      Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
      [mpe: Use mm_alloc() per 107b6828a7cd ("x86/mm: Use mm_alloc() in poking_init()")]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20221109045112.187069-9-bgray@linux.ibm.com
  2. 30 November 2022, 1 commit
  3. 01 September 2022, 1 commit
  4. 19 May 2022, 3 commits
  5. 11 May 2022, 2 commits
  6. 08 May 2022, 1 commit
  7. 07 March 2022, 1 commit
    • powerpc/code-patching: Pre-map patch area · 591b4b26
      Committed by Michael Ellerman
      Paul reported a warning with DEBUG_ATOMIC_SLEEP=y:
      
        BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256
        in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
        preempt_count: 0, expected: 0
        ...
        Call Trace:
          dump_stack_lvl+0xa0/0xec (unreliable)
          __might_resched+0x2f4/0x310
          kmem_cache_alloc+0x220/0x4b0
          __pud_alloc+0x74/0x1d0
          hash__map_kernel_page+0x2cc/0x390
          do_patch_instruction+0x134/0x4a0
          arch_jump_label_transform+0x64/0x78
          __jump_label_update+0x148/0x180
          static_key_enable_cpuslocked+0xd0/0x120
          static_key_enable+0x30/0x50
          check_kvm_guest+0x60/0x88
          pSeries_smp_probe+0x54/0xb0
          smp_prepare_cpus+0x3e0/0x430
          kernel_init_freeable+0x20c/0x43c
          kernel_init+0x30/0x1a0
          ret_from_kernel_thread+0x5c/0x64
      
      Peter pointed out that this is because do_patch_instruction() has
      disabled interrupts, but then map_patch_area() calls map_kernel_page()
      then hash__map_kernel_page() which does a sleeping memory allocation.
      
      We only see the warning in KVM guests with SMT enabled, which is not
      particularly common, or on other platforms if CONFIG_KPROBES is
      disabled, also not common. The reason we don't see it in most
      configurations is that another path that happens to have interrupts
      enabled has allocated the required page tables for us, eg. there's a
      path in kprobes init that does that. That's just pure luck though.
      
      As Christophe suggested, the simplest solution is to do a dummy
      map/unmap when we initialise the patching, so that any required page
      table levels are pre-allocated before the first call to
      do_patch_instruction(). This works because the unmap doesn't free any
      page tables that were allocated by the map, it just clears the PTE,
      leaving the page table levels there for the next map.
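      
      A sketch of that dummy map/unmap at init time (map_patch_area() is
      named in the call chain above; unmap_patch_area(), text_poke_area and
      the exact placement in text_area_cpu_up() are assumptions for this
      sketch, not the literal hunk):
      
      	static int text_area_cpu_up(unsigned int cpu)
      	{
      		struct vm_struct *area = get_vm_area(PAGE_SIZE, VM_ALLOC);
      		unsigned long addr;
      		int err;
      
      		if (!area)
      			return -1;
      
      		/* Sleeping allocations are fine here: interrupts are still
      		 * enabled, so let the map populate any missing levels ... */
      		addr = (unsigned long)area->addr;
      		err = map_patch_area(empty_zero_page, addr);
      		if (err)
      			return err;
      
      		/* ... the unmap only clears the PTE, leaving the upper
      		 * levels in place for later IRQs-off patching. */
      		unmap_patch_area(addr);
      
      		this_cpu_write(text_poke_area, area);
      		return 0;
      	}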
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Debugged-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220223015821.473097-1-mpe@ellerman.id.au
  8. 23 December 2021, 11 commits
  9. 09 December 2021, 1 commit
  10. 29 November 2021, 1 commit
  11. 25 November 2021, 1 commit
  12. 07 October 2021, 1 commit
  13. 21 June 2021, 1 commit
  14. 16 June 2021, 4 commits
  15. 21 April 2021, 1 commit
  16. 26 March 2021, 1 commit
  17. 15 September 2020, 1 commit
  18. 26 July 2020, 1 commit
  19. 10 June 2020, 1 commit
    • mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Committed by Mike Rapoport
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definitions of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version, there is always the
      possibility of overriding the generic version with the usual ifdef magic.
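      
      The override hook is the usual #ifndef guard; a sketch of the pattern
      (illustrative, not the exact include/linux/pgtable.h contents):
      
      	/* Generic fallback, used unless the architecture already
      	 * provided its own pmd_index(). */
      	#ifndef pmd_index
      	static inline unsigned long pmd_index(unsigned long address)
      	{
      		return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      	}
      	#define pmd_index pmd_index
      	#endif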
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point in explicitly including <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are removed with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 05 June 2020, 1 commit
  21. 26 May 2020, 1 commit
  22. 18 May 2020, 3 commits