1. 31 Aug 2021, 1 commit
    • kexec: Add quick kexec support for kernel · 742670c6
      Authored by Sang Yan
      hulk inclusion
      category: feature
      bugzilla: 48159
      CVE: N/A
      
      ------------------------------
      
      In normal kexec, relocating the kernel may take 5 to 10 seconds,
      because all segments have to be copied from vmalloc'ed memory to
      kernel boot memory with the MMU disabled.
      
      We introduce quick kexec to avoid that copy at relocation time.
      Like kdump (kexec on crash), it uses a dedicated reserved memory
      region, "Quick Kexec".
      
      To enable it, reserve memory and set up quick_kexec_res.
      
      The quick kimage is constructed the same way as the crash kernel
      image; all of its segments are then simply copied into the
      reserved memory.
      
      We also add this support to the kexec_load syscall via the
      KEXEC_QUICK flag (see the sketch after this entry).
      Signed-off-by: Sang Yan <sangyan@huawei.com>
      Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
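
      A hedged userspace sketch of loading a kexec image with the new flag.
      Only the KEXEC_QUICK name comes from the commit above; the flag value,
      the load_quick() helper and the caller-prepared segments are
      illustrative assumptions, not part of the patch.

        #include <stdio.h>
        #include <unistd.h>
        #include <sys/syscall.h>
        #include <linux/kexec.h>

        #ifndef KEXEC_QUICK
        #define KEXEC_QUICK 0x4   /* placeholder; take the real value from the patched uapi header */
        #endif

        /* Load the segments into the reserved "Quick Kexec" region up front, so
         * the eventual kexec does not relocate them with the MMU disabled. */
        static int load_quick(void *entry, struct kexec_segment *segs,
                              unsigned long nr_segs)
        {
                long ret = syscall(SYS_kexec_load, (unsigned long)entry, nr_segs,
                                   segs, KEXEC_ARCH_DEFAULT | KEXEC_QUICK);
                if (ret != 0)
                        perror("kexec_load");
                return (int)ret;
        }
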
  2. 28 Jul 2021, 1 commit
  3. 14 Jul 2021, 2 commits
    • mm: speedup mremap on 1GB or larger regions · 68e04270
      Authored by Kalesh Singh
      mainline inclusion
      from mainline-v5.11-rc1
      commit c49dd340
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
      CVE: NA
      
      -------------------------------------------------
      
      Android needs to move large memory regions for garbage collection.  The GC
      requires moving physical pages of multi-gigabyte heap using mremap.
      During this move, the application threads have to be paused for
      correctness.  It is critical to keep this pause as short as possible to
      avoid jitters during user interaction.
      
      Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if
      the source and destination addresses are PUD-aligned.  For
      CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD
      entries, since the PUD entry is “folded back” onto the PGD entry.  Add
      HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't
      supported/tested can turn this off by not selecting the config.
      
      Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com
      Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: Hassan Naveed <hnaveed@wavecomp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Minchan Kim <minchan@google.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Chen Wandun <chenwandun@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
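
      A hedged userspace sketch (64-bit, illustrative fixed addresses) of the
      case this fast path targets: source, destination and length are all 1GB
      (PUD) aligned, so a HAVE_MOVE_PUD architecture can move whole PUD
      entries instead of copying page tables a page at a time.

        #define _GNU_SOURCE
        #include <stdio.h>
        #include <sys/mman.h>

        #define GB (1024UL * 1024 * 1024)

        int main(void)
        {
                /* 2GB anonymous mapping placed at a 1GB-aligned address */
                void *old_addr = mmap((void *)(4 * GB), 2 * GB,
                                      PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                                      -1, 0);
                if (old_addr == MAP_FAILED)
                        return 1;

                /* old address, new address and length are PUD-aligned: the
                 * precondition for the PUD-level (or folded PGD-level) move */
                void *new_addr = mremap(old_addr, 2 * GB, 2 * GB,
                                        MREMAP_MAYMOVE | MREMAP_FIXED,
                                        (void *)(16 * GB));
                if (new_addr == MAP_FAILED) {
                        perror("mremap");
                        return 1;
                }
                return 0;
        }
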
    • mm/vmalloc: hugepage vmalloc mappings · 7954687a
      Authored by Nicholas Piggin
      mainline inclusion
      from mainline-5.13-rc1
      commit 121e6f32
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZGKZ
      CVE: NA
      
      -------------------------------------------------
      
      Support huge page vmalloc mappings.  Config option HAVE_ARCH_HUGE_VMALLOC
      enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
      support PMD sized vmap mappings.
      
      vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
      larger, and fall back to small pages if that was unsuccessful.
      
      Architectures must ensure that any arch specific vmalloc allocations that
      require PAGE_SIZE mappings (e.g., module allocations vs strict module rwx)
      use the VM_NOHUGE flag to inhibit larger mappings.
      
      This can result in more internal fragmentation and memory overhead for a
      given allocation; a boot option, nohugevmalloc, is added to disable it at
      boot (see the sketch after this entry).
      
      [colin.king@canonical.com: fix read of uninitialized pointer area]
        Link: https://lkml.kernel.org/r/20210318155955.18220-1-colin.king@canonical.com
      
      Link: https://lkml.kernel.org/r/20210317062402.533919-14-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	mm/page_alloc.c
      Signed-off-by: Chen Wandun <chenwandun@huawei.com>
      Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
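
      A hedged in-kernel sketch: callers need no changes, they simply get
      PMD-sized mappings when the allocation is large enough and the
      architecture selects HAVE_ARCH_HUGE_VMALLOC. The demo module below is
      an illustration, not part of the patch.

        #include <linux/module.h>
        #include <linux/mm.h>
        #include <linux/vmalloc.h>

        static void *buf;

        static int __init huge_vmalloc_demo_init(void)
        {
                /* >= PMD_SIZE (2MB with 4K pages): may be backed by huge
                 * mappings, silently falling back to small pages on failure
                 * or when booted with nohugevmalloc */
                buf = vmalloc(4 * PMD_SIZE);
                if (!buf)
                        return -ENOMEM;
                pr_info("vmalloc'ed %lu bytes at %px\n",
                        (unsigned long)(4 * PMD_SIZE), buf);
                return 0;
        }

        static void __exit huge_vmalloc_demo_exit(void)
        {
                vfree(buf);
        }

        module_init(huge_vmalloc_demo_init);
        module_exit(huge_vmalloc_demo_exit);
        MODULE_LICENSE("GPL");
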
  4. 06 Jul 2021, 1 commit
  5. 08 Feb 2021, 1 commit
  6. 27 Jan 2021, 1 commit
  7. 12 Jan 2021, 1 commit
  8. 01 Dec 2020, 1 commit
  9. 09 Oct 2020, 1 commit
  10. 16 Sep 2020, 1 commit
    • mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race · d53c3dfb
      Authored by Nicholas Piggin
      Reading and modifying current->mm and current->active_mm and switching
      mm should be done with irqs off, to prevent races seeing an intermediate
      state.
      
      This is similar to commit 38cf307c ("mm: fix kthread_use_mm() vs TLB
      invalidate"). At exec-time when the new mm is activated, the old one
      should usually be single-threaded and no longer used, unless something
      else is holding an mm_users reference (which may be possible).
      
      Absent other mm_users, there is also a race with preemption and lazy tlb
      switching. Consider the kernel_execve case where the current thread is
      using a lazy tlb active mm:
      
        call_usermodehelper()
          kernel_execve()
            old_mm = current->mm;
            active_mm = current->active_mm;
            *** preempt *** -------------------->  schedule()
                                                     prev->active_mm = NULL;
                                                     mmdrop(prev active_mm);
                                                   ...
                            <--------------------  schedule()
            current->mm = mm;
            current->active_mm = mm;
            if (!old_mm)
                mmdrop(active_mm);
      
      If we switch back to the kernel thread from a different mm, there is a
      double free of the old active_mm, and a missing free of the new one.
      
      Closing this race only requires interrupts to be disabled while ->mm
      and ->active_mm are being switched, but the TLB problem requires also
      holding interrupts off over activate_mm. Unfortunately not all archs
      can do that yet, e.g., arm defers the switch if irqs are disabled and
      expects finish_arch_post_lock_switch() to be called to complete the
      flush; um takes a blocking lock in activate_mm().
      
      So as a first step, disable interrupts across the mm/active_mm updates
      to close the lazy tlb preempt race, and provide an arch option to
      extend that to activate_mm which allows architectures doing IPI based
      TLB shootdowns to close the second race.
      
      This is a bit ugly, but in the interest of fixing the bug and backporting
      before all architectures are converted this is a compromise.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200914045219.3736466-2-npiggin@gmail.com
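
      A hedged, simplified sketch of the shape of the fix, modelled on
      exec_mmap() in fs/exec.c: the ->mm/->active_mm updates happen with
      interrupts disabled, and architectures doing IPI-based TLB shootdown
      can keep them disabled across activate_mm() too. The config symbol
      name is quoted from memory of the mainline change; treat it and the
      helper name as illustrative.

        #include <linux/sched.h>
        #include <linux/sched/mm.h>
        #include <linux/sched/task.h>
        #include <asm/mmu_context.h>

        static void sketch_switch_exec_mm(struct task_struct *tsk, struct mm_struct *mm)
        {
                struct mm_struct *old_mm = tsk->mm;
                struct mm_struct *active_mm;

                task_lock(tsk);
                local_irq_disable();
                active_mm = tsk->active_mm;
                tsk->active_mm = mm;
                tsk->mm = mm;
                /* irqs stay off across the updates so a preempting context
                 * switch cannot observe (and mmdrop) a half-switched lazy
                 * tlb mm */
                if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
                        local_irq_enable();
                activate_mm(active_mm, mm); /* IPI-shootdown archs keep irqs off here too */
                if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
                        local_irq_enable();
                task_unlock(tsk);

                if (old_mm)
                        mmput(old_mm);
                else
                        mmdrop(active_mm);
        }
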
  11. 09 Sep 2020, 1 commit
  12. 01 Sep 2020, 3 commits
  13. 06 Aug 2020, 1 commit
  14. 24 Jul 2020, 1 commit
    • entry: Provide generic syscall entry functionality · 142781e1
      Authored by Thomas Gleixner
      On syscall entry certain work needs to be done:
      
         - Establish state (lockdep, context tracking, tracing)
         - Conditional work (ptrace, seccomp, audit...)
      
      This code is needlessly duplicated and different in all
      architectures.
      
      Provide a generic version based on the x86 implementation which has all the
      RCU and instrumentation bits right.
      
      As interrupt/exception entry from user space needs parts of the same
      functionality, provide a function for this as well.
      
      syscall_enter_from_user_mode() and irqentry_enter_from_user_mode() must be
      called right after the low level ASM entry. The calling code must be
      non-instrumentable. After the functions return, state is correct and the
      subsequent functions can be instrumented (see the sketch after this entry).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Link: https://lkml.kernel.org/r/20200722220519.513463269@linutronix.de
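
      A hedged sketch of how an architecture's C entry code is expected to
      use the new helper. syscall_enter_from_user_mode() is the function the
      commit adds; arch_do_syscall() and arch_dispatch() are hypothetical
      names, and the matching exit-side work is omitted here.

        #include <linux/entry-common.h>

        extern void arch_dispatch(struct pt_regs *regs, long nr); /* hypothetical */

        /* called directly from the low-level ASM entry, before any
         * instrumentable code runs */
        noinstr void arch_do_syscall(struct pt_regs *regs, long nr)
        {
                /* establishes state (RCU, lockdep, tracing) and runs the
                 * conditional work (ptrace, seccomp, audit); may rewrite
                 * the syscall number */
                nr = syscall_enter_from_user_mode(regs, nr);

                /* from here on state is correct and instrumentation is allowed */
                arch_dispatch(regs, nr);
        }
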
  15. 07 Jul 2020, 1 commit
  16. 05 Jul 2020, 1 commit
  17. 27 Jun 2020, 1 commit
  18. 14 Jun 2020, 1 commit
    • treewide: replace '---help---' in Kconfig files with 'help' · a7f7f624
      Authored by Masahiro Yamada
      Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over
      '---help---'"), the number of '---help---' has been gradually
      decreasing, but there are still more than 2400 instances.
      
      This commit finishes the conversion. While I touched the lines,
      I also fixed the indentation.
      
      There are a variety of indentation styles found.
      
        a) 4 spaces + '---help---'
        b) 7 spaces + '---help---'
        c) 8 spaces + '---help---'
        d) 1 space + 1 tab + '---help---'
        e) 1 tab + '---help---'    (correct indentation)
        f) 1 tab + 1 space + '---help---'
        g) 1 tab + 2 spaces + '---help---'
      
      In order to convert all of them to 1 tab + 'help', I ran the
      following command:
      
        $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  19. 19 May 2020, 1 commit
  20. 15 May 2020, 2 commits
    • scs: Disable when function graph tracing is enabled · ddc9863e
      Authored by Sami Tolvanen
      The graph tracer hooks returns by modifying frame records on the
      (regular) stack, but with SCS the return address is taken from the
      shadow stack, and the value in the frame record has no effect. As we
      don't currently have a mechanism to determine the corresponding slot
      on the shadow stack (and to pass this through the ftrace
      infrastructure), for now let's disable SCS when the graph tracer is
      enabled.
      
      With SCS the return address is taken from the shadow stack and the
      value in the frame record has no effect. The mcount based graph tracer
      hooks returns by modifying frame records on the (regular) stack, and
      thus is not compatible. The patchable-function-entry graph tracer
      used for DYNAMIC_FTRACE_WITH_REGS modifies the LR before it is saved
      to the shadow stack, and is compatible.
      
      Modifying the mcount based graph tracer to work with SCS would require
      a mechanism to determine the corresponding slot on the shadow stack
      (and to pass this through the ftrace infrastructure), and we expect
      that everyone will eventually move to the patchable-function-entry
      based graph tracer anyway, so for now let's disable SCS when the
      mcount-based graph tracer is enabled.
      
      SCS and patchable-function-entry are both supported from LLVM 10.x.
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • scs: Add support for Clang's Shadow Call Stack (SCS) · d08b9f0c
      Authored by Sami Tolvanen
      This change adds generic support for Clang's Shadow Call Stack,
      which uses a shadow stack to protect return addresses from being
      overwritten by an attacker. Details are available here:
      
        https://clang.llvm.org/docs/ShadowCallStack.html
      
      Note that security guarantees in the kernel differ from the ones
      documented for user space. The kernel must store addresses of
      shadow stacks in memory, which means an attacker capable of reading
      and writing arbitrary memory may be able to locate them and hijack
      control flow by modifying the stacks.
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      [will: Numerous cosmetic changes]
      Signed-off-by: Will Deacon <will@kernel.org>
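
      A hand-written conceptual model of the idea, not what Clang emits: real
      SCS (-fsanitize=shadow-call-stack) stores the return address on a
      shadow stack addressed through a reserved register (x18 on arm64) and
      returns via that copy. The userspace sketch below only keeps and checks
      a second copy, to show why overwriting the normal stack frame alone no
      longer redirects control flow; every name in it is illustrative.

        #include <stdio.h>
        #include <stdlib.h>

        #define SCS_DEPTH 64
        static void *shadow_stack[SCS_DEPTH];  /* per-thread in the real scheme */
        static int scs_top;

        /* "prologue": save the return address on the shadow stack */
        #define SCS_PUSH()  (shadow_stack[scs_top++] = __builtin_return_address(0))

        /* "epilogue": the trusted copy wins over the on-stack frame record */
        #define SCS_CHECK()                                                           \
                do {                                                                  \
                        if (shadow_stack[--scs_top] != __builtin_return_address(0)) { \
                                fprintf(stderr, "return address tampered with\n");    \
                                abort();                                              \
                        }                                                             \
                } while (0)

        static int __attribute__((noinline)) protected_fn(int x)
        {
                SCS_PUSH();
                int ret = x * 2;   /* overwriting the frame record here is caught */
                SCS_CHECK();
                return ret;
        }

        int main(void)
        {
                printf("%d\n", protected_fn(21));
                return 0;
        }
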
  21. 13 May 2020, 1 commit
  22. 16 Mar 2020, 3 commits
  23. 06 Mar 2020, 1 commit
  24. 14 Feb 2020, 1 commit
    • context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ · 490f561b
      Authored by Frederic Weisbecker
      A few archs (x86, arm, arm64) don't rely anymore on TIF_NOHZ to call
      into context tracking on user entry/exit but instead use static keys
      (or not) to optimize those calls. Ideally every arch should migrate to
      that behaviour in the long run.
      
      Settle a config option to let those archs remove their TIF_NOHZ
      definitions.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
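
      A hedged sketch of the pattern the commit describes: instead of testing
      a TIF_NOHZ thread flag on every user entry/exit, the arch entry code
      calls context-tracking helpers that are internally gated by a static
      key, so the work is patched out when context tracking is unused. The
      arch_* wrapper names are hypothetical; user_enter()/user_exit() are the
      existing context-tracking API.

        #include <linux/context_tracking.h>

        /* arch C code on the user->kernel boundary; no TIF_NOHZ test needed */
        static __always_inline void arch_enter_from_user_mode(void)
        {
                user_exit();  /* no-op unless the context-tracking static key is on */
        }

        /* arch C code just before returning to user space */
        static __always_inline void arch_exit_to_user_mode(void)
        {
                user_enter();
        }
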
  25. 04 Feb 2020, 6 commits
  26. 05 Dec 2019, 1 commit
  27. 02 Dec 2019, 1 commit
  28. 25 Nov 2019, 1 commit
  29. 23 Nov 2019, 1 commit