1. 07 9月, 2017 6 次提交
    • A
      x86/mm: Document how CR4.PCIDE restore works · 1c9fe440
      Andy Lutomirski 提交于
      While debugging a problem, I thought that using
      cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be
      helpful.  It turns out to be counterproductive.
      
      Add a comment documenting how this works.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c9fe440
    • A
      x86/mm: Reinitialize TLB state on hotplug and resume · 72c0098d
      Andy Lutomirski 提交于
      When Linux brings a CPU down and back up, it switches to init_mm and then
      loads swapper_pg_dir into CR3.  With PCID enabled, this has the side effect
      of masking off the ASID bits in CR3.
      
      This can result in some confusion in the TLB handling code.  If we
      bring a CPU down and back up with any ASID other than 0, we end up
      with the wrong ASID active on the CPU after resume.  This could
      cause our internal state to become corrupt, although major
      corruption is unlikely because init_mm doesn't have any user pages.
      More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
      in the next context switch.  The result of *that* is a failure to
      resume from suspend with probability 1 - 1/6^(cpus-1).
      
      Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NJiri Kosina <jikos@kernel.org>
      Fixes: 10af6235 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72c0098d
    • R
      mm,fork: introduce MADV_WIPEONFORK · d2cd9ede
      Rik van Riel 提交于
      Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
      in the child process after fork.  This differs from MADV_DONTFORK in one
      important way.
      
      If a child process accesses memory that was MADV_WIPEONFORK, it will get
      zeroes.  The address ranges are still valid, they are just empty.
      
      If a child process accesses memory that was MADV_DONTFORK, it will get a
      segmentation fault, since those address ranges are no longer valid in
      the child after fork.
      
      Since MADV_DONTFORK also seems to be used to allow very large programs
      to fork in systems with strict memory overcommit restrictions, changing
      the semantics of MADV_DONTFORK might break existing programs.
      
      MADV_WIPEONFORK only works on private, anonymous VMAs.
      
      The use case is libraries that store or cache information, and want to
      know that they need to regenerate it in the child process after fork.
      
      Examples of this would be:
       - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
         check, which is too slow without a PID cache)
       - PKCS#11 API reinitialization check (mandated by specification)
       - glibc's upcoming PRNG (reseed after fork)
       - OpenSSL PRNG (reseed after fork)
      
      The security benefits of a forking server having a re-inialized PRNG in
      every child process are pretty obvious.  However, due to libraries
      having all kinds of internal state, and programs getting compiled with
      many different versions of each library, it is unreasonable to expect
      calling programs to re-initialize everything manually after fork.
      
      A further complication is the proliferation of clone flags, programs
      bypassing glibc's functions to call clone directly, and programs calling
      unshare, causing the glibc pthread_atfork hook to not get called.
      
      It would be better to have the kernel take care of this automatically.
      
      The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
      MADV_WIPEONFORK.
      
      This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
      
          https://man.openbsd.org/minherit.2
      
      [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
      Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.comSigned-off-by: NRik van Riel <riel@redhat.com>
      Reported-by: NFlorian Weimer <fweimer@redhat.com>
      Reported-by: NColm MacCártaigh <colm@allcosts.net>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d2cd9ede
    • R
      x86,mpx: make mpx depend on x86-64 to free up VMA flag · df3735c5
      Rik van Riel 提交于
      Patch series "mm,fork,security: introduce MADV_WIPEONFORK", v4.
      
      If a child process accesses memory that was MADV_WIPEONFORK, it will get
      zeroes.  The address ranges are still valid, they are just empty.
      
      If a child process accesses memory that was MADV_DONTFORK, it will get a
      segmentation fault, since those address ranges are no longer valid in
      the child after fork.
      
      Since MADV_DONTFORK also seems to be used to allow very large programs
      to fork in systems with strict memory overcommit restrictions, changing
      the semantics of MADV_DONTFORK might break existing programs.
      
      The use case is libraries that store or cache information, and want to
      know that they need to regenerate it in the child process after fork.
      
      Examples of this would be:
       - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
         check, which is too slow without a PID cache)
       - PKCS#11 API reinitialization check (mandated by specification)
       - glibc's upcoming PRNG (reseed after fork)
       - OpenSSL PRNG (reseed after fork)
      
      The security benefits of a forking server having a re-inialized PRNG in
      every child process are pretty obvious.  However, due to libraries
      having all kinds of internal state, and programs getting compiled with
      many different versions of each library, it is unreasonable to expect
      calling programs to re-initialize everything manually after fork.
      
      A further complication is the proliferation of clone flags, programs
      bypassing glibc's functions to call clone directly, and programs calling
      unshare, causing the glibc pthread_atfork hook to not get called.
      
      It would be better to have the kernel take care of this automatically.
      
      The patchset also adds MADV_KEEPONFORK, to undo the effects of a prior
      MADV_WIPEONFORK.
      
      This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
      
          https://man.openbsd.org/minherit.2
      
      This patch (of 2):
      
      MPX only seems to be available on 64 bit CPUs, starting with Skylake and
      Goldmont.  Move VM_MPX into the 64 bit only portion of vma->vm_flags, in
      order to free up a VMA flag.
      
      Link: http://lkml.kernel.org/r/20170811212829.29186-2-riel@redhat.comSigned-off-by: NRik van Riel <riel@redhat.com>
      Acked-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Colm MacCártaigh <colm@allcosts.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      df3735c5
    • M
      mm: arch: consolidate mmap hugetlb size encodings · aafd4562
      Mike Kravetz 提交于
      A non-default huge page size can be encoded in the flags argument of the
      mmap system call.  The definitions for these encodings are in arch
      specific header files.  However, all architectures use the same values.
      
      Consolidate all the definitions in the primary user header file
      (uapi/linux/mman.h).  Include definitions for all known huge page sizes.
      Use the generic encoding definitions in hugetlb_encode.h as the basis
      for these definitions.
      
      Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aafd4562
    • D
      metag/numa: remove the unused parent_node() macro · f0cd3406
      Dou Liyang 提交于
      Commit a7be6e5a ("mm: drop useless local parameters of
      __register_one_node()") removes the last user of parent_node().
      
      The parent_node() macro in METAG architecture is unnecessary.
      
      Remove it for cleanup.
      
      Link: http://lkml.kernel.org/r/1501076076-1974-4-git-send-email-douly.fnst@cn.fujitsu.comSigned-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0cd3406
  2. 05 9月, 2017 10 次提交
  3. 04 9月, 2017 3 次提交
    • V
      net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. · 9efdb14f
      Varsha Rao 提交于
      This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they
      are no longer required. Replace _ASSERT() macros with WARN_ON().
      Signed-off-by: NVarsha Rao <rvarsha016@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      9efdb14f
    • C
      powerpc/xive: Fix section __init warning · 265601f0
      Cédric Le Goater 提交于
      xive_spapr_init() is called from a __init routine and calls __init
      routines.
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      265601f0
    • P
      powerpc: Fix kernel crash in emulation of vector loads and stores · 4716e488
      Paul Mackerras 提交于
      Commit 350779a2 ("powerpc: Handle most loads and stores in
      instruction emulation code", 2017-08-30) changed the register usage
      in get_vr and put_vr with the aim of leaving the register number in
      r3 untouched on return.  Unfortunately, r6 was not a good choice, as
      the callers as of 350779a2 store a MSR value in r6.  Then, in
      commit c22435a5 ("powerpc: Emulate FP/vector/VSX loads/stores
      correctly when regs not live", 2017-08-30), the saving and restoring
      of the MSR got moved into get_vr and put_vr.  Either way, the effect
      is that we put a value in MSR that only has the 0x3f8 bits non-zero,
      meaning that we are switching to 32-bit mode.  That leads to a crash
      like this:
      
      Unable to handle kernel paging request for instruction fetch
      Faulting instruction address: 0x0007bea0
      Oops: Kernel access of bad area, sig: 11 [#12]
      LE SMP NR_CPUS=2048 NUMA PowerNV
      Modules linked in: vmx_crypto binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
      CPU: 6 PID: 32659 Comm: trashy_testcase Tainted: G      D         4.13.0-rc2-00313-gf3026f57e6ed-dirty #23
      task: c000000f1bb9e780 task.stack: c000000f1ba98000
      NIP:  000000000007bea0 LR: c00000000007b054 CTR: c00000000007be70
      REGS: c000000f1ba9b960 TRAP: 0400   Tainted: G      D          (4.13.0-rc2-00313-gf3026f57e6ed-dirty)
      MSR:  10000000400010a1 <HV,ME,IR,LE>  CR: 48000228  XER: 00000000
      CFAR: c00000000007be74 SOFTE: 1
      GPR00: c00000000007b054 c000000f1ba9bbe0 c000000000e6e000 000000000000001d
      GPR04: c000000f1ba9bc00 c00000000007be70 00000000000000e8 9000000002009033
      GPR08: 0000000002000000 100000000282f033 000000000b0a0900 0000000000001009
      GPR12: 0000000000000000 c00000000fd42100 0706050303020100 a5a5a5a5a5a5a5a5
      GPR16: 2e2e2e2e2e2de70c 2e2e2e2e2e2e2e2d 0000000000ff00ff 0606040202020000
      GPR20: 000000000000005b ffffffffffffffff 0000000003020100 0000000000000000
      GPR24: c000000f1ab90020 c000000f1ba9bc00 0000000000000001 0000000000000001
      GPR28: c000000f1ba9bc90 c000000f1ba9bea0 000000000b0a0908 0000000000000001
      NIP [000000000007bea0] 0x7bea0
      LR [c00000000007b054] emulate_loadstore+0x1044/0x1280
      Call Trace:
      [c000000f1ba9bbe0] [c000000000076b80] analyse_instr+0x60/0x34f0 (unreliable)
      [c000000f1ba9bc70] [c00000000007b7ec] emulate_step+0x23c/0x544
      [c000000f1ba9bce0] [c000000000053424] arch_uprobe_skip_sstep+0x24/0x40
      [c000000f1ba9bd00] [c00000000024b2f8] uprobe_notify_resume+0x598/0xba0
      [c000000f1ba9be00] [c00000000001c284] do_notify_resume+0xd4/0xf0
      [c000000f1ba9be30] [c00000000000bd44] ret_from_except_lite+0x70/0x74
      Instruction dump:
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      ---[ end trace a7ae7a7f3e0256b5 ]---
      
      To fix this, we just revert to using r3 as before, since the callers
      don't rely on r3 being left unmodified.
      
      Fortunately, this can't be triggered by a misaligned load or store,
      because vector loads and stores truncate misaligned addresses rather
      than taking an alignment interrupt.  It can be triggered using
      uprobes.
      
      Fixes: 350779a2 ("powerpc: Handle most loads and stores in instruction emulation code")
      Reported-by: NAnton Blanchard <anton@ozlabs.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Tested-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4716e488
  4. 02 9月, 2017 9 次提交
  5. 01 9月, 2017 12 次提交