1. 17 7月, 2019 4 次提交
  2. 15 7月, 2019 11 次提交
  3. 13 7月, 2019 2 次提交
    • A
      mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options · 6471384a
      Alexander Potapenko 提交于
      Patch series "add init_on_alloc/init_on_free boot options", v10.
      
      Provide init_on_alloc and init_on_free boot options.
      
      These are aimed at preventing possible information leaks and making the
      control-flow bugs that depend on uninitialized values more deterministic.
      
      Enabling either of the options guarantees that the memory returned by the
      page allocator and SL[AU]B is initialized with zeroes.  SLOB allocator
      isn't supported at the moment, as its emulation of kmem caches complicates
      handling of SLAB_TYPESAFE_BY_RCU caches correctly.
      
      Enabling init_on_free also guarantees that pages and heap objects are
      initialized right after they're freed, so it won't be possible to access
      stale data by using a dangling pointer.
      
      As suggested by Michal Hocko, right now we don't let the heap users to
      disable initialization for certain allocations.  There's not enough
      evidence that doing so can speed up real-life cases, and introducing ways
      to opt-out may result in things going out of control.
      
      This patch (of 2):
      
      The new options are needed to prevent possible information leaks and make
      control-flow bugs that depend on uninitialized values more deterministic.
      
      This is expected to be on-by-default on Android and Chrome OS.  And it
      gives the opportunity for anyone else to use it under distros too via the
      boot args.  (The init_on_free feature is regularly requested by folks
      where memory forensics is included in their threat models.)
      
      init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
      objects with zeroes.  Initialization is done at allocation time at the
      places where checks for __GFP_ZERO are performed.
      
      init_on_free=1 makes the kernel initialize freed pages and heap objects
      with zeroes upon their deletion.  This helps to ensure sensitive data
      doesn't leak via use-after-free accesses.
      
      Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
      returns zeroed memory.  The two exceptions are slab caches with
      constructors and SLAB_TYPESAFE_BY_RCU flag.  Those are never
      zero-initialized to preserve their semantics.
      
      Both init_on_alloc and init_on_free default to zero, but those defaults
      can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
      CONFIG_INIT_ON_FREE_DEFAULT_ON.
      
      If either SLUB poisoning or page poisoning is enabled, those options take
      precedence over init_on_alloc and init_on_free: initialization is only
      applied to unpoisoned allocations.
      
      Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:
      
      hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
      hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
      
      Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
      Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
      Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
      Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
      
      The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
      is within the standard error.
      
      The new features are also going to pave the way for hardware memory
      tagging (e.g.  arm64's MTE), which will require both on_alloc and on_free
      hooks to set the tags for heap objects.  With MTE, tagging will have the
      same cost as memory initialization.
      
      Although init_on_free is rather costly, there are paranoid use-cases where
      in-memory data lifetime is desired to be minimized.  There are various
      arguments for/against the realism of the associated threat models, but
      given that we'll need the infrastructure for MTE anyway, and there are
      people who want wipe-on-free behavior no matter what the performance cost,
      it seems reasonable to include it in this series.
      
      [glider@google.com: v8]
        Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
      [glider@google.com: v9]
        Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
      [glider@google.com: v10]
        Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
      Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.comSigned-off-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>		[page and dmapool parts
      Acked-by: James Morris <jamorris@linux.microsoft.com>]
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Sandeep Patil <sspatil@android.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6471384a
    • V
      mm, debug_pagealloc: use a page type instead of page_ext flag · 3972f6bb
      Vlastimil Babka 提交于
      When debug_pagealloc is enabled, we currently allocate the page_ext
      array to mark guard pages with the PAGE_EXT_DEBUG_GUARD flag.  Now that
      we have the page_type field in struct page, we can use that instead, as
      guard pages are neither PageSlab nor mapped to userspace.  This reduces
      memory overhead when debug_pagealloc is enabled and there are no other
      features requiring the page_ext array.
      
      Link: http://lkml.kernel.org/r/20190603143451.27353-4-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3972f6bb
  4. 09 7月, 2019 1 次提交
  5. 03 7月, 2019 1 次提交
    • T
      x86/fsgsbase: Revert FSGSBASE support · 049331f2
      Thomas Gleixner 提交于
      The FSGSBASE series turned out to have serious bugs and there is still an
      open issue which is not fully understood yet.
      
      The confidence in those changes has become close to zero especially as the
      test cases which have been shipped with that series were obviously never
      run before sending the final series out to LKML.
      
        ./fsgsbase_64 >/dev/null
        Segmentation fault
      
      As the merge window is close, the only sane decision is to revert FSGSBASE
      support. The revert is necessary as this branch has been merged into
      perf/core already and rebasing all of that a few days before the merge
      window is not the most brilliant idea.
      
      I could definitely slap myself for not noticing the test case fail when
      merging that series, but TBH my expectations weren't that low back
      then. Won't happen again.
      
      Revert the following commits:
      539bca53 ("x86/entry/64: Fix and clean up paranoid_exit")
      2c7b5ac5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
      f987c955 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
      2032f1f9 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
      5bf0cab6 ("x86/entry/64: Document GSBASE handling in the paranoid path")
      708078f6 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
      79e1932f ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      1d07316b ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
      f60a83df ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
      1ab5f3f7 ("x86/process/64: Use FSBSBASE in switch_to() if available")
      a86b4625 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
      8b71340d ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
      b64ed19b ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      049331f2
  6. 01 7月, 2019 1 次提交
  7. 28 6月, 2019 2 次提交
  8. 22 6月, 2019 2 次提交
  9. 19 6月, 2019 1 次提交
    • N
      powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP · d909f910
      Nicholas Piggin 提交于
      This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
      page table functions.
      
      This enables huge (2MB and 1GB) ioremap mappings. I don't have a
      benchmark for this change, but huge vmap will be used by a later core
      kernel change to enable huge vmalloc memory mappings. This improves
      cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB
      size dentry cache hash.
      
        Profiling git diff dTLB misses with a vanilla kernel:
      
        81.75%  git      [kernel.vmlinux]    [k] __d_lookup_rcu
         7.21%  git      [kernel.vmlinux]    [k] strncpy_from_user
         1.77%  git      [kernel.vmlinux]    [k] find_get_entry
         1.59%  git      [kernel.vmlinux]    [k] kmem_cache_free
      
                  40,168      dTLB-miss
             0.100342754 seconds time elapsed
      
        With powerpc huge vmalloc:
      
                   2,987      dTLB-miss
             0.095933138 seconds time elapsed
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d909f910
  10. 15 6月, 2019 7 次提交
  11. 11 6月, 2019 1 次提交
    • M
      docs: s390: convert docs to ReST and rename to *.rst · 8b4a503d
      Mauro Carvalho Chehab 提交于
      Convert all text files with s390 documentation to ReST format.
      
      Tried to preserve as much as possible the original document
      format. Still, some of the files required some work in order
      for it to be visible on both plain text and after converted
      to html.
      
      The conversion is actually:
        - add blank lines and identation in order to identify paragraphs;
        - fix tables markups;
        - add some lists markups;
        - mark literal blocks;
        - adjust title markups.
      
      At its new index.rst, let's add a :orphan: while this is not linked to
      the main index.rst file, in order to avoid build warnings.
      Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      8b4a503d
  12. 09 6月, 2019 2 次提交
  13. 30 5月, 2019 1 次提交
  14. 26 5月, 2019 2 次提交
  15. 19 5月, 2019 1 次提交
  16. 15 5月, 2019 1 次提交
    • W
      ipc: allow boot time extension of IPCMNI from 32k to 16M · 5ac893b8
      Waiman Long 提交于
      The maximum number of unique System V IPC identifiers was limited to
      32k.  That limit should be big enough for most use cases.
      
      However, there are some users out there requesting for more, especially
      those that are migrating from Solaris which uses 24 bits for unique
      identifiers.  To satisfy the need of those users, a new boot time kernel
      option "ipcmni_extend" is added to extend the IPCMNI value to 16M.  This
      is a 512X increase which should be big enough for users out there that
      need a large number of unique IPC identifier.
      
      The use of this new option will change the pattern of the IPC
      identifiers returned by functions like shmget(2).  An application that
      depends on such pattern may not work properly.  So it should only be
      used if the users really need more than 32k of unique IPC numbers.
      
      This new option does have the side effect of reducing the maximum number
      of unique sequence numbers from 64k down to 128.  So it is a trade-off.
      
      The computation of a new IPC id is not done in the performance critical
      path.  So a little bit of additional overhead shouldn't have any real
      performance impact.
      
      Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.comSigned-off-by: NWaiman Long <longman@redhat.com>
      Acked-by: NManfred Spraul <manfred@colorfullife.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ac893b8