1. 16 Jan, 2018 6 commits
    • usercopy: Allow strict enforcement of whitelists · 2d891fbc
      Kees Cook committed
      This introduces CONFIG_HARDENED_USERCOPY_FALLBACK to control the
      behavior of hardened usercopy whitelist violations. By default, whitelist
      violations will continue to WARN() so that any bad or missing usercopy
      whitelists can be discovered without being too disruptive.
      
      If this config is disabled at build time or a system is booted with
      "slab_common.usercopy_fallback=0", usercopy whitelists will BUG() instead
      of WARN(). This is useful for admins that want to use usercopy whitelists
      immediately.
      Suggested-by: Matthew Garrett <mjg59@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
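
The policy described in this commit can be modelled in a few lines of user-space C. This is only a sketch of the two cases the commit message names; the enum and function names are hypothetical, and the real kernel wiring of the config option and the boot parameter may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a whitelist violation (illustrative names). */
enum violation_action { ACTION_WARN, ACTION_BUG };

/*
 * Assumption for this sketch: the fallback is active only when it was
 * enabled at build time (CONFIG_HARDENED_USERCOPY_FALLBACK) and was not
 * switched off with slab_common.usercopy_fallback=0 at boot.
 * Fallback active -> WARN(); otherwise -> BUG().
 */
enum violation_action whitelist_violation_action(bool config_fallback,
                                                 bool boot_param_fallback)
{
	bool fallback = config_fallback && boot_param_fallback;

	return fallback ? ACTION_WARN : ACTION_BUG;
}
```

With both inputs true the model warns; clearing either one makes a violation fatal.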
    • usercopy: WARN() on slab cache usercopy region violations · afcc90f8
      Kees Cook committed
      This patch adds checking of usercopy cache whitelisting, and is modified
      from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the
      last public patch of grsecurity/PaX based on my understanding of the
      code. Changes or omissions from the original code are mine and don't
      reflect the original grsecurity/PaX code.
      
      The SLAB and SLUB allocators are modified to WARN() on all copy operations
      in which the kernel heap memory being modified falls outside of the cache's
      defined usercopy region.
      
      Based on an earlier patch from David Windsor.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
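
The region test described in this commit amounts to an interval containment check. A minimal user-space sketch follows; `struct cache_region` and the function name are hypothetical stand-ins, not the allocator's actual types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the usercopy fields of a slab cache. */
struct cache_region {
	size_t useroffset; /* start of the whitelisted region in an object */
	size_t usersize;   /* length of the whitelisted region; 0 = none */
};

/*
 * Returns true when the copy [offset, offset + n) lies entirely inside
 * [useroffset, useroffset + usersize). Uses subtraction rather than
 * offset + n so the arithmetic cannot overflow.
 */
bool copy_within_usercopy_region(const struct cache_region *c,
                                 size_t offset, size_t n)
{
	size_t rel;

	assert(c != NULL);
	if (offset < c->useroffset)
		return false;
	rel = offset - c->useroffset;
	if (rel > c->usersize)
		return false;
	return n <= c->usersize - rel;
}
```

In this model, a copy for which the function returns false is the case that would trigger the WARN().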
    • usercopy: Prepare for usercopy whitelisting · 8eb8284b
      David Windsor committed
      This patch prepares the slab allocator to handle caches having annotations
      (useroffset and usersize) defining usercopy regions.
      
      This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
      whitelisting code in the last public patch of grsecurity/PaX based on
      my understanding of the code. Changes or omissions from the original
      code are mine and don't reflect the original grsecurity/PaX code.
      
      Currently, hardened usercopy performs dynamic bounds checking on slab
      cache objects. This is good, but still leaves a lot of kernel memory
      available to be copied to/from userspace in the face of bugs. To further
      restrict what memory is available for copying, this creates a way to
      whitelist specific areas of a given slab cache object for copying to/from
      userspace, allowing much finer granularity of access control. Slab caches
      that are never exposed to userspace can declare no whitelist for their
      objects, thereby keeping them unavailable to userspace via dynamic copy
      operations. (Note, an implicit form of whitelisting is the use of constant
      sizes in usercopy operations and get_user()/put_user(); these bypass
      hardened usercopy checks since these sizes cannot change at runtime.)
      
      To support this whitelist annotation, usercopy region offset and size
      members are added to struct kmem_cache. The slab allocator receives a
      new function, kmem_cache_create_usercopy(), that creates a new cache
      with a usercopy region defined, suitable for declaring spans of fields
      within the objects that get copied to/from userspace.
      
      In this patch, the default kmem_cache_create() marks the entire allocation
      as whitelisted, leaving it semantically unchanged. Once all fine-grained
      whitelists have been added (in subsequent patches), this will be changed
      to a usersize of 0, making caches created with kmem_cache_create() not
      copyable to/from userspace.
      
      After the entire usercopy whitelist series is applied, less than 15%
      of the slab cache memory remains exposed to potential usercopy bugs
      after a fresh boot:
      
      Total Slab Memory:           48074720
      Usercopyable Memory:          6367532  13.2%
               task_struct                    0.2%         4480/1630720
               RAW                            0.3%            300/96000
               RAWv6                          2.1%           1408/64768
               ext4_inode_cache               3.0%       269760/8740224
               dentry                        11.1%       585984/5273856
               mm_struct                     29.1%         54912/188448
               kmalloc-8                    100.0%          24576/24576
               kmalloc-16                   100.0%          28672/28672
               kmalloc-32                   100.0%          81920/81920
               kmalloc-192                  100.0%          96768/96768
               kmalloc-128                  100.0%        143360/143360
               names_cache                  100.0%        163840/163840
               kmalloc-64                   100.0%        167936/167936
               kmalloc-256                  100.0%        339968/339968
               kmalloc-512                  100.0%        350720/350720
               kmalloc-96                   100.0%        455616/455616
               kmalloc-8192                 100.0%        655360/655360
               kmalloc-1024                 100.0%        812032/812032
               kmalloc-4096                 100.0%        819200/819200
               kmalloc-2048                 100.0%      1310720/1310720
      
      After some kernel build workloads, the percentage (mainly driven by
      dentry and inode caches expanding) drops under 10%:
      
      Total Slab Memory:           95516184
      Usercopyable Memory:          8497452   8.8%
               task_struct                    0.2%         4000/1456000
               RAW                            0.3%            300/96000
               RAWv6                          2.1%           1408/64768
               ext4_inode_cache               3.0%     1217280/39439872
               dentry                        11.1%     1623200/14608800
               mm_struct                     29.1%         73216/251264
               kmalloc-8                    100.0%          24576/24576
               kmalloc-16                   100.0%          28672/28672
               kmalloc-32                   100.0%          94208/94208
               kmalloc-192                  100.0%          96768/96768
               kmalloc-128                  100.0%        143360/143360
               names_cache                  100.0%        163840/163840
               kmalloc-64                   100.0%        245760/245760
               kmalloc-256                  100.0%        339968/339968
               kmalloc-512                  100.0%        350720/350720
               kmalloc-96                   100.0%        563520/563520
               kmalloc-8192                 100.0%        655360/655360
               kmalloc-1024                 100.0%        794624/794624
               kmalloc-4096                 100.0%        819200/819200
               kmalloc-2048                 100.0%      1257472/1257472
      Signed-off-by: David Windsor <dave@nullcore.net>
      [kees: adjust commit log, split out a few extra kmalloc hunks]
      [kees: add field names to function declarations]
      [kees: convert BUGs to WARNs and fail closed]
      [kees: add attack surface reduction analysis to commit log]
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Christoph Lameter <cl@linux.com>
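
To illustrate how a usercopy region can be derived from an object layout, here is a user-space sketch that computes an offset and size for a single field. The struct, the `FIELD_SIZE` macro, and the helper functions are hypothetical; in the kernel, values like these would be supplied as the useroffset and usersize of a cache created with kmem_cache_create_usercopy().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: only `name` is ever copied to/from userspace. */
struct example_obj {
	void *private_ptr;   /* must never reach userspace */
	char name[32];       /* the span to whitelist */
	unsigned long flags; /* must never reach userspace */
};

/* Size of a struct member, computed without needing an instance. */
#define FIELD_SIZE(type, member) (sizeof(((type *)0)->member))

/* The whitelisted span must fit inside the object. */
static_assert(offsetof(struct example_obj, name) +
	      FIELD_SIZE(struct example_obj, name) <=
	      sizeof(struct example_obj), "region exceeds object");

size_t example_useroffset(void)
{
	return offsetof(struct example_obj, name);
}

size_t example_usersize(void)
{
	return FIELD_SIZE(struct example_obj, name);
}
```

Everything outside the `name` span, here the pointer and the flags word, stays unavailable to dynamically sized copies under this scheme.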
    • usercopy: Include offset in hardened usercopy report · f4e6e289
      Kees Cook committed
      This refactors the hardened usercopy code so that failure reporting can
      happen within the checking functions instead of at the top level. This
      simplifies the return value handling and allows more details and offsets
      to be included in the report. Having the offset can be much more helpful
      in understanding hardened usercopy bugs.
      Signed-off-by: Kees Cook <keescook@chromium.org>
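
As a rough illustration of why reporting from inside the checking function helps, the sketch below computes an object-relative offset where it is known and formats it into the report. The function names and the message wording are assumptions made for this example, not the kernel's actual report format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Offset of `addr` within its object, assuming objects of `object_size`
 * bytes are packed back to back starting at `slab_base` (a simplification).
 */
size_t object_offset(size_t slab_base, size_t object_size, size_t addr)
{
	assert(object_size > 0 && addr >= slab_base);
	return (addr - slab_base) % object_size;
}

/* Formats an illustrative report line; returns snprintf()'s result. */
int format_usercopy_report(char *buf, size_t buflen, const char *cache_name,
                           size_t offset, size_t len)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen,
			"usercopy: bad access to '%s' (offset %zu, size %zu)",
			cache_name, offset, len);
}
```

Because the checker already knows the object's base and size, it can report an offset that a top-level caller would have to recompute.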
    • usercopy: Enhance and rename report_usercopy() · b394d468
      Kees Cook committed
      In preparation for refactoring the usercopy checks to pass offset to
      the hardened usercopy report, this renames report_usercopy() to the
      more accurate usercopy_abort(), marks it as noreturn because it is,
      adds a hopefully helpful comment for anyone investigating such reports,
      makes the function available to the slab allocators, and adds new "detail"
      and "offset" arguments.
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • usercopy: Remove pointer from overflow report · 4f5e8386
      Kees Cook committed
      Using %p was already mostly useless in the usercopy overflow reports,
      so this removes it entirely to avoid confusion now that %p-hashing
      is enabled.
      
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Signed-off-by: Kees Cook <keescook@chromium.org>
  2. 30 Nov, 2017 15 commits
  3. 29 Nov, 2017 1 commit
  4. 28 Nov, 2017 3 commits
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds committed
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
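
The statement that the SB_* values mirror the MS_* values can be checked at compile time from user space. The sketch below is Linux-specific. The SB_* definitions are illustrative local copies of a few kernel-internal constants (an assumption of this example, since they are not exported to userspace); the MS_* values come from the C library's `<sys/mount.h>`.

```c
#include <assert.h>    /* static_assert (C11) */
#include <sys/mount.h> /* MS_* flags as passed to mount(2) */

/* Illustrative local copies of a few kernel-internal SB_* values. */
#define SB_RDONLY       1
#define SB_NOSUID       2
#define SB_NODEV        4
#define SB_NOEXEC       8
#define SB_SYNCHRONOUS 16

/* Each SB_* flag is expected to equal its MS_* counterpart. */
static_assert(SB_RDONLY == MS_RDONLY, "SB_RDONLY mirrors MS_RDONLY");
static_assert(SB_NOSUID == MS_NOSUID, "SB_NOSUID mirrors MS_NOSUID");
static_assert(SB_NODEV == MS_NODEV, "SB_NODEV mirrors MS_NODEV");
static_assert(SB_NOEXEC == MS_NOEXEC, "SB_NOEXEC mirrors MS_NOEXEC");
static_assert(SB_SYNCHRONOUS == MS_SYNCHRONOUS,
	      "SB_SYNCHRONOUS mirrors MS_SYNCHRONOUS");
```

Keeping the two namespaces separate makes it visible in the code whether a value is a mount(2) argument or a field of sb->s_flags, even while the numbers coincide.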
    • mm, thp: Do not make pmd/pud dirty without a reason · 152e93af
      Kirill A. Shutemov committed
      Currently we make page table entries dirty all the time regardless of
      access type and don't even consider if the mapping is write-protected.
      The reasoning is that we don't really need dirty tracking on THP and
      making the entry dirty upfront may save some time on first write to the
      page.
      
      Unfortunately, such an approach may result in a false-positive
      can_follow_write_pmd() for the huge zero page or a read-only shmem file.
      
      Let's make the page dirty only if we are about to write to the page anyway
      (as we do for small pages).
      
      I've restructured the code to make entry dirty inside
      maybe_p[mu]d_mkwrite(). It also takes into account if the vma is
      write-protected.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
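
The rule this commit describes (set the dirty bit only on a write, and only if the mapping is writable) can be modelled with plain bit flags. This is a toy user-space sketch: the flag values and the function are hypothetical and do not reflect real page table entry layouts.

```c
#include <stdbool.h>

/* Toy bits; real entry and vm_flags layouts are kernel-specific. */
#define TOY_ENTRY_WRITE 0x1u
#define TOY_ENTRY_DIRTY 0x2u
#define TOY_VM_WRITE    0x1u

/*
 * Models the restructured helper: the entry becomes writable only when
 * the mapping allows writes, and dirty only when the caller is about to
 * write as well.
 */
unsigned toy_maybe_mkwrite(unsigned entry, unsigned vm_flags, bool dirty)
{
	if (vm_flags & TOY_VM_WRITE) {
		entry |= TOY_ENTRY_WRITE;
		if (dirty)
			entry |= TOY_ENTRY_DIRTY;
	}
	return entry;
}
```

In this model a write-protected mapping never yields a dirty entry, which is the condition that avoids the false positive mentioned above.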
    • mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d() · a8f97366
      Kirill A. Shutemov committed
      Currently, we unconditionally make the page table entry dirty in
      touch_pmd(). This may result in a false-positive can_follow_write_pmd().
      
      We can avoid the situation by making the page table entry dirty only
      if the caller asks for write access (FOLL_WRITE).
      
      The patch also changes touch_pud() in the same way.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
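
A toy model of the changed behaviour: the entry is marked dirty only when the caller passes the write flag. The flag values and function name are hypothetical, and marking the entry as accessed ("young") on every touch is an assumption of this sketch rather than something the commit message states.

```c
/* Toy bits; real FOLL_* and page table bits are kernel-specific. */
#define TOY_FOLL_WRITE 0x1u
#define TOY_PMD_DIRTY  0x2u
#define TOY_PMD_YOUNG  0x4u

/* Returns the updated entry after a "touch" with the given lookup flags. */
unsigned toy_touch_entry(unsigned entry, unsigned foll_flags)
{
	entry |= TOY_PMD_YOUNG;         /* assumed: always mark accessed */
	if (foll_flags & TOY_FOLL_WRITE)
		entry |= TOY_PMD_DIRTY; /* dirty only for write access */
	return entry;
}
```

A read-only lookup therefore leaves the dirty bit as it found it.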
  5. 22 Nov, 2017 1 commit
    • block/laptop_mode: Convert timers to use timer_setup() · bca237a5
      Kees Cook committed
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
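
The pattern behind this conversion is that the callback receives a pointer to the timer itself and recovers the enclosing structure from it. A user-space sketch of that idea follows; all `toy_` names and `struct laptop_state` are hypothetical, and the macro only imitates what the kernel's from_timer() helper is understood to do via container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Toy timer: the callback is handed the timer pointer itself. */
struct toy_timer {
	void (*function)(struct toy_timer *t);
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical owner of an embedded timer. */
struct laptop_state {
	int fired;
	struct toy_timer timer;
};

/* Stand-in for timer_setup(): records the callback only. */
void toy_timer_setup(struct toy_timer *t, void (*fn)(struct toy_timer *))
{
	assert(t != NULL && fn != NULL);
	t->function = fn;
}

/* Callback: finds its state through the timer pointer, no cast from long. */
void laptop_timer_fn(struct toy_timer *t)
{
	struct laptop_state *s =
		toy_container_of(t, struct laptop_state, timer);

	s->fired++;
}
```

Firing the timer is then `s.timer.function(&s.timer)`, after which `s.fired` has been incremented through the recovered pointer.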
  6. 20 Nov, 2017 2 commits
  7. 18 Nov, 2017 8 commits
  8. 16 Nov, 2017 4 commits