1. 12 Apr 2018, 1 commit
  2. 07 Feb 2018, 1 commit
  3. 03 Nov 2017, 1 commit
    • regset: Add support for dynamically sized regsets · 27e64b4b
      Committed by Dave Martin
      Currently the regset API doesn't allow for the possibility that
      regsets (or at least, the amount of meaningful data in a regset)
      may change in size.
      
      In particular, this results in useless padding being added to
      coredumps if a regset's current size is smaller than its
      theoretical maximum size.
      
      This patch adds a get_size() function to struct user_regset.
      Individual regset implementations can implement this function to
      return the current size of the regset data.  A regset_size()
      function is added to provide callers with an abstract interface for
      determining the size of a regset without needing to know whether
      the regset is dynamically sized or not.
      
      The only affected user of this interface is the ELF coredump code:
      This patch ports ELF coredump to dump regsets with their actual
      size in the coredump.  This has no effect except for new regsets
      that are dynamically sized and provide a get_size() implementation.
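
      For reference, a sketch of how the two pieces described above fit
      together (the hook signature follows the description in this commit;
      treat the exact layout of include/linux/regset.h as an assumption):

        /* struct user_regset gains an optional hook: */
        struct user_regset {
                /* ... existing fields: n, size, get(), set(), ... */
                unsigned int (*get_size)(struct task_struct *target,
                                         const struct user_regset *regset);
        };

        /* Callers ask for the size through this instead of computing n * size: */
        static inline unsigned int regset_size(struct task_struct *target,
                                               const struct user_regset *regset)
        {
                if (!regset->get_size)
                        return regset->n * regset->size;   /* statically sized */
                return regset->get_size(target, regset);   /* current, dynamic size */
        }
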
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: H. J. Lu <hjl.tools@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  4. 11 Sep 2017, 1 commit
  5. 05 Sep 2017, 1 commit
  6. 17 Aug 2017, 1 commit
  7. 02 Aug 2017, 1 commit
    • binfmt: Introduce secureexec flag · c425e189
      Committed by Kees Cook
      The bprm_secureexec hook can be moved earlier. Right now, it is called
      during create_elf_tables(), via load_binary(), via search_binary_handler(),
      via exec_binprm(). Nearly all (see exception below) state used by
      bprm_secureexec is created during the bprm_set_creds hook, called from
      prepare_binprm().
      
      For all LSMs (except commoncaps described next), only the first execution
      of bprm_set_creds takes any effect (they all check bprm->called_set_creds
      which prepare_binprm() sets after the first call to the bprm_set_creds
      hook).  However, all these LSMs also only do anything with bprm_secureexec
      when they detected a secure state during their first run of bprm_set_creds.
      Therefore, it is functionally identical to move the detection into
      bprm_set_creds, since the results from secureexec here only need to be
      based on the first call to the LSM's bprm_set_creds hook.
      
      The single exception is that the commoncaps secureexec hook also examines
      euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
      via prepare_binprm(), which can be called multiple times (e.g.
      binfmt_script, binfmt_misc), and may clear the euid/egid for the final
      load (i.e. the script interpreter). However, while commoncaps specifically
      ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time
      prepare_binprm() may get called, it needs to base the secureexec decision
      on the final call to bprm_set_creds. As a result, it will need special
      handling.
      
      To begin this refactoring, this adds the secureexec flag to the bprm
      struct, and calls the secureexec hook during setup_new_exec(). This is
      safe since all the cred work is finished (and past the point of no return).
      This explicit call will be removed in later patches once the hook has been
      removed.
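
      A rough sketch of the shape of this step (not the literal diff; the
      struct layout and the exact call placement are simplified, and
      security_bprm_secureexec() is the existing hook being refactored):

        /* include/linux/binfmts.h (sketch): */
        struct linux_binprm {
                /* ... existing fields ... */
                unsigned secureexec:1;  /* "privilege-gaining exec", consumed for AT_SECURE */
        };

        /* fs/exec.c (sketch): */
        void setup_new_exec(struct linux_binprm *bprm)
        {
                /* Past the point of no return: all cred work is finished. */
                bprm->secureexec |= security_bprm_secureexec(bprm);
                /* ... */
        }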
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: John Johansen <john.johansen@canonical.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Reviewed-by: James Morris <james.l.morris@oracle.com>
  8. 11 Jul 2017, 2 commits
    • binfmt_elf: safely increment argv pointers · 67c6777a
      Committed by Kees Cook
      When building the argv/envp pointers, the envp is needlessly
      pre-incremented instead of just continuing after the argv pointers are
      finished.  In some (likely impossible) race where the strings could be
      changed from userspace between copy_strings() and here, it might be
      possible to confuse the envp position.  Instead, just use sp like
      everything else.
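
      Roughly, the change has this shape (an illustrative sketch, not the
      literal diff; the error checking on the string lengths is omitted):

        /* create_elf_tables(): write the argv pointers, the NULL terminator
         * and the envp pointers through the same cursor, instead of keeping
         * a separately pre-incremented envp pointer. */
        while (argc-- > 0) {
                if (__put_user((elf_addr_t)p, sp++))    /* was: argv++ */
                        return -EFAULT;
                p += strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        }
        if (__put_user(0, sp++))
                return -EFAULT;
        while (envc-- > 0) {
                if (__put_user((elf_addr_t)p, sp++))    /* was: envp++ */
                        return -EFAULT;
                p += strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        }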
      
      Link: http://lkml.kernel.org/r/20170622173838.GA43308@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • binfmt_elf: use ELF_ET_DYN_BASE only for PIE · eab09532
      Committed by Kees Cook
      The ELF_ET_DYN_BASE position was originally intended to keep loaders
      away from ET_EXEC binaries.  (For example, running "/lib/ld-linux.so.2
      /bin/cat" might cause the subsequent load of /bin/cat into where the
      loader had been loaded.)
      
      With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
      ELF_ET_DYN_BASE continued to be used since the kernel was only looking
      at ET_DYN.  However, since ELF_ET_DYN_BASE is traditionally set at the
      top 1/3rd of the TASK_SIZE, a substantial portion of the address space
      is unused.
      
      For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
      loaded above the mmap region.  This means they can be made to collide
      (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
      pathological stack regions.
      
      Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
      region in all cases, and will now additionally avoid programs falling
      back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
      if it would have collided with the stack, now it will fail to load
      instead of falling back to the mmap region).
      
      To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
      are loaded into the mmap region, leaving space available for either an
      ET_EXEC binary with a fixed location or PIE being loaded into mmap by
      the loader.  Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
      which means architectures can now safely lower their values without risk
      of loaders colliding with their subsequently loaded programs.
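
      In load_elf_binary() the decision boils down to something like the
      following (a simplified sketch of the post-patch logic):

        if (elf_ex.e_type == ET_DYN) {
                if (interpreter) {
                        /* PIE with an INTERP header: load at ELF_ET_DYN_BASE
                         * (plus an ASLR offset) and insist on that placement. */
                        load_bias = ELF_ET_DYN_BASE;
                        if (current->flags & PF_RANDOMIZE)
                                load_bias += arch_mmap_rnd();
                        elf_flags |= MAP_FIXED;
                } else {
                        /* A loader run directly: let it land in the mmap region. */
                        load_bias = 0;
                }
        }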
      
      For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
      the entire 32-bit address space for 32-bit pointers.
      
      Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
      suggestions on how to implement this solution.
      
      Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR")
      Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 02 Mar 2017, 4 commits
  10. 23 Feb 2017, 1 commit
    • powerpc: do not make the entire heap executable · 16e72e9b
      Committed by Denys Vlasenko
      On 32-bit powerpc the ELF PLT sections of binaries (built with
      --bss-plt, or with a toolchain which defaults to it) look like this:
      
        [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
        [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
        [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4
      
      Which results in an ELF load header:
      
        Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
        LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000
      
      This is all correct, the load region containing the PLT is marked as
      executable.  Note that the PLT starts at 0002b00c but the file mapping
      ends at 0002aff8, so the PLT falls in the 0 fill section described by
      the load header, and after a page boundary.
      
      Unfortunately the generic ELF loader ignores the X bit in the load
      headers when it creates the 0 filled non-file backed mappings.  It
      assumes all of these mappings are RW BSS sections, which is not the case
      for PPC.
      
      gcc/ld has an option (--secure-plt) to avoid this, but it is said to
      incur a small performance penalty.
      
      Currently, to support 32-bit binaries with the PLT in BSS, the kernel
      maps the *entire brk area* with executable rights for all binaries,
      even --secure-plt ones.
      
      Stop doing that.
      
      Teach the ELF loader to check the X bit in the relevant load header and
      create 0 filled anonymous mappings that are executable if the load
      header requests that.
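
      The mechanism, roughly (a sketch; the actual patch also introduces the
      mm-side helper used here):

        static int set_brk(unsigned long start, unsigned long end, int prot)
        {
                start = ELF_PAGEALIGN(start);
                end = ELF_PAGEALIGN(end);
                if (end > start) {
                        /* Keep the zero-fill mapping executable only when the
                         * PT_LOAD header asked for PROT_EXEC. */
                        int error = vm_brk_flags(start, end - start,
                                                 prot & PROT_EXEC ? VM_EXEC : 0);
                        if (error)
                                return error;
                }
                current->mm->start_brk = current->mm->brk = end;
                return 0;
        }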
      
      Test program showing the difference in /proc/$PID/maps:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <unistd.h>

      int main() {
      	char buf[16*1024];
      	char *p = malloc(123); /* make "[heap]" mapping appear */
      	int fd = open("/proc/self/maps", O_RDONLY);
      	int len = read(fd, buf, sizeof(buf));
      	write(1, buf, len);
      	printf("%p\n", p);
      	return 0;
      }
      
      Compiled using: gcc -mbss-plt -m32 -Os test.c -otest
      
      Unpatched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10690000-106c0000 rwxp 00000000 00:00 0                                  [heap]
      f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ffa90000-ffac0000 rw-p 00000000 00:00 0                                  [stack]
      0x10690008
      
      Patched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10180000-101b0000 rw-p 00000000 00:00 0                                  [heap]
                        ^^^^ this has changed
      f7c60000-f7c90000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7c90000-f7ca0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ff860000-ff890000 rw-p 00000000 00:00 0                                  [stack]
      0x10180008
      
      The patch was originally posted in 2012 by Jason Gunthorpe
      and apparently ignored:
      
      https://lkml.org/lkml/2012/9/30/138
      
      Lightly run-tested.
      
      Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 01 Feb 2017, 3 commits
    • fs/binfmt: Convert obsolete cputime type to nsecs · cd19c364
      Committed by Frederic Weisbecker
      Use the new nsec based cputime accessors as part of the whole cputime
      conversion from cputime_t to nsecs.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-12-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Convert task/group cputime to nsecs · 5613fda9
      Committed by Frederic Weisbecker
      Now that most cputime readers use the transition API which returns the
      task cputime in old-style cputime_t, we can safely store the cputime in
      nsecs. This will eventually make cputime statistics less opaque and more
      granular. Back and forth conversions between cputime_t and nsecs in order
      to deal with cputime_t's random granularity won't be needed anymore.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Introduce special task_cputime_t() API to return old-typed cputime · a1cecf2b
      Committed by Frederic Weisbecker
      This API returns a task's cputime in cputime_t in order to ease the
      conversion of cputime internals to use nsecs units instead. Blindly
      converting all cputime readers to use this API now will later let us
      convert all these places, smoothly and step by step, to use the new
      nsec-based cputime.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 15 Jan 2017, 1 commit
    • coredump: Ensure proper size of sparse core files · 4d22c75d
      Committed by Dave Kleikamp
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb complains that the file is truncated, which can be confusing to users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
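
      One way to implement that final adjustment, as a sketch (the helper name
      and details here are illustrative; the real fix lives in the coredump
      code):

        /* Called once after all vma data has been written. */
        static int dump_truncate(struct coredump_params *cprm)
        {
                struct file *file = cprm->file;
                loff_t offset;

                if (!file->f_op->llseek || file->f_op->llseek == no_llseek)
                        return 1;
                offset = file->f_op->llseek(file, 0, SEEK_CUR);
                /* If dump_skip() only seeked past EOF, grow the file to match. */
                if (i_size_read(file_inode(file)) < offset)
                        return do_truncate(file->f_path.dentry, offset, 0, file);
                return 1;
        }
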
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  13. 25 Dec 2016, 1 commit
  14. 13 Dec 2016, 1 commit
  15. 15 Sep 2016, 1 commit
  16. 01 Sep 2016, 1 commit
  17. 03 Aug 2016, 1 commit
    • binfmt_elf: fix calculations for bss padding · 0036d1f7
      Committed by Kees Cook
      A double bug exists in the bss calculation code: an overflow can happen
      in the "last_bss - elf_bss" calculation, but vm_brk() internally aligns
      the argument, underflowing it and wrapping back around to a safe value.
      We shouldn't depend on these two bugs staying in sync, so this cleans up
      the bss padding handling to avoid the overflow.
      
      This moves the bss padzero() before the last_bss > elf_bss case, since
      the zero-filling of the ELF_PAGE should have nothing to do with the
      relationship of last_bss and elf_bss: any trailing portion should be
      zeroed, and a zero size is already handled by padzero().
      
      Then it handles the math on elf_bss vs last_bss correctly.  These need
      to both be ELF_PAGE aligned to get the comparison correct, since that's
      the expected granularity of the mappings.  Since elf_bss already had
      alignment-based padding happen in padzero(), the "start" of the new
      vm_brk() should be moved forward as done in the original code.  However,
      since the "end" of the vm_brk() area will already become PAGE_ALIGNed
      inside vm_brk(), last_bss should be aligned here as well, rather than
      hiding that alignment as a side effect.
      
      Additionally makes a cosmetic change to the initial last_bss calculation
      so it's easier to read in comparison to the load_addr calculation above
      it (i.e.  the only difference is p_filesz vs p_memsz).
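
      In code form, the fixed sequence looks roughly like this (a sketch of
      the interpreter-loading path after the change):

        /* Zero the trailing part of the last file page first; this does not
         * depend on the last_bss/elf_bss relationship at all. */
        if (padzero(elf_bss)) {
                error = -EFAULT;
                goto out;
        }

        /* Align both to the ELF page size before comparing, so the vm_brk()
         * below is not relied on to hide the alignment as a side effect. */
        elf_bss = ELF_PAGEALIGN(elf_bss);
        last_bss = ELF_PAGEALIGN(last_bss);

        /* Finally, if there is still more bss to allocate, do it. */
        if (last_bss > elf_bss)
                error = vm_brk(elf_bss, last_bss - elf_bss);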
      
      Link: http://lkml.kernel.org/r/1468014494-25291-2-git-send-email-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Ismael Ripoll Ripoll <iripoll@upv.es>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 08 Jun 2016, 1 commit
  19. 28 May 2016, 1 commit
    • mm: remove more IS_ERR_VALUE abuses · 5d22fc25
      Committed by Linus Torvalds
      The do_brk() and vm_brk() return value was "unsigned long" and returned
      the starting address on success, and an error value on failure.  The
      reasons are entirely historical, and go back to it basically behaving
      like the mmap() interface does.
      
      However, nobody actually wanted that interface, and it causes totally
      pointless IS_ERR_VALUE() confusion.
      
      What every single caller actually wants is just the simpler integer
      return of zero for success and negative error number on failure.
      
      So just convert to that much clearer and more common calling convention,
      and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 24 May 2016, 1 commit
  21. 13 May 2016, 1 commit
  22. 05 Apr 2016, 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to implement
      the page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely that it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is reverting the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 28 Feb 2016, 1 commit
    • mm: ASLR: use get_random_long() · 5ef11c35
      Committed by Daniel Cashman
      Replace calls to get_random_int() followed by a cast to (unsigned long)
      with calls to get_random_long().  Also address a shifting bug which, in
      the case of x86, removed the entropy mask for mmap_rnd_bits values > 31 bits.
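
      An illustrative before/after of both issues (names simplified; the real
      call sites are the per-architecture randomization helpers):

        unsigned long rnd;

        /* before: a 32-bit random value is widened by a cast, and the int-typed
         * mask expression overflows once more than 31 random bits are wanted */
        rnd = (unsigned long)get_random_int() & ((1 << mmap_rnd_bits) - 1);

        /* after: draw a full unsigned long of randomness and mask with an
         * unsigned-long-typed expression */
        rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
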
      Signed-off-by: Daniel Cashman <dcashman@android.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 20 Jan 2016, 1 commit
  25. 11 Nov 2015, 2 commits
    • binfmt_elf: Correct `arch_check_elf's description · 54d15714
      Committed by Maciej W. Rozycki
      Correct `arch_check_elf's description, mistakenly copied and pasted from
      `arch_elf_pt_proc'.
      Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • binfmt_elf: Don't clobber passed executable's file header · b582ef5c
      Committed by Maciej W. Rozycki
      Do not clobber the buffer space passed from `search_binary_handler' and
      originally preloaded by `prepare_binprm' with the executable's file
      header by overwriting it with its interpreter's file header.  Instead
      keep the buffer space intact and directly use the data structure locally
      allocated for the interpreter's file header, fixing a bug introduced in
      2.1.14 with loadable module support (linux-mips.org commit beb11695
      [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
      Adjust the amount of data read from the interpreter's file accordingly.
      
      This was not an issue before loadable module support, because back then
      `load_elf_binary' was executed only once for a given ELF executable,
      whether the function succeeded or failed.
      
      With loadable module support enabled, upon a failure of
      `load_elf_binary' -- which may for example be caused by architecture
      code rejecting an executable due to a missing hardware feature requested
      in the file header -- a module load is attempted and then the function
      reexecuted by `search_binary_handler'.  With the executable's file
      header replaced with its interpreter's file header the executable can
      then be erroneously accepted in this subsequent attempt.
      
      Cc: stable@vger.kernel.org # all the way back
      Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  26. 10 Nov 2015, 1 commit
    • coredump: add DAX filtering for ELF coredumps · 5037835c
      Committed by Ross Zwisler
      Add two new flags to the existing coredump mechanism for ELF files to
      allow us to explicitly filter DAX mappings.  This is desirable because
      DAX mappings, like hugetlb mappings, have the potential to be very
      large.
      
      Update the coredump_filter documentation in
      Documentation/filesystems/proc.txt so that it addresses the new DAX
      coredump flags.  Also update the documented default value of
      coredump_filter to be consistent with the core(5) man page.  The
      documentation being updated talks about bit 4, Dump ELF headers, which
      is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the
      kernel config.  This kernel config option defaults to "y" if both ELF
      binaries and coredump are enabled.
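
      For orientation, the two new filter bits sit above the existing hugetlb
      bits, roughly like this (illustrative constants, not the kernel's actual
      macro names; treat the exact bit values as an assumption and consult
      core(5) and the updated proc.txt):

        /* coredump_filter bit positions (sketch) */
        #define DUMP_FILTER_DAX_PRIVATE  7   /* 0x80:  dump private DAX mappings */
        #define DUMP_FILTER_DAX_SHARED   8   /* 0x100: dump shared DAX mappings  */
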
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  27. 24 Jun 2015, 1 commit
  28. 29 May 2015, 1 commit
  29. 15 Apr 2015, 3 commits
    • mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE · 204db6ed
      Committed by Kees Cook
      The arch_randomize_brk() function is used on several architectures,
      even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
      tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
      the architectures that support it, while still handling CONFIG_COMPAT_BRK.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Russell King <linux@arm.linux.org.uk>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: "David A. Long" <dave.long@linaro.org>
      Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Arun Chandran <achandran@mvista.com>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Cc: Min-Hua Chen <orca.chen@gmail.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Alex Smith <alex@alex-smith.me.uk>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Vineeth Vijayan <vvijayan@mvista.com>
      Cc: Jeff Bailey <jeffbailey@google.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Behan Webster <behanw@converseincode.com>
      Cc: Ismael Ripoll <iripoll@upv.es>
      Cc: Jan-Simon Möller <dl9pf@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: split ET_DYN ASLR from mmap ASLR · d1fd836d
      Committed by Kees Cook
      This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
      powerpc, and x86.  The problem is that if there is a leak of ASLR from
      the executable (ET_DYN), it means a leak of shared library offset as
      well (mmap), and vice versa.  Further details and a PoC of this attack
      are available here:
      
        http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html
      
      With this patch, a PIE linked executable (ET_DYN) has its own ASLR
      region:
      
        $ ./show_mmaps_pie
        54859ccd6000-54859ccd7000 r-xp  ...  /tmp/show_mmaps_pie
        54859ced6000-54859ced7000 r--p  ...  /tmp/show_mmaps_pie
        54859ced7000-54859ced8000 rw-p  ...  /tmp/show_mmaps_pie
        7f75be764000-7f75be91f000 r-xp  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75be91f000-7f75beb1f000 ---p  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75beb1f000-7f75beb23000 r--p  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75beb23000-7f75beb25000 rw-p  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75beb25000-7f75beb2a000 rw-p  ...
        7f75beb2a000-7f75beb4d000 r-xp  ...  /lib64/ld-linux-x86-64.so.2
        7f75bed45000-7f75bed46000 rw-p  ...
        7f75bed46000-7f75bed47000 r-xp  ...
        7f75bed47000-7f75bed4c000 rw-p  ...
        7f75bed4c000-7f75bed4d000 r--p  ...  /lib64/ld-linux-x86-64.so.2
        7f75bed4d000-7f75bed4e000 rw-p  ...  /lib64/ld-linux-x86-64.so.2
        7f75bed4e000-7f75bed4f000 rw-p  ...
        7fffb3741000-7fffb3762000 rw-p  ...  [stack]
        7fffb377b000-7fffb377d000 r--p  ...  [vvar]
        7fffb377d000-7fffb377f000 r-xp  ...  [vdso]
      
      The change is to add a call to the newly created arch_mmap_rnd() into the
      ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR,
      as was already done on s390.  Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE,
      which is no longer needed.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Russell King <linux@arm.linux.org.uk>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: "David A. Long" <dave.long@linaro.org>
      Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Arun Chandran <achandran@mvista.com>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Cc: Min-Hua Chen <orca.chen@gmail.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Alex Smith <alex@alex-smith.me.uk>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Vineeth Vijayan <vvijayan@mvista.com>
      Cc: Jeff Bailey <jeffbailey@google.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Behan Webster <behanw@converseincode.com>
      Cc: Ismael Ripoll <iripoll@upv.es>
      Cc: Jan-Simon Möller <dl9pf@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/binfmt_elf.c: fix bug in loading of PIE binaries · a87938b2
      Committed by Michael Davidson
      With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
      address allocation strategy, load_elf_binary() will attempt to map a PIE
      binary into an address range immediately below mm->mmap_base.
      
      Unfortunately, load_elf_binary() does not take account of the need to
      allocate sufficient space for the entire binary, which means that, while
      the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
      PT_LOAD segment(s) end up being mapped above mm->mmap_base into the area
      that is supposed to be the "gap" between the stack and the binary.
      
      Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
      means that binaries with large data segments > 128MB can end up mapping
      part of their data segment over their stack resulting in corruption of the
      stack (and the data segment once the binary starts to run).
      
      Any PIE binary with a data segment > 128MB is vulnerable to this although
      address randomization means that the actual gap between the stack and the
      end of the binary is normally greater than 128MB.  The larger the data
      segment of the binary the higher the probability of failure.
      
      Fix this by calculating the total size of the binary in the same way as
      load_elf_interp().
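
      The sizing helper already used for the interpreter looks roughly like
      this; the fix applies the same calculation to the PIE binary itself
      (sketch):

        static unsigned long total_mapping_size(struct elf_phdr *cmds, int nr)
        {
                int i, first = -1, last = -1;

                for (i = 0; i < nr; i++) {
                        if (cmds[i].p_type == PT_LOAD) {
                                last = i;
                                if (first == -1)
                                        first = i;
                        }
                }
                if (first == -1)
                        return 0;

                /* span from the first PT_LOAD's page start to the last one's end */
                return cmds[last].p_vaddr + cmds[last].p_memsz -
                       ELF_PAGESTART(cmds[first].p_vaddr);
        }
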
      Signed-off-by: Michael Davidson <md@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  30. 19 Feb 2015, 1 commit
    • x86, mm/ASLR: Fix stack randomization on 64-bit systems · 4e7c22d4
      Committed by Hector Marco-Gisbert
      The issue is that the stack for processes is not properly randomized on
      64 bit architectures due to an integer overflow.
      
      The affected function is randomize_stack_top() in file
      "fs/binfmt_elf.c":
      
        static unsigned long randomize_stack_top(unsigned long stack_top)
        {
                 unsigned int random_variable = 0;

                 if ((current->flags & PF_RANDOMIZE) &&
                         !(current->personality & ADDR_NO_RANDOMIZE)) {
                         random_variable = get_random_int() & STACK_RND_MASK;
                         random_variable <<= PAGE_SHIFT;
                 }
        #ifdef CONFIG_STACK_GROWSUP
                 return PAGE_ALIGN(stack_top) + random_variable;
        #else
                 return PAGE_ALIGN(stack_top) - random_variable;
        #endif
        }
      
      Note that it declares the "random_variable" variable as "unsigned int".
      Since the result of the shifting operation between STACK_RND_MASK (which
      is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):

      	  random_variable <<= PAGE_SHIFT;

      needs at least 34 bits (22+12) to be represented, the two leftmost bits
      are dropped when the result is stored in the 32-bit "random_variable".

      These two dropped bits have an impact on the entropy of the process
      stack.  Concretely, the total stack entropy is reduced to one fourth of
      the expected value.
      
      This patch restores the entropy by correcting the types involved
      in the operations in the functions randomize_stack_top() and
      stack_maxrandom_size().
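
      With the types widened, the function looks roughly like this (a sketch
      of the corrected version):

        static unsigned long randomize_stack_top(unsigned long stack_top)
        {
                unsigned long random_variable = 0;

                if ((current->flags & PF_RANDOMIZE) &&
                        !(current->personality & ADDR_NO_RANDOMIZE)) {
                        random_variable = (unsigned long) get_random_int();
                        random_variable &= STACK_RND_MASK;
                        random_variable <<= PAGE_SHIFT;
                }
        #ifdef CONFIG_STACK_GROWSUP
                return PAGE_ALIGN(stack_top) + random_variable;
        #else
                return PAGE_ALIGN(stack_top) - random_variable;
        #endif
        }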
      
      The successful fix can be tested with:
      
        $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
        7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
        7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
        7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
        7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
        ...
      
      Once corrected, the leading bytes should be between 7ffc and 7fff,
      rather than always being 7fff.
      Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
      Signed-off-by: Ismael Ripoll <iripoll@upv.es>
      [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Fixes: CVE-2015-1593
      Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
      Signed-off-by: Borislav Petkov <bp@suse.de>
  31. 11 Dec 2014, 1 commit
    • fs/binfmt_elf.c: fix internal inconsistency relating to vma dump size · 52f5592e
      Committed by Jungseung Lee
      vma_dump_size() is called several times by the coredump code and is
      supposed to return the same value for the same vma each time.  But
      vma_dump_size() can return different values for the same vma.

      The known problem case is concurrent shared memory removal.  If a vma is
      used for a shared memory segment and that shared memory is removed
      between writing the program header and dumping the vma memory, this
      results in a dump file which is internally inconsistent.

      To fix the problem, take a baseline of each vma's dump size, store it in
      vma_filesz, and always use that stored size.  Consistency with reality is
      not actually guaranteed, but that is tolerable since the dump is fully
      consistent with the baseline.
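
      A sketch of the approach, using the identifiers from elf_core_dump()
      (error handling omitted; details of the actual patch may differ):

        size_t *vma_filesz;
        int i = 0;

        vma_filesz = kmalloc_array(segs - 1, sizeof(*vma_filesz), GFP_KERNEL);

        /* Baseline pass: record each vma's dump size exactly once. */
        for (vma = first_vma(current, gate_vma); vma != NULL;
             vma = next_vma(vma, gate_vma))
                vma_filesz[i++] = vma_dump_size(vma, cprm->mm_flags);

        /* The PT_LOAD headers and the later data dump both read vma_filesz[i]
         * instead of calling vma_dump_size() again. */
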
      Signed-off-by: Jungseung Lee <js07.lee@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>