1. 21 4月, 2018 1 次提交
  2. 12 4月, 2018 3 次提交
    • M
      elf: enforce MAP_FIXED on overlaying elf segments · ad55eac7
      Michal Hocko 提交于
      Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from
      elf_map" applied, some ELF binaries in his environment fail to start
      with
      
       [   23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already
       [   23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon
      
      The reason is that the above binary has overlapping elf segments:
      
        LOAD           0x0000000000000000 0x0000000010000000 0x0000000010000000
                       0x0000000000013a8c 0x0000000000013a8c  R E    10000
        LOAD           0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
                       0x00000000000002c0 0x00000000000005e8  RW     10000
        LOAD           0x0000000000020328 0x0000000010030328 0x0000000010030328
                       0x0000000000000384 0x00000000000094a0  RW     10000
      
      That binary has two RW LOAD segments, the first crosses a page border
      into the second
      
        0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)
      
      Handle this situation by enforcing MAP_FIXED when we establish a
      temporary brk VMA to handle overlapping segments.  All other mappings
      will still use MAP_FIXED_NOREPLACE.
      
      Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Reviewed-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad55eac7
    • M
      fs, elf: drop MAP_FIXED usage from elf_map · 4ed28639
      Michal Hocko 提交于
      Both load_elf_interp and load_elf_binary rely on elf_map to map segments
      on a controlled address and they use MAP_FIXED to enforce that.  This is
      however dangerous thing prone to silent data corruption which can be
      even exploitable.
      
      Let's take CVE-2017-1000253 as an example.  At the time (before commit
      eab09532: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
      ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
      the stack top on 32b (legacy) memory layout (only 1GB away).  Therefore
      we could end up mapping over the existing stack with some luck.
      
      The issue has been fixed since then (a87938b2: "fs/binfmt_elf.c: fix
      bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much
      further from the stack (eab09532 and later by c715b72c: "mm:
      revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive
      stack consumption early during execve fully stopped by da029c11
      ("exec: Limit arg stack to at most 75% of _STK_LIM").  So we should be
      safe and any attack should be impractical.  On the other hand this is
      just too subtle assumption so it can break quite easily and hard to
      spot.
      
      I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
      fundamentally dangerous.  Moreover it shouldn't be even needed.  We are
      at the early process stage and so there shouldn't be unrelated mappings
      (except for stack and loader) existing so mmap for a given address should
      succeed even without MAP_FIXED.  Something is terribly wrong if this is
      not the case and we should rather fail than silently corrupt the
      underlying mapping.
      
      Address this issue by changing MAP_FIXED to the newly added
      MAP_FIXED_NOREPLACE.  This will mean that mmap will fail if there is an
      existing mapping clashing with the requested one without clobbering it.
      
      [mhocko@suse.com: fix build]
      [akpm@linux-foundation.org: coding-style fixes]
      [avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC]
        Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org
      Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ed28639
    • K
      exec: introduce finalize_exec() before start_thread() · b8383831
      Kees Cook 提交于
      Provide a final callback into fs/exec.c before start_thread() takes
      over, to handle any last-minute changes, like the coming restoration of
      the stack limit.
      
      Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Greg KH <greg@kroah.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8383831
  3. 07 2月, 2018 1 次提交
  4. 03 11月, 2017 1 次提交
    • D
      regset: Add support for dynamically sized regsets · 27e64b4b
      Dave Martin 提交于
      Currently the regset API doesn't allow for the possibility that
      regsets (or at least, the amount of meaningful data in a regset)
      may change in size.
      
      In particular, this results in useless padding being added to
      coredumps if a regset's current size is smaller than its
      theoretical maximum size.
      
      This patch adds a get_size() function to struct user_regset.
      Individual regset implementations can implement this function to
      return the current size of the regset data.  A regset_size()
      function is added to provide callers with an abstract interface for
      determining the size of a regset without needing to know whether
      the regset is dynamically sized or not.
      
      The only affected user of this interface is the ELF coredump code:
      This patch ports ELF coredump to dump regsets with their actual
      size in the coredump.  This has no effect except for new regsets
      that are dynamically sized and provide a get_size() implementation.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: H. J. Lu <hjl.tools@gmail.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      27e64b4b
  5. 11 9月, 2017 1 次提交
  6. 05 9月, 2017 1 次提交
  7. 17 8月, 2017 1 次提交
  8. 02 8月, 2017 1 次提交
    • K
      binfmt: Introduce secureexec flag · c425e189
      Kees Cook 提交于
      The bprm_secureexec hook can be moved earlier. Right now, it is called
      during create_elf_tables(), via load_binary(), via search_binary_handler(),
      via exec_binprm(). Nearly all (see exception below) state used by
      bprm_secureexec is created during the bprm_set_creds hook, called from
      prepare_binprm().
      
      For all LSMs (except commoncaps described next), only the first execution
      of bprm_set_creds takes any effect (they all check bprm->called_set_creds
      which prepare_binprm() sets after the first call to the bprm_set_creds
      hook).  However, all these LSMs also only do anything with bprm_secureexec
      when they detected a secure state during their first run of bprm_set_creds.
      Therefore, it is functionally identical to move the detection into
      bprm_set_creds, since the results from secureexec here only need to be
      based on the first call to the LSM's bprm_set_creds hook.
      
      The single exception is that the commoncaps secureexec hook also examines
      euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
      via prepare_binprm(), which can be called multiple times (e.g.
      binfmt_script, binfmt_misc), and may clear the euid/egid for the final
      load (i.e. the script interpreter). However, while commoncaps specifically
      ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time
      prepare_binprm() may get called, it needs to base the secureexec decision
      on the final call to bprm_set_creds. As a result, it will need special
      handling.
      
      To begin this refactoring, this adds the secureexec flag to the bprm
      struct, and calls the secureexec hook during setup_new_exec(). This is
      safe since all the cred work is finished (and past the point of no return).
      This explicit call will be removed in later patches once the hook has been
      removed.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NJohn Johansen <john.johansen@canonical.com>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      Reviewed-by: NJames Morris <james.l.morris@oracle.com>
      c425e189
  9. 11 7月, 2017 2 次提交
    • K
      binfmt_elf: safely increment argv pointers · 67c6777a
      Kees Cook 提交于
      When building the argv/envp pointers, the envp is needlessly
      pre-incremented instead of just continuing after the argv pointers are
      finished.  In some (likely impossible) race where the strings could be
      changed from userspace between copy_strings() and here, it might be
      possible to confuse the envp position.  Instead, just use sp like
      everything else.
      
      Link: http://lkml.kernel.org/r/20170622173838.GA43308@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      67c6777a
    • K
      binfmt_elf: use ELF_ET_DYN_BASE only for PIE · eab09532
      Kees Cook 提交于
      The ELF_ET_DYN_BASE position was originally intended to keep loaders
      away from ET_EXEC binaries.  (For example, running "/lib/ld-linux.so.2
      /bin/cat" might cause the subsequent load of /bin/cat into where the
      loader had been loaded.)
      
      With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
      ELF_ET_DYN_BASE continued to be used since the kernel was only looking
      at ET_DYN.  However, since ELF_ET_DYN_BASE is traditionally set at the
      top 1/3rd of the TASK_SIZE, a substantial portion of the address space
      is unused.
      
      For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
      loaded above the mmap region.  This means they can be made to collide
      (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
      pathological stack regions.
      
      Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
      region in all cases, and will now additionally avoid programs falling
      back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
      if it would have collided with the stack, now it will fail to load
      instead of falling back to the mmap region).
      
      To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
      are loaded into the mmap region, leaving space available for either an
      ET_EXEC binary with a fixed location or PIE being loaded into mmap by
      the loader.  Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
      which means architectures can now safely lower their values without risk
      of loaders colliding with their subsequently loaded programs.
      
      For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
      the entire 32-bit address space for 32-bit pointers.
      
      Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
      suggestions on how to implement this solution.
      
      Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR")
      Link: http://lkml.kernel.org/r/20170621173201.GA114489@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eab09532
  10. 02 3月, 2017 4 次提交
  11. 23 2月, 2017 1 次提交
    • D
      powerpc: do not make the entire heap executable · 16e72e9b
      Denys Vlasenko 提交于
      On 32-bit powerpc the ELF PLT sections of binaries (built with
      --bss-plt, or with a toolchain which defaults to it) look like this:
      
        [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
        [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
        [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4
      
      Which results in an ELF load header:
      
        Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
        LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000
      
      This is all correct, the load region containing the PLT is marked as
      executable.  Note that the PLT starts at 0002b00c but the file mapping
      ends at 0002aff8, so the PLT falls in the 0 fill section described by
      the load header, and after a page boundary.
      
      Unfortunately the generic ELF loader ignores the X bit in the load
      headers when it creates the 0 filled non-file backed mappings.  It
      assumes all of these mappings are RW BSS sections, which is not the case
      for PPC.
      
      gcc/ld has an option (--secure-plt) to not do this, this is said to
      incur a small performance penalty.
      
      Currently, to support 32-bit binaries with PLT in BSS kernel maps
      *entire brk area* with executable rights for all binaries, even
      --secure-plt ones.
      
      Stop doing that.
      
      Teach the ELF loader to check the X bit in the relevant load header and
      create 0 filled anonymous mappings that are executable if the load
      header requests that.
      
      Test program showing the difference in /proc/$PID/maps:
      
      int main() {
      	char buf[16*1024];
      	char *p = malloc(123); /* make "[heap]" mapping appear */
      	int fd = open("/proc/self/maps", O_RDONLY);
      	int len = read(fd, buf, sizeof(buf));
      	write(1, buf, len);
      	printf("%p\n", p);
      	return 0;
      }
      
      Compiled using: gcc -mbss-plt -m32 -Os test.c -otest
      
      Unpatched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10690000-106c0000 rwxp 00000000 00:00 0                                  [heap]
      f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ffa90000-ffac0000 rw-p 00000000 00:00 0                                  [stack]
      0x10690008
      
      Patched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10180000-101b0000 rw-p 00000000 00:00 0                                  [heap]
                        ^^^^ this has changed
      f7c60000-f7c90000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7c90000-f7ca0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ff860000-ff890000 rw-p 00000000 00:00 0                                  [stack]
      0x10180008
      
      The patch was originally posted in 2012 by Jason Gunthorpe
      and apparently ignored:
      
      https://lkml.org/lkml/2012/9/30/138
      
      Lightly run-tested.
      
      Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.comSigned-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      16e72e9b
  12. 01 2月, 2017 3 次提交
    • F
      fs/binfmt: Convert obsolete cputime type to nsecs · cd19c364
      Frederic Weisbecker 提交于
      Use the new nsec based cputime accessors as part of the whole cputime
      conversion from cputime_t to nsecs.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-12-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cd19c364
    • F
      sched/cputime: Convert task/group cputime to nsecs · 5613fda9
      Frederic Weisbecker 提交于
      Now that most cputime readers use the transition API which return the
      task cputime in old style cputime_t, we can safely store the cputime in
      nsecs. This will eventually make cputime statistics less opaque and more
      granular. Back and forth convertions between cputime_t and nsecs in order
      to deal with cputime_t random granularity won't be needed anymore.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5613fda9
    • F
      sched/cputime: Introduce special task_cputime_t() API to return old-typed cputime · a1cecf2b
      Frederic Weisbecker 提交于
      This API returns a task's cputime in cputime_t in order to ease the
      conversion of cputime internals to use nsecs units instead. Blindly
      converting all cputime readers to use this API now will later let us
      convert more smoothly and step by step all these places to use the
      new nsec based cputime.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a1cecf2b
  13. 15 1月, 2017 1 次提交
    • D
      coredump: Ensure proper size of sparse core files · 4d22c75d
      Dave Kleikamp 提交于
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb complains that the file is truncated and can be confusing to users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4d22c75d
  14. 25 12月, 2016 1 次提交
  15. 13 12月, 2016 1 次提交
  16. 15 9月, 2016 1 次提交
  17. 01 9月, 2016 1 次提交
  18. 03 8月, 2016 1 次提交
    • K
      binfmt_elf: fix calculations for bss padding · 0036d1f7
      Kees Cook 提交于
      A double-bug exists in the bss calculation code, where an overflow can
      happen in the "last_bss - elf_bss" calculation, but vm_brk internally
      aligns the argument, underflowing it, wrapping back around safe.  We
      shouldn't depend on these bugs staying in sync, so this cleans up the
      bss padding handling to avoid the overflow.
      
      This moves the bss padzero() before the last_bss > elf_bss case, since
      the zero-filling of the ELF_PAGE should have nothing to do with the
      relationship of last_bss and elf_bss: any trailing portion should be
      zeroed, and a zero size is already handled by padzero().
      
      Then it handles the math on elf_bss vs last_bss correctly.  These need
      to both be ELF_PAGE aligned to get the comparison correct, since that's
      the expected granularity of the mappings.  Since elf_bss already had
      alignment-based padding happen in padzero(), the "start" of the new
      vm_brk() should be moved forward as done in the original code.  However,
      since the "end" of the vm_brk() area will already become PAGE_ALIGNed in
      vm_brk() then last_bss should get aligned here to avoid hiding it as a
      side-effect.
      
      Additionally makes a cosmetic change to the initial last_bss calculation
      so it's easier to read in comparison to the load_addr calculation above
      it (i.e.  the only difference is p_filesz vs p_memsz).
      
      Link: http://lkml.kernel.org/r/1468014494-25291-2-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org>
      Reported-by: NHector Marco-Gisbert <hecmargi@upv.es>
      Cc: Ismael Ripoll Ripoll <iripoll@upv.es>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0036d1f7
  19. 08 6月, 2016 1 次提交
  20. 28 5月, 2016 1 次提交
    • L
      mm: remove more IS_ERR_VALUE abuses · 5d22fc25
      Linus Torvalds 提交于
      The do_brk() and vm_brk() return value was "unsigned long" and returned
      the starting address on success, and an error value on failure.  The
      reasons are entirely historical, and go back to it basically behaving
      like the mmap() interface does.
      
      However, nobody actually wanted that interface, and it causes totally
      pointless IS_ERR_VALUE() confusion.
      
      What every single caller actually wants is just the simpler integer
      return of zero for success and negative error number on failure.
      
      So just convert to that much clearer and more common calling convention,
      and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d22fc25
  21. 24 5月, 2016 1 次提交
  22. 13 5月, 2016 1 次提交
  23. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  24. 28 2月, 2016 1 次提交
    • D
      mm: ASLR: use get_random_long() · 5ef11c35
      Daniel Cashman 提交于
      Replace calls to get_random_int() followed by a cast to (unsigned long)
      with calls to get_random_long().  Also address shifting bug which, in
      case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.
      Signed-off-by: NDaniel Cashman <dcashman@android.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ef11c35
  25. 20 1月, 2016 1 次提交
  26. 11 11月, 2015 2 次提交
    • M
      binfmt_elf: Correct `arch_check_elf's description · 54d15714
      Maciej W. Rozycki 提交于
      Correct `arch_check_elf's description, mistakenly copied and pasted from
      `arch_elf_pt_proc'.
      Signed-off-by: NMaciej W. Rozycki <macro@imgtec.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      54d15714
    • M
      binfmt_elf: Don't clobber passed executable's file header · b582ef5c
      Maciej W. Rozycki 提交于
      Do not clobber the buffer space passed from `search_binary_handler' and
      originally preloaded by `prepare_binprm' with the executable's file
      header by overwriting it with its interpreter's file header.  Instead
      keep the buffer space intact and directly use the data structure locally
      allocated for the interpreter's file header, fixing a bug introduced in
      2.1.14 with loadable module support (linux-mips.org commit beb11695
      [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
      Adjust the amount of data read from the interpreter's file accordingly.
      
      This was not an issue before loadable module support, because back then
      `load_elf_binary' was executed only once for a given ELF executable,
      whether the function succeeded or failed.
      
      With loadable module support supported and enabled, upon a failure of
      `load_elf_binary' -- which may for example be caused by architecture
      code rejecting an executable due to a missing hardware feature requested
      in the file header -- a module load is attempted and then the function
      reexecuted by `search_binary_handler'.  With the executable's file
      header replaced with its interpreter's file header the executable can
      then be erroneously accepted in this subsequent attempt.
      
      Cc: stable@vger.kernel.org # all the way back
      Signed-off-by: NMaciej W. Rozycki <macro@imgtec.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b582ef5c
  27. 10 11月, 2015 1 次提交
    • R
      coredump: add DAX filtering for ELF coredumps · 5037835c
      Ross Zwisler 提交于
      Add two new flags to the existing coredump mechanism for ELF files to
      allow us to explicitly filter DAX mappings.  This is desirable because
      DAX mappings, like hugetlb mappings, have the potential to be very
      large.
      
      Update the coredump_filter documentation in
      Documentation/filesystems/proc.txt so that it addresses the new DAX
      coredump flags.  Also update the documented default value of
      coredump_filter to be consistent with the core(5) man page.  The
      documentation being updated talks about bit 4, Dump ELF headers, which
      is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the
      kernel config.  This kernel config option defaults to "y" if both ELF
      binaries and coredump are enabled.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5037835c
  28. 24 6月, 2015 1 次提交
  29. 29 5月, 2015 1 次提交
  30. 15 4月, 2015 2 次提交
    • K
      mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE · 204db6ed
      Kees Cook 提交于
      The arch_randomize_brk() function is used on several architectures,
      even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
      tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
      the architectures that support it, while still handling CONFIG_COMPAT_BRK.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Russell King <linux@arm.linux.org.uk>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: "David A. Long" <dave.long@linaro.org>
      Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Arun Chandran <achandran@mvista.com>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Cc: Min-Hua Chen <orca.chen@gmail.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Alex Smith <alex@alex-smith.me.uk>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Vineeth Vijayan <vvijayan@mvista.com>
      Cc: Jeff Bailey <jeffbailey@google.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Behan Webster <behanw@converseincode.com>
      Cc: Ismael Ripoll <iripoll@upv.es>
      Cc: Jan-Simon Mller <dl9pf@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      204db6ed
    • K
      mm: split ET_DYN ASLR from mmap ASLR · d1fd836d
      Kees Cook 提交于
      This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
      powerpc, and x86.  The problem is that if there is a leak of ASLR from
      the executable (ET_DYN), it means a leak of shared library offset as
      well (mmap), and vice versa.  Further details and a PoC of this attack
      is available here:
      
        http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html
      
      With this patch, a PIE linked executable (ET_DYN) has its own ASLR
      region:
      
        $ ./show_mmaps_pie
        54859ccd6000-54859ccd7000 r-xp  ...  /tmp/show_mmaps_pie
        54859ced6000-54859ced7000 r--p  ...  /tmp/show_mmaps_pie
        54859ced7000-54859ced8000 rw-p  ...  /tmp/show_mmaps_pie
        7f75be764000-7f75be91f000 r-xp  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75be91f000-7f75beb1f000 ---p  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75beb1f000-7f75beb23000 r--p  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75beb23000-7f75beb25000 rw-p  ...  /lib/x86_64-linux-gnu/libc.so.6
        7f75beb25000-7f75beb2a000 rw-p  ...
        7f75beb2a000-7f75beb4d000 r-xp  ...  /lib64/ld-linux-x86-64.so.2
        7f75bed45000-7f75bed46000 rw-p  ...
        7f75bed46000-7f75bed47000 r-xp  ...
        7f75bed47000-7f75bed4c000 rw-p  ...
        7f75bed4c000-7f75bed4d000 r--p  ...  /lib64/ld-linux-x86-64.so.2
        7f75bed4d000-7f75bed4e000 rw-p  ...  /lib64/ld-linux-x86-64.so.2
        7f75bed4e000-7f75bed4f000 rw-p  ...
        7fffb3741000-7fffb3762000 rw-p  ...  [stack]
        7fffb377b000-7fffb377d000 r--p  ...  [vvar]
        7fffb377d000-7fffb377f000 r-xp  ...  [vdso]
      
      The change is to add a call the newly created arch_mmap_rnd() into the
      ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR,
      as was already done on s390.  Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE,
      which is no longer needed.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reported-by: NHector Marco-Gisbert <hecmargi@upv.es>
      Cc: Russell King <linux@arm.linux.org.uk>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: "David A. Long" <dave.long@linaro.org>
      Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Arun Chandran <achandran@mvista.com>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Cc: Min-Hua Chen <orca.chen@gmail.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Alex Smith <alex@alex-smith.me.uk>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Vineeth Vijayan <vvijayan@mvista.com>
      Cc: Jeff Bailey <jeffbailey@google.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Behan Webster <behanw@converseincode.com>
      Cc: Ismael Ripoll <iripoll@upv.es>
      Cc: Jan-Simon Mller <dl9pf@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1fd836d