1. 02 8月, 2017 1 次提交
  2. 01 7月, 2017 1 次提交
    • K
      randstruct: Mark various structs for randomization · 3859a271
      Kees Cook 提交于
      This marks many critical kernel structures for randomization. These are
      structures that have been targeted in the past in security exploits, or
      contain functions pointers, pointers to function pointer tables, lists,
      workqueues, ref-counters, credentials, permissions, or are otherwise
      sensitive. This initial list was extracted from Brad Spengler/PaX Team's
      code in the last public patch of grsecurity/PaX based on my understanding
      of the code. Changes or omissions from the original code are mine and
      don't reflect the original grsecurity/PaX code.
      
      Left out of this list is task_struct, which requires special handling
      and will be covered in a subsequent patch.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      3859a271
  3. 03 3月, 2017 1 次提交
  4. 25 7月, 2016 1 次提交
  5. 08 6月, 2016 1 次提交
  6. 13 5月, 2016 2 次提交
  7. 14 12月, 2014 1 次提交
    • D
      syscalls: implement execveat() system call · 51f39a1f
      David Drysdale 提交于
      This patchset adds execveat(2) for x86, and is derived from Meredydd
      Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
      
      The primary aim of adding an execveat syscall is to allow an
      implementation of fexecve(3) that does not rely on the /proc filesystem,
      at least for executables (rather than scripts).  The current glibc version
      of fexecve(3) is implemented via /proc, which causes problems in sandboxed
      or otherwise restricted environments.
      
      Given the desire for a /proc-free fexecve() implementation, HPA suggested
      (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
      an appropriate generalization.
      
      Also, having a new syscall means that it can take a flags argument without
      back-compatibility concerns.  The current implementation just defines the
      AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
      added in future -- for example, flags for new namespaces (as suggested at
      https://lkml.org/lkml/2006/7/11/474).
      
      Related history:
       - https://lkml.org/lkml/2006/12/27/123 is an example of someone
         realizing that fexecve() is likely to fail in a chroot environment.
       - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
         documenting the /proc requirement of fexecve(3) in its manpage, to
         "prevent other people from wasting their time".
       - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
         problem where a process that did setuid() could not fexecve()
         because it no longer had access to /proc/self/fd; this has since
         been fixed.
      
      This patch (of 4):
      
      Add a new execveat(2) system call.  execveat() is to execve() as openat()
      is to open(): it takes a file descriptor that refers to a directory, and
      resolves the filename relative to that.
      
      In addition, if the filename is empty and AT_EMPTY_PATH is specified,
      execveat() executes the file to which the file descriptor refers.  This
      replicates the functionality of fexecve(), which is a system call in other
      UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
      so relies on /proc being mounted).
      
      The filename fed to the executed program as argv[0] (or the name of the
      script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
      (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
      reflecting how the executable was found.  This does however mean that
      execution of a script in a /proc-less environment won't work; also, script
      execution via an O_CLOEXEC file descriptor fails (as the file will not be
      accessible after exec).
      
      Based on patches by Meredydd Luff.
      Signed-off-by: NDavid Drysdale <drysdale@google.com>
      Cc: Meredydd Luff <meredydd@senatehouse.org>
      Cc: Shuah Khan <shuah.kh@samsung.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Rich Felker <dalias@aerifal.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51f39a1f
  8. 08 4月, 2014 1 次提交
  9. 06 2月, 2014 1 次提交
    • L
      execve: use 'struct filename *' for executable name passing · c4ad8f98
      Linus Torvalds 提交于
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: NIgor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4ad8f98
  10. 13 11月, 2013 1 次提交
    • K
      exec/ptrace: fix get_dumpable() incorrect tests · d049f74f
      Kees Cook 提交于
      The get_dumpable() return value is not boolean.  Most users of the
      function actually want to be testing for non-SUID_DUMP_USER(1) rather than
      SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
      protected state.  Almost all places did this correctly, excepting the two
      places fixed in this patch.
      
      Wrong logic:
          if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
              or
          if (dumpable == 0) { /* be protective */ }
              or
          if (!dumpable) { /* be protective */ }
      
      Correct logic:
          if (dumpable != SUID_DUMP_USER) { /* be protective */ }
              or
          if (dumpable != 1) { /* be protective */ }
      
      Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
      user was able to ptrace attach to processes that had dropped privileges to
      that user.  (This may have been partially mitigated if Yama was enabled.)
      
      The macros have been moved into the file that declares get/set_dumpable(),
      which means things like the ia64 code can see them too.
      
      CVE-2013-2929
      Reported-by: NVasily Kulikov <segoon@openwall.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d049f74f
  11. 09 11月, 2013 2 次提交
  12. 12 9月, 2013 1 次提交
  13. 30 4月, 2013 1 次提交
  14. 26 2月, 2013 1 次提交
  15. 21 12月, 2012 1 次提交
    • K
      exec: do not leave bprm->interp on stack · b66c5984
      Kees Cook 提交于
      If a series of scripts are executed, each triggering module loading via
      unprintable bytes in the script header, kernel stack contents can leak
      into the command line.
      
      Normally execution of binfmt_script and binfmt_misc happens recursively.
      However, when modules are enabled, and unprintable bytes exist in the
      bprm->buf, execution will restart after attempting to load matching
      binfmt modules.  Unfortunately, the logic in binfmt_script and
      binfmt_misc does not expect to get restarted.  They leave bprm->interp
      pointing to their local stack.  This means on restart bprm->interp is
      left pointing into unused stack memory which can then be copied into the
      userspace argv areas.
      
      After additional study, it seems that both recursion and restart remains
      the desirable way to handle exec with scripts, misc, and modules.  As
      such, we need to protect the changes to interp.
      
      This changes the logic to require allocation for any changes to the
      bprm->interp.  To avoid adding a new kmalloc to every exec, the default
      value is left as-is.  Only when passing through binfmt_script or
      binfmt_misc does an allocation take place.
      
      For a proof of concept, see DoTest.sh from:
      
         http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: halfdog <me@halfdog.net>
      Cc: P J P <ppandit@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b66c5984
  16. 20 12月, 2012 1 次提交
  17. 18 12月, 2012 1 次提交
    • K
      exec: use -ELOOP for max recursion depth · d7402698
      Kees Cook 提交于
      To avoid an explosion of request_module calls on a chain of abusive
      scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
      as maximum recursion depth is hit, the error will fail all the way back
      up the chain, aborting immediately.
      
      This also has the side-effect of stopping the user's shell from attempting
      to reexecute the top-level file as a shell script. As seen in the
      dash source:
      
              if (cmd != path_bshell && errno == ENOEXEC) {
                      *argv-- = cmd;
                      *argv = cmd = path_bshell;
                      goto repeat;
              }
      
      The above logic was designed for running scripts automatically that lacked
      the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
      things continue to behave as the shell expects.
      
      Additionally, when tracking recursion, the binfmt handlers should not be
      involved. The recursion being tracked is the depth of calls through
      search_binary_handler(), so that function should be exclusively responsible
      for tracking the depth.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: halfdog <me@halfdog.net>
      Cc: P J P <ppandit@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7402698
  18. 29 11月, 2012 2 次提交
  19. 13 10月, 2012 1 次提交
  20. 06 10月, 2012 3 次提交
  21. 01 10月, 2012 1 次提交
    • A
      generic kernel_execve() · 282124d1
      Al Viro 提交于
      based mostly on arm and alpha versions.  Architectures can define
      __ARCH_WANT_KERNEL_EXECVE and use it, provided that
      	* they have working current_pt_regs(), even for kernel threads.
      	* kernel_thread-spawned threads do have space for pt_regs
      in the normal location.  Normally that's as simple as switching to
      generic kernel_thread() and making sure that kernel threads do *not*
      go through return from syscall path; call the payload from equivalent
      of ret_from_fork if we are in a kernel thread (or just have separate
      ret_from_kernel_thread and make copy_thread() use it instead of
      ret_from_fork in kernel thread case).
      	* they have ret_from_kernel_execve(); it is called after
      successful do_execve() done by kernel_execve() and gets normal
      pt_regs location passed to it as argument.  It's essentially
      a longjmp() analog - it should set sp, etc. to the situation
      expected at the return for syscall and go there.  Eventually
      the need for that sucker will disappear, but that'll take some
      surgery on kernel_thread() payloads.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      282124d1
  22. 20 9月, 2012 1 次提交
  23. 21 3月, 2012 1 次提交
  24. 07 2月, 2012 1 次提交
    • H
      exec: fix use-after-free bug in setup_new_exec() · 96e02d15
      Heiko Carstens 提交于
      Setting the task name is done within setup_new_exec() by accessing
      bprm->filename. However this happens after flush_old_exec().
      This may result in a use after free bug, flush_old_exec() may
      "complete" vfork_done, which will wake up the parent which in turn
      may free the passed in filename.
      To fix this add a new tcomm field in struct linux_binprm which
      contains the now early generated task name until it is used.
      
      Fixes this bug on s390:
      
        Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
        Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
        Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
        Call Trace:
        ([<0000000000282e2c>] setup_new_exec+0x38/0x374)
         [<00000000002dd12e>] load_elf_binary+0x402/0x1bf4
         [<0000000000280a42>] search_binary_handler+0x38e/0x5bc
         [<0000000000282b6c>] do_execve_common+0x410/0x514
         [<0000000000282cb6>] do_execve+0x46/0x58
         [<00000000005bce58>] kernel_execve+0x28/0x70
         [<000000000014ba2e>] ____call_usermodehelper+0x102/0x140
         [<00000000005bc8da>] kernel_thread_starter+0x6/0xc
         [<00000000005bc8d4>] kernel_thread_starter+0x0/0xc
        Last Breaking-Event-Address:
         [<00000000002830f0>] setup_new_exec+0x2fc/0x374
      
        Kernel panic - not syncing: Fatal exception: panic_on_oops
      Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96e02d15
  25. 20 7月, 2011 1 次提交
  26. 09 4月, 2011 1 次提交
  27. 14 1月, 2011 1 次提交
  28. 01 12月, 2010 2 次提交
    • O
      exec: copy-and-paste the fixes into compat_do_execve() paths · 114279be
      Oleg Nesterov 提交于
      Note: this patch targets 2.6.37 and tries to be as simple as possible.
      That is why it adds more copy-and-paste horror into fs/compat.c and
      uglifies fs/exec.c, this will be cleanuped later.
      
      compat_copy_strings() plays with bprm->vma/mm directly and thus has
      two problems: it lacks the RLIMIT_STACK check and argv/envp memory
      is not visible to oom killer.
      
      Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
      to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
      as do_execve() does.
      
      Add the fatal_signal_pending/cond_resched checks into compat_count() and
      compat_copy_strings(), this matches the code in fs/exec.c and certainly
      makes sense.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      114279be
    • O
      exec: make argv/envp memory visible to oom-killer · 3c77f845
      Oleg Nesterov 提交于
      Brad Spengler published a local memory-allocation DoS that
      evades the OOM-killer (though not the virtual memory RLIMIT):
      http://www.grsecurity.net/~spender/64bit_dos.c
      
      execve()->copy_strings() can allocate a lot of memory, but
      this is not visible to oom-killer, nobody can see the nascent
      bprm->mm and take it into account.
      
      With this patch get_arg_page() increments current's MM_ANONPAGES
      counter every time we allocate the new page for argv/envp. When
      do_execve() succeds or fails, we change this counter back.
      
      Technically this is not 100% correct, we can't know if the new
      page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
      I don't think this really matters and everything becomes correct
      once exec changes ->mm or fails.
      Reported-by: NBrad Spengler <spender@grsecurity.net>
      Reviewed-and-discussed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c77f845
  29. 18 8月, 2010 1 次提交
    • D
      Make do_execve() take a const filename pointer · d7627467
      David Howells 提交于
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
      
      do_execve() may not change any of the strings it is passed as part of the argv
      or envp lists as they are some of them in .rodata, so marking these strings as
      const should be fine.
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test built on x86_64, frv, arm and mips.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7627467
  30. 07 3月, 2010 1 次提交
  31. 30 1月, 2010 1 次提交
    • L
      Split 'flush_old_exec' into two functions · 221af7f8
      Linus Torvalds 提交于
      'flush_old_exec()' is the point of no return when doing an execve(), and
      it is pretty badly misnamed.  It doesn't just flush the old executable
      environment, it also starts up the new one.
      
      Which is very inconvenient for things like setting up the new
      personality, because we want the new personality to affect the starting
      of the new environment, but at the same time we do _not_ want the new
      personality to take effect if flushing the old one fails.
      
      As a result, the x86-64 '32-bit' personality is actually done using this
      insane "I'm going to change the ABI, but I haven't done it yet" bit
      (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
      personality, but just the "pending" bit, so that "flush_thread()" can do
      the actual personality magic.
      
      This patch in no way changes any of that insanity, but it does split the
      'flush_old_exec()' function up into a preparatory part that can fail
      (still called flush_old_exec()), and a new part that will actually set
      up the new exec environment (setup_new_exec()).  All callers are changed
      to trivially comply with the new world order.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      221af7f8
  32. 18 12月, 2009 1 次提交
  33. 24 9月, 2009 1 次提交
  34. 06 9月, 2009 1 次提交
    • O
      exec: do not sleep in TASK_TRACED under ->cred_guard_mutex · a2a8474c
      Oleg Nesterov 提交于
      Tom Horsley reports that his debugger hangs when it tries to read
      /proc/pid_of_tracee/maps, this happens since
      
      	"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"
      	04b836cbf19e885f8366bccb2e4b0474346c02d
      
      commit in 2.6.31.
      
      But the root of the problem lies in the fact that do_execve() path calls
      tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.
      
      The tracee must not sleep in TASK_TRACED holding this mutex.  Even if we
      remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
      another task doing PTRACE_ATTACH should not hang until it is killed or the
      tracee resumes.
      
      With this patch do_execve() does not use ->cred_guard_mutex directly and
      we do not hold it throughout, instead:
      
      	- introduce prepare_bprm_creds() helper, it locks the mutex
      	  and calls prepare_exec_creds() to initialize bprm->cred.
      
      	- install_exec_creds() drops the mutex after commit_creds(),
      	  and thus before tracehook_report_exec()->ptrace_stop().
      
      	  or, if exec fails,
      
      	  free_bprm() drops this mutex when bprm->cred != NULL which
      	  indicates install_exec_creds() was not called.
      Reported-by: NTom Horsley <tom.horsley@att.net>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a2a8474c