1. 13 May, 2016 2 commits
  2. 23 Mar, 2016 1 commit
    • fs/coredump: prevent fsuid=0 dumps into user-controlled directories · 378c6520
      Committed by Jann Horn
      This commit fixes the following security hole affecting systems where
      all of the following conditions are fulfilled:
      
       - The fs.suid_dumpable sysctl is set to 2.
       - The kernel.core_pattern sysctl's value starts with "/". (Systems
         where kernel.core_pattern starts with "|/" are not affected.)
       - Unprivileged user namespace creation is permitted. (This is
         true on Linux >=3.8, but some distributions disallow it by
         default using a distro patch.)
      
      Under these conditions, if a program executes under secure exec rules,
      causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
      namespace, changes its root directory and crashes, the coredump will be
      written using fsuid=0 and a path derived from kernel.core_pattern - but
      this path is interpreted relative to the root directory of the process,
      allowing the attacker to control where a coredump will be written with
      root privileges.
      
      To fix the security issue, always interpret core_pattern for dumps that
      are written under SUID_DUMP_ROOT relative to the root directory of init.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 21 Jan, 2016 1 commit
    • fs/coredump: prevent "" / "." / ".." core path components · ac94b6e3
      Committed by Jann Horn
      Let %h and %e print empty values as "!", "." as "!" and
      ".." as "!.".
      
      This prevents hostnames and comm values that are empty or consist of one
      or two dots from changing the directory level at which the corefile will
      be stored.
      
      Consider the case where someone decides to sort coredumps by hostname
      with a core pattern like "/cores/%h/core.%e.%p.%t" or so.  In this
      case, hostnames "" and "." would cause the coredump to land directly in
      /cores, which is not what the intent behind the core pattern is, and
      ".." would cause the coredump to land in /.
      
      Yeah, there probably aren't many people who do that, but I still don't
      want this edgecase to be kind of broken.
      
      It seems very unlikely that this caused security issues anywhere, so I'm
      not requesting a stable backport.
      
      [akpm@linux-foundation.org: tweak code comment]
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 07 Dec, 2015 1 commit
  5. 07 Nov, 2015 2 commits
  6. 11 Sep, 2015 2 commits
    • fs: Don't dump core if the corefile would become world-readable. · 40f705a7
      Committed by Jann Horn
      On a filesystem like vfat, all files are created with the same owner
      and mode independent of who created the file. When a vfat filesystem
      is mounted with root as owner of all files and read access for everyone,
      root's processes left world-readable coredumps on it (but other
      users' processes only left empty corefiles when given write access
      because of the uid mismatch).
      
      Given that the old behavior was inconsistent and insecure, I don't see
      a problem with changing it. Now, all processes refuse to dump core unless
      the resulting corefile will only be readable by their owner.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: if a coredump already exists, unlink and recreate with O_EXCL · fbb18169
      Committed by Jann Horn
      It was possible for an attacking user to trick root (or another user) into
      writing his coredumps into an attacker-readable, pre-existing file using
      rename() or link(), causing the disclosure of secret data from the victim
      process' virtual memory.  Depending on the configuration, it was also
      possible to trick root into overwriting system files with coredumps.  Fix
      that issue by never writing coredumps into existing files.
      
      Requirements for the attack:
       - The attack only applies if the victim's process has a nonzero
         RLIMIT_CORE and is dumpable.
       - The attacker can trick the victim into coredumping into an
         attacker-writable directory D, either because the core_pattern is
         relative and the victim's cwd is attacker-writable or because an
         absolute core_pattern pointing to a world-writable directory is used.
       - The attacker has one of these:
        A: on a system with protected_hardlinks=0:
           execute access to a folder containing a victim-owned,
           attacker-readable file on the same partition as D, and the
           victim-owned file will be deleted before the main part of the attack
           takes place. (In practice, there are lots of files that fulfill
           this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
           This does not apply to most Linux systems because most distros set
           protected_hardlinks=1.
        B: on a system with protected_hardlinks=1:
           execute access to a folder containing a victim-owned,
           attacker-readable and attacker-writable file on the same partition
           as D, and the victim-owned file will be deleted before the main part
           of the attack takes place.
           (This seems to be uncommon.)
        C: on any system, independent of protected_hardlinks:
           write access to a non-sticky folder containing a victim-owned,
           attacker-readable file on the same partition as D
           (This seems to be uncommon.)
      
      The basic idea is that the attacker moves the victim-owned file to where
      he expects the victim process to dump its core.  The victim process dumps
      its core into the existing file, and the attacker reads the coredump from
      it.
      
      If the attacker can't move the file because he does not have write access
      to the containing directory, he can instead link the file to a directory
      he controls, then wait for the original link to the file to be deleted
      (because the kernel checks that the link count of the corefile is 1).
      
      A less reliable variant that requires D to be non-sticky works with link()
      and does not require deletion of the original link: link() the file into
      D, but then unlink() it directly before the kernel performs the link count
      check.
      
      On systems with protected_hardlinks=0, this variant allows an attacker to
      not only gain information from coredumps, but also clobber existing,
      victim-writable files with coredumps.  (This could theoretically lead to a
      privilege escalation.)
      Signed-off-by: Jann Horn <jann@thejh.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 26 Jun, 2015 2 commits
  8. 24 Jun, 2015 1 commit
  9. 12 Apr, 2015 1 commit
  10. 07 Mar, 2015 1 commit
  11. 20 Feb, 2015 1 commit
  12. 14 Oct, 2014 1 commit
    • coredump: add %i/%I in core_pattern to report the tid of the crashed thread · b03023ec
      Committed by Oleg Nesterov
      format_corename() can only pass the leader's pid to the core handler,
      but there is no simple way to figure out which thread originated the
      coredump.
      
      As Jan explains, this also means that there is no simple way to create
      the backtrace of the crashed process:
      
      As programs are mostly compiled with implicit gcc -fomit-frame-pointer
      one needs program's .eh_frame section (equivalently PT_GNU_EH_FRAME
      segment) or .debug_frame section.  .debug_frame usually is present only
      in separate debug info files usually not even installed on the system.
      While .eh_frame is a part of the executable/library (and it is even
      always mapped for C++ exceptions unwinding) it no longer has to be
      present anywhere on the disk as the program could be upgraded in the
      meantime and the running instance has its executable file already
      unlinked from disk.
      
      One possibility is to echo 0x3f >/proc/*/coredump_filter and dump all
      the file-backed memory including the executable's .eh_frame section.
      But that can create huge core files, for example even due to mmapped
      data files.
      
      Other possibility would be to read .eh_frame from /proc/PID/mem at the
      core_pattern handler time of the core dump.  For the backtrace one needs
      to read the register state first which can be done from core_pattern
      handler:
      
          ptrace(PTRACE_SEIZE, tid, 0, PTRACE_O_TRACEEXIT)
          close(0);    // close pipe fd to resume the sleeping dumper
          waitpid();   // should report EXIT
          PTRACE_GETREGS or other requests
      
      The remaining problem is how to get the 'tid' value of the crashed
      thread.  It could be read from the first NT_PRSTATUS note of the core
      file but that makes the core_pattern handler complicated.
      
      Unfortunately %t is already used so this patch uses %i/%I.
      
      Automatic Bug Reporting Tool (https://github.com/abrt/abrt/wiki/overview)
      is experimenting with this.  It is using the elfutils
      (https://fedorahosted.org/elfutils/) unwinder for generating the
      backtraces.  Apart from not needing matching executables as mentioned
      above, another advantage is that we can get the backtrace without saving
      the core (which might be quite large) to disk.
      
      [mmilata@redhat.com: final paragraph of changelog]
      Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Mark Wielaard <mjw@redhat.com>
      Cc: Martin Milata <mmilata@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 24 Jul, 2014 1 commit
  14. 20 Apr, 2014 1 commit
    • coredump: fix va_list corruption · 404ca80e
      Committed by Eric Dumazet
      A va_list needs to be copied in case it needs to be used twice.
      
      Thanks to Hugh for debugging this issue, leading to various panics.
      
      Tested:
      
        lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
      
      'produce_core' is simply : main() { *(int *)0 = 1;}
      
        lpq84:~# ./produce_core
        Segmentation fault (core dumped)
        lpq84:~# dmesg | tail -1
        [  614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed
      
      Notice the last argument was replaced by a NULL (we were lucky enough to
      not crash, but do not try this on your production machine !)
      
      After fix :
      
        lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
        lpq83:~# ./produce_core
        Segmentation fault
        lpq83:~# dmesg | tail -1
        [  740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed
      
      Fixes: 5fe9d8ca ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Diagnosed-by: Hugh Dickins <hughd@google.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org # 3.11+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 24 Jan, 2014 1 commit
  16. 16 Nov, 2013 2 commits
  17. 09 Nov, 2013 5 commits
  18. 25 Oct, 2013 1 commit
  19. 12 Sep, 2013 1 commit
  20. 04 Jul, 2013 6 commits
  21. 05 May, 2013 1 commit
  22. 01 May, 2013 5 commits
    • coredump: change wait_for_dump_helpers() to use wait_event_interruptible() · dc7ee2aa
      Committed by Oleg Nesterov
      wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
      wait_event-like loop.  This is not needed and in fact this is not
      strictly correct, we can/should do this only once after we change
      pipe->writers.  We could even check if it becomes zero.
      
      Change this code to use wait_event_interruptible(); this can also
      help to make this wait freezable.
      
      With this patch we check pipe->readers without pipe_lock(), this is
      fine.  Once we see pipe->readers == 1 we know that the handler
      decremented the counter, this is all we need.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Neil Horman <nhorman@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: factor out the setting of PF_DUMPCORE · 079148b9
      Committed by Oleg Nesterov
      Cleanup.  Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
      zap_threads() called by do_coredump().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Neil Horman <nhorman@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: introduce dump_interrupted() · 528f827e
      Committed by Oleg Nesterov
      By discussion with Mandeep.
      
      Change dump_write(), dump_seek() and do_coredump() to check
      signal_pending() and abort if it is true.  dump_seek() does this only
      before f_op->llseek(), otherwise it relies on dump_write().
      
      We need this change to ensure that the coredump won't delay suspend, and
      to ensure it reacts to SIGKILL "quickly enough", a core dump can take a
      lot of time.  In particular this can help oom-killer.
      
      We add the new trivial helper, dump_interrupted() to add the comments and
      to simplify the potential freezer changes.  Perhaps it will have more
      callers.
      
      Ideally it should do try_to_freeze() but then we need the unpleasant
      changes in dump_write() and wait_for_dump_helpers().  It is not trivial to
      change dump_write() to restart if f_op->write() fails because of
      freezing().  We need to handle the short writes, we need to clear
      TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
      it to check PF_DUMPCORE).  And if the buggy f_op->write() sets
      TIF_SIGPENDING we can not distinguish this case from the race with
      freeze_task() + __thaw_task().
      
      So we simply accept the fact that the freezer can truncate a core dump, but
      at least you can reliably suspend.  Hopefully we can tolerate this
      unlikely case, and the necessary complications are not worth the trouble.
      But if we decide to make the coredumping freezable later, we can do this on
      top of this change.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Neil Horman <nhorman@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: sanitize the setting of signal->group_exit_code · acdedd99
      Committed by Oleg Nesterov
      Now that the coredumping process can be SIGKILL'ed, the setting of
      ->group_exit_code in do_coredump() can race with complete_signal() and
      SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
      SIGKILL | 0x80.
      
      But the main problem is that it is not clear to me what we should do if
      binfmt->core_dump() succeeds but SIGKILL was sent; that is why this patch
      comes as a separate change.
      
      This patch adds 0x80 if ->core_dump() succeeds and the process was not
      killed.  But perhaps we can (should?) re-set ->group_exit_code changed by
      SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Tested-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Neil Horman <nhorman@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: ensure that SIGKILL always kills the dumping thread · 6cd8f0ac
      Committed by Oleg Nesterov
      prepare_signal() blesses SIGKILL sent to the dumping process but this
      signal can be "lost" anyway.  The problem is, complete_signal() sees
      SIGNAL_GROUP_EXIT and skips the "kill them all" logic.  And even if the
      dumping process is single-threaded (so the target is always "correct"),
      the group-wide SIGKILL is not recorded in task->pending and thus
      __fatal_signal_pending() won't be true.  A multi-threaded case has even
      more problems.
      
      And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
      right to me.  This coredumping process is not exiting yet, it can do a lot
      of work dumping the core.
      
      With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set
      signal->group_exit_task instead.  This makes signal_group_exit() true and
      thus this should equally close the races with exit/exec/stop but allows to
      kill the dumping thread reliably.
      
      Notes:
      	- It is not clear what we should do with ->group_exit_code
      	  if the dumper was killed, see the next change.
      
      	- we need more (hopefully straightforward) changes to ensure
      	  that SIGKILL actually interrupts the coredump. Basically we
      	  need to check __fatal_signal_pending() in dump_write() and
      	  dump_seek().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Tested-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Neil Horman <nhorman@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>