1. 25 7月, 2017 1 次提交
    • E
      signal: Remove kernel interal si_code magic · cc731525
      Eric W. Biederman 提交于
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause the
        the kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user.  As si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of union
      members.
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Escept where not all of the cases are handled remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to copy_siginfo_from_user32 implementations has the
      interesting property that several of them perviously should never have
      worked as the __SI_ values they depended up where kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cc731525
  2. 23 5月, 2017 1 次提交
    • E
      ptrace: Properly initialize ptracer_cred on fork · c70d9d80
      Eric W. Biederman 提交于
      When I introduced ptracer_cred I failed to consider the weirdness of
      fork where the task_struct copies the old value by default.  This
      winds up leaving ptracer_cred set even when a process forks and
      the child process does not wind up being ptraced.
      
      Because ptracer_cred is not set on non-ptraced processes whose
      parents were ptraced this has broken the ability of the enlightenment
      window manager to start setuid children.
      
      Fix this by properly initializing ptracer_cred in ptrace_init_task
      
      This must be done with a little bit of care to preserve the current value
      of ptracer_cred when ptrace carries through fork.  Re-reading the
      ptracer_cred from the ptracing process at this point is inconsistent
      with how PT_PTRACE_CAP has been maintained all of these years.
      Tested-by: NTakashi Iwai <tiwai@suse.de>
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      c70d9d80
  3. 08 4月, 2017 1 次提交
  4. 02 3月, 2017 3 次提交
  5. 23 11月, 2016 3 次提交
    • E
      ptrace: Don't allow accessing an undumpable mm · 84d77d3f
      Eric W. Biederman 提交于
      It is the reasonable expectation that if an executable file is not
      readable there will be no way for a user without special privileges to
      read the file.  This is enforced in ptrace_attach but if ptrace
      is already attached before exec there is no enforcement for read-only
      executables.
      
      As the only way to read such an mm is through access_process_vm
      spin a variant called ptrace_access_vm that will fail if the
      target process is not being ptraced by the current process, or
      the current process did not have sufficient privileges when ptracing
      began to read the target processes mm.
      
      In the ptrace implementations replace access_process_vm by
      ptrace_access_vm.  There remain several ptrace sites that still use
      access_process_vm as they are reading the target executables
      instructions (for kernel consumption) or register stacks.  As such it
      does not appear necessary to add a permission check to those calls.
      
      This bug has always existed in Linux.
      
      Fixes: v1.0
      Cc: stable@vger.kernel.org
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      84d77d3f
    • E
      ptrace: Capture the ptracer's creds not PT_PTRACE_CAP · 64b875f7
      Eric W. Biederman 提交于
      When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
      overlooked.  This can result in incorrect behavior when an application
      like strace traces an exec of a setuid executable.
      
      Further PT_PTRACE_CAP does not have enough information for making good
      security decisions as it does not report which user namespace the
      capability is in.  This has already allowed one mistake through
      insufficient granulariy.
      
      I found this issue when I was testing another corner case of exec and
      discovered that I could not get strace to set PT_PTRACE_CAP even when
      running strace as root with a full set of caps.
      
      This change fixes the above issue with strace allowing stracing as
      root a setuid executable without disabling setuid.  More fundamentaly
      this change allows what is allowable at all times, by using the correct
      information in it's decision.
      
      Cc: stable@vger.kernel.org
      Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      64b875f7
    • E
      mm: Add a user_ns owner to mm_struct and fix ptrace permission checks · bfedb589
      Eric W. Biederman 提交于
      During exec dumpable is cleared if the file that is being executed is
      not readable by the user executing the file.  A bug in
      ptrace_may_access allows reading the file if the executable happens to
      enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
      unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
      
      This problem is fixed with only necessary userspace breakage by adding
      a user namespace owner to mm_struct, captured at the time of exec, so
      it is clear in which user namespace CAP_SYS_PTRACE must be present in
      to be able to safely give read permission to the executable.
      
      The function ptrace_may_access is modified to verify that the ptracer
      has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
      This ensures that if the task changes it's cred into a subordinate
      user namespace it does not become ptraceable.
      
      The function ptrace_attach is modified to only set PT_PTRACE_CAP when
      CAP_SYS_PTRACE is held over task->mm->user_ns.  The intent of
      PT_PTRACE_CAP is to be a flag to note that whatever permission changes
      the task might go through the tracer has sufficient permissions for
      it not to be an issue.  task->cred->user_ns is always the same
      as or descendent of mm->user_ns.  Which guarantees that having
      CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
      credentials.
      
      To prevent regressions mm->dumpable and mm->user_ns are not considered
      when a task has no mm.  As simply failing ptrace_may_attach causes
      regressions in privileged applications attempting to read things
      such as /proc/<pid>/stat
      
      Cc: stable@vger.kernel.org
      Acked-by: NKees Cook <keescook@chromium.org>
      Tested-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Fixes: 8409cca7 ("userns: allow ptrace from non-init user namespaces")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      bfedb589
  6. 19 10月, 2016 1 次提交
  7. 12 10月, 2016 1 次提交
  8. 04 8月, 2016 1 次提交
    • M
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada 提交于
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97f2645f
  9. 23 3月, 2016 2 次提交
  10. 21 1月, 2016 2 次提交
    • J
      ptrace: use fsuid, fsgid, effective creds for fs access checks · caaee623
      Jann Horn 提交于
      By checking the effective credentials instead of the real UID / permitted
      capabilities, ensure that the calling process actually intended to use its
      credentials.
      
      To ensure that all ptrace checks use the correct caller credentials (e.g.
      in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
      flag), use two new flags and require one of them to be set.
      
      The problem was that when a privileged task had temporarily dropped its
      privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
      perform following syscalls with the credentials of a user, it still passed
      ptrace access checks that the user would not be able to pass.
      
      While an attacker should not be able to convince the privileged task to
      perform a ptrace() syscall, this is a problem because the ptrace access
      check is reused for things in procfs.
      
      In particular, the following somewhat interesting procfs entries only rely
      on ptrace access checks:
      
       /proc/$pid/stat - uses the check for determining whether pointers
           should be visible, useful for bypassing ASLR
       /proc/$pid/maps - also useful for bypassing ASLR
       /proc/$pid/cwd - useful for gaining access to restricted
           directories that contain files with lax permissions, e.g. in
           this scenario:
           lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
           drwx------ root root /root
           drwxr-xr-x root root /root/foobar
           -rw-r--r-- root root /root/foobar/secret
      
      Therefore, on a system where a root-owned mode 6755 binary changes its
      effective credentials as described and then dumps a user-specified file,
      this could be used by an attacker to reveal the memory layout of root's
      processes or reveal the contents of files he is not allowed to access
      (through /proc/$pid/cwd).
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NJann Horn <jann@thejh.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      caaee623
    • O
      ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable · 7c3b00e0
      Oleg Nesterov 提交于
      ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the
      tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.
      
      This doesn't really solve the problem(s) and we probably need to fix the
      freezer.  In particular, note that this means that pm freezer will fail if
      it races attach-to-stopped-task.
      
      And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is
      not clear if we really need to hide this transition from debugger, WNOHANG
      after PTRACE_ATTACH can fail anyway if it races with SIGCONT.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c3b00e0
  11. 28 10月, 2015 1 次提交
    • T
      seccomp, ptrace: add support for dumping seccomp filters · f8e529ed
      Tycho Andersen 提交于
      This patch adds support for dumping a process' (classic BPF) seccomp
      filters via ptrace.
      
      PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
      seccomp filters. addr should be an integer which represents the ith seccomp
      filter (0 is the most recently installed filter). data should be a struct
      sock_filter * with enough room for the ith filter, or NULL, in which case
      the filter is not saved. The return value for this command is the number of
      BPF instructions the program represents, or negative in the case of errors.
      Command specific errors are ENOENT: which indicates that there is no ith
      filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
      filter was not installed as a classic BPF filter.
      
      A caveat with this approach is that there is no way to get explicitly at
      the heirarchy of seccomp filters, and users need to memcmp() filters to
      decide which are inherited. This means that a task which installs two of
      the same filter can potentially confuse users of this interface.
      
      v2: * make save_orig const
          * check that the orig_prog exists (not necessary right now, but when
             grows eBPF support it will be)
          * s/n/filter_off and make it an unsigned long to match ptrace
          * count "down" the tree instead of "up" when passing a filter offset
      
      v3: * don't take the current task's lock for inspecting its seccomp mode
          * use a 0x42** constant for the ptrace command value
      
      v4: * don't copy to userspace while holding spinlocks
      
      v5: * add another condition to WARN_ON
      
      v6: * rebase on net-next
      Signed-off-by: NTycho Andersen <tycho.andersen@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      CC: Will Drewry <wad@chromium.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      CC: Andy Lutomirski <luto@amacapital.net>
      CC: Pavel Emelyanov <xemul@parallels.com>
      CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      CC: Alexei Starovoitov <ast@kernel.org>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8e529ed
  12. 16 7月, 2015 1 次提交
    • T
      seccomp: add ptrace options for suspend/resume · 13c4a901
      Tycho Andersen 提交于
      This patch is the first step in enabling checkpoint/restore of processes
      with seccomp enabled.
      
      One of the things CRIU does while dumping tasks is inject code into them
      via ptrace to collect information that is only available to the process
      itself. However, if we are in a seccomp mode where these processes are
      prohibited from making these syscalls, then what CRIU does kills the task.
      
      This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
      a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
      filters to disable (and re-enable) seccomp filters for another task so that
      they can be successfully dumped (and restored). We restrict the set of
      processes that can disable seccomp through ptrace because although today
      ptrace can be used to bypass seccomp, there is some discussion of closing
      this loophole in the future and we would like this patch to not depend on
      that behavior and be future proofed for when it is removed.
      
      Note that seccomp can be suspended before any filters are actually
      installed; this behavior is useful on criu restore, so that we can suspend
      seccomp, restore the filters, unmap our restore code from the restored
      process' address space, and then resume the task by detaching and have the
      filters resumed as well.
      
      v2 changes:
      
      * require that the tracer have no seccomp filters installed
      * drop TIF_NOTSC manipulation from the patch
      * change from ptrace command to a ptrace option and use this ptrace option
        as the flag to check. This means that as soon as the tracer
        detaches/dies, seccomp is re-enabled and as a corrollary that one can not
        disable seccomp across PTRACE_ATTACHs.
      
      v3 changes:
      
      * get rid of various #ifdefs everywhere
      * report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
        used
      
      v4 changes:
      
      * get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
        directly
      
      v5 changes:
      
      * check that seccomp is not enabled (or suspended) on the tracer
      Signed-off-by: NTycho Andersen <tycho.andersen@canonical.com>
      CC: Will Drewry <wad@chromium.org>
      CC: Roland McGrath <roland@hack.frob.com>
      CC: Pavel Emelyanov <xemul@parallels.com>
      CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      [kees: access seccomp.mode through seccomp_mode() instead]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      13c4a901
  13. 17 4月, 2015 2 次提交
    • O
      ptrace: ptrace_detach() can no longer race with SIGKILL · 64a4096c
      Oleg Nesterov 提交于
      ptrace_detach() re-checks ->ptrace under tasklist lock and calls
      release_task() if __ptrace_detach() returns true.  This was needed because
      the __TASK_TRACED tracee could be killed/untraced, and it could even pass
      exit_notify() before we take tasklist_lock.
      
      But this is no longer possible after 9899d11f "ptrace: ensure
      arch_ptrace/ptrace_request can never race with SIGKILL".  We can turn
      these checks into WARN_ON() and remove release_task().
      
      While at it, document the setting of child->exit_code.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Pavel Labath <labath@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64a4096c
    • O
      ptrace: fix race between ptrace_resume() and wait_task_stopped() · b72c1869
      Oleg Nesterov 提交于
      ptrace_resume() is called when the tracee is still __TASK_TRACED.  We set
      tracee->exit_code and then wake_up_state() changes tracee->state.  If the
      tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
      wrongly looks like another report from tracee.
      
      This confuses debugger, and since wait_task_stopped() clears ->exit_code
      the tracee can miss a signal.
      
      Test-case:
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/wait.h>
      	#include <sys/ptrace.h>
      	#include <pthread.h>
      	#include <assert.h>
      
      	int pid;
      
      	void *waiter(void *arg)
      	{
      		int stat;
      
      		for (;;) {
      			assert(pid == wait(&stat));
      			assert(WIFSTOPPED(stat));
      			if (WSTOPSIG(stat) == SIGHUP)
      				continue;
      
      			assert(WSTOPSIG(stat) == SIGCONT);
      			printf("ERR! extra/wrong report:%x\n", stat);
      		}
      	}
      
      	int main(void)
      	{
      		pthread_t thread;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			for (;;)
      				kill(getpid(), SIGHUP);
      		}
      
      		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
      
      		for (;;)
      			ptrace(PTRACE_CONT, pid, 0, SIGCONT);
      
      		return 0;
      	}
      
      Note for stable: the bug is very old, but without 9899d11f "ptrace:
      ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
      should use lock_task_sighand(child).
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NPavel Labath <labath@google.com>
      Tested-by: NPavel Labath <labath@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b72c1869
  14. 18 2月, 2015 1 次提交
  15. 11 12月, 2014 1 次提交
  16. 16 7月, 2014 1 次提交
    • N
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown 提交于
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74316201
  17. 06 3月, 2014 1 次提交
  18. 13 11月, 2013 1 次提交
    • K
      exec/ptrace: fix get_dumpable() incorrect tests · d049f74f
      Kees Cook 提交于
      The get_dumpable() return value is not boolean.  Most users of the
      function actually want to be testing for non-SUID_DUMP_USER(1) rather than
      SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
      protected state.  Almost all places did this correctly, excepting the two
      places fixed in this patch.
      
      Wrong logic:
          if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
              or
          if (dumpable == 0) { /* be protective */ }
              or
          if (!dumpable) { /* be protective */ }
      
      Correct logic:
          if (dumpable != SUID_DUMP_USER) { /* be protective */ }
              or
          if (dumpable != 1) { /* be protective */ }
      
      Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
      user was able to ptrace attach to processes that had dropped privileges to
      that user.  (This may have been partially mitigated if Yama was enabled.)
      
      The macros have been moved into the file that declares get/set_dumpable(),
      which means things like the ia64 code can see them too.
      
      CVE-2013-2929
      Reported-by: NVasily Kulikov <segoon@openwall.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d049f74f
  19. 12 9月, 2013 1 次提交
  20. 07 8月, 2013 1 次提交
  21. 10 7月, 2013 2 次提交
    • O
      ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child) · fab840fc
      Oleg Nesterov 提交于
      Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child).  This
      frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
      ensures that the tracee won't be killed by SIGTRAP triggered by the
      active breakpoints.
      
      Test-case:
      
      	unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
      	{
      		unsigned long dr7;
      
      		dr7 = ((len | type) & 0xf)
      			<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
      		if (enable)
      			dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
      		return dr7;
      	}
      
      	int write_dr(int pid, int dr, unsigned long val)
      	{
      		return ptrace(PTRACE_POKEUSER, pid,
      				offsetof (struct user, u_debugreg[dr]),
      				val);
      	}
      
      	void func(void)
      	{
      	}
      
      	int main(void)
      	{
      		int pid, stat;
      		unsigned long dr7;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			kill(getpid(), SIGHUP);
      
      			func();
      			return 0x13;
      		}
      
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(WSTOPSIG(stat) == SIGHUP);
      
      		assert(write_dr(pid, 0, (long)func) == 0);
      		dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
      		assert(write_dr(pid, 7, dr7) == 0);
      
      		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(stat == 0x1300);
      
      		return 0;
      	}
      
      Before this patch the child is killed after PTRACE_DETACH.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fab840fc
    • O
      ptrace: revert "Prepare to fix racy accesses on task breakpoints" · 7c8df286
      Oleg Nesterov 提交于
      This reverts commit bf26c018 ("Prepare to fix racy accesses on task
      breakpoints").
      
      The patch was fine but we can no longer race with SIGKILL after commit
      9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race
      with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
      ->ptrace_bps[] can't go away.
      
      Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
      we can kill them and remove task->ptrace_bp_refcnt.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c8df286
  22. 04 7月, 2013 1 次提交
    • A
      ptrace: add ability to get/set signal-blocked mask · 29000cae
      Andrey Vagin 提交于
      crtools uses a parasite code for dumping processes.  The parasite code is
      injected into a process with help PTRACE_SEIZE.
      
      Currently crtools blocks signals from a parasite code.  If a process has
      pending signals, crtools wait while a process handles these signals.
      
      This method is not suitable for stopped tasks.  A stopped task can have a
      few pending signals, when we will try to execute a parasite code, we will
      need to drop SIGSTOP, but all other signals must remain pending, because a
      state of processes must not be changed during checkpointing.
      
      This patch adds two ptrace commands to set/get signal-blocked mask.
      
      I think gdb can use this commands too.
      
      [akpm@linux-foundation.org: be consistent with brace layout]
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29000cae
  23. 30 6月, 2013 1 次提交
  24. 08 5月, 2013 1 次提交
  25. 01 5月, 2013 1 次提交
    • A
      ptrace: add ability to retrieve signals without removing from a queue (v4) · 84c751bd
      Andrey Vagin 提交于
      This patch adds a new ptrace request PTRACE_PEEKSIGINFO.
      
      This request is used to retrieve information about pending signals
      starting with the specified sequence number.  Siginfo_t structures are
      copied from the child into the buffer starting at "data".
      
      The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
      struct ptrace_peeksiginfo_args {
      	u64 off;	/* from which siginfo to start */
      	u32 flags;
      	s32 nr;		/* how may siginfos to take */
      };
      
      "nr" has type "s32", because ptrace() returns "long", which has 32 bits on
      i386 and a negative values is used for errors.
      
      Currently here is only one flag PTRACE_PEEKSIGINFO_SHARED for dumping
      signals from process-wide queue.  If this flag is not set, signals are
      read from a per-thread queue.
      
      The request PTRACE_PEEKSIGINFO returns a number of dumped signals.  If a
      signal with the specified sequence number doesn't exist, ptrace returns
      zero.  The request returns an error, if no signal has been dumped.
      
      Errors:
      EINVAL - one or more specified flags are not supported or nr is negative
      EFAULT - buf or addr is outside your accessible address space.
      
      A result siginfo contains a kernel part of si_code which usually striped,
      but it's required for queuing the same siginfo back during restore of
      pending signals.
      
      This functionality is required for checkpointing pending signals.  Pedro
      Alves suggested using it in "gdb" to peek at pending signals.  gdb already
      uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
      dequeued.  This functionality allows gdb to look at the pending signals
      which were not reported yet.
      
      The prototype of this code was developed by Oleg Nesterov.
      Signed-off-by: NAndrew Vagin <avagin@openvz.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pedro Alves <palves@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      84c751bd
  26. 09 2月, 2013 1 次提交
    • J
      uprobes: Add exports for module use · e8440c14
      Josh Stone 提交于
      The original pull message for uprobes (commit 654443e2) noted:
      
        This tree includes uprobes support in 'perf probe' - but SystemTap
        (and other tools) can take advantage of user probe points as well.
      
      In order to actually be usable in module-based tools like SystemTap, the
      interface needs to be exported.  This patch first adds the obvious
      exports for uprobe_register and uprobe_unregister.  Then it also adds
      one for task_user_regset_view, which is necessary to get the correct
      state of userspace registers.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      e8440c14
  27. 23 1月, 2013 2 次提交
    • O
      ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL · 9899d11f
      Oleg Nesterov 提交于
      putreg() assumes that the tracee is not running and pt_regs_access() can
      safely play with its stack.  However a killed tracee can return from
      ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
      that debugger can actually read/modify the kernel stack until the tracee
      does SAVE_REST again.
      
      set_task_blockstep() can race with SIGKILL too and in some sense this
      race is even worse, the very fact the tracee can be woken up breaks the
      logic.
      
      As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
      call, this ensures that nobody can ever wakeup the tracee while the
      debugger looks at it.  Not only this fixes the mentioned problems, we
      can do some cleanups/simplifications in arch_ptrace() paths.
      
      Probably ptrace_unfreeze_traced() needs more callers, for example it
      makes sense to make the tracee killable for oom-killer before
      access_process_vm().
      
      While at it, add the comment into may_ptrace_stop() to explain why
      ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
      Reported-by: NSalman Qazi <sqazi@google.com>
      Reported-by: NSuleiman Souhlal <suleiman@google.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9899d11f
    • O
      ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() · 910ffdb1
      Oleg Nesterov 提交于
      Cleanup and preparation for the next change.
      
      signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
      actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
      necessary mask.
      
      Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
      signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
      which adds __TASK_TRACED.
      
      This way ptrace_signal_wake_up() can work "inside" ptrace_request()
      even if the tracee doesn't have the TASK_WAKEKILL bit set.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      910ffdb1
  28. 21 1月, 2013 1 次提交
  29. 18 12月, 2012 1 次提交
    • O
      ptrace: introduce PTRACE_O_EXITKILL · 992fb6e1
      Oleg Nesterov 提交于
      Ptrace jailers want to be sure that the tracee can never escape
      from the control. However if the tracer dies unexpectedly the
      tracee continues to run in potentially unsafe mode.
      
      Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
      it sends SIGKILL to every tracee which has this bit set.
      
      Note that the new option is not equal to the last-option << 1.  Because
      currently all options have an event, and the new one starts the eventless
      group.  It uses the random 20 bit, so we have the room for 12 more events,
      but we can also add the new eventless options below this one.
      
      Suggested by Amnon Shiloh.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Tested-by: NAmnon Shiloh <u3557@miso.sublimeip.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Chris Evans <scarybeasts@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      992fb6e1
  30. 20 11月, 2012 1 次提交
  31. 03 8月, 2012 1 次提交