1. 20 2月, 2019 1 次提交
    • E
      signal: Restore the stop PTRACE_EVENT_EXIT · a2b3e2c0
      Eric W. Biederman 提交于
      commit cf43a757fd49442bc38f76088b70c2299eed2c2f upstream.
      
      In the middle of do_exit() there is there is a call
      "ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process
      in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for
      for the debugger to release the task or SIGKILL to be delivered.
      
      Skipping past dequeue_signal when we know a fatal signal has already
      been delivered resulted in SIGKILL remaining pending and
      TIF_SIGPENDING remaining set.  This in turn caused the
      scheduler to not sleep in PTACE_EVENT_EXIT as it figured
      a fatal signal was pending.  This also caused ptrace_freeze_traced
      in ptrace_check_attach to fail because it left a per thread
      SIGKILL pending which is what fatal_signal_pending tests for.
      
      This difference in signal state caused strace to report
      strace: Exit of unknown pid NNNNN ignored
      
      Therefore update the signal handling state like dequeue_signal
      would when removing a per thread SIGKILL, by removing SIGKILL
      from the per thread signal mask and clearing TIF_SIGPENDING.
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NIvan Delalande <colona@arista.com>
      Cc: stable@vger.kernel.org
      Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2b3e2c0
  2. 15 2月, 2019 2 次提交
    • E
      signal: Better detection of synchronous signals · 959e46af
      Eric W. Biederman 提交于
      commit 7146db3317c67b517258cb5e1b08af387da0618b upstream.
      
      Recently syzkaller was able to create unkillablle processes by
      creating a timer that is delivered as a thread local signal on SIGHUP,
      and receiving SIGHUP SA_NODEFERER.  Ultimately causing a loop failing
      to deliver SIGHUP but always trying.
      
      When the stack overflows delivery of SIGHUP fails and force_sigsegv is
      called.  Unfortunately because SIGSEGV is numerically higher than
      SIGHUP next_signal tries again to deliver a SIGHUP.
      
      From a quality of implementation standpoint attempting to deliver the
      timer SIGHUP signal is wrong.  We should attempt to deliver the
      synchronous SIGSEGV signal we just forced.
      
      We can make that happening in a fairly straight forward manner by
      instead of just looking at the signal number we also look at the
      si_code.  In particular for exceptions (aka synchronous signals) the
      si_code is always greater than 0.
      
      That still has the potential to pick up a number of asynchronous
      signals as in a few cases the same si_codes that are used
      for synchronous signals are also used for asynchronous signals,
      and SI_KERNEL is also included in the list of possible si_codes.
      
      Still the heuristic is much better and timer signals are definitely
      excluded.  Which is enough to prevent all known ways for someone
      sending a process signals fast enough to cause unexpected and
      arguably incorrect behavior.
      
      Cc: stable@vger.kernel.org
      Fixes: a27341cd ("Prioritize synchronous signals over 'normal' signals")
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      959e46af
    • E
      signal: Always notice exiting tasks · f681f268
      Eric W. Biederman 提交于
      commit 35634ffa1751b6efd8cf75010b509dcb0263e29b upstream.
      
      Recently syzkaller was able to create unkillablle processes by
      creating a timer that is delivered as a thread local signal on SIGHUP,
      and receiving SIGHUP SA_NODEFERER.  Ultimately causing a loop
      failing to deliver SIGHUP but always trying.
      
      Upon examination it turns out part of the problem is actually most of
      the solution.  Since 2.5 signal delivery has found all fatal signals,
      marked the signal group for death, and queued SIGKILL in every threads
      thread queue relying on signal->group_exit_code to preserve the
      information of which was the actual fatal signal.
      
      The conversion of all fatal signals to SIGKILL results in the
      synchronous signal heuristic in next_signal kicking in and preferring
      SIGHUP to SIGKILL.  Which is especially problematic as all
      fatal signals have already been transformed into SIGKILL.
      
      Instead of dequeueing signals and depending upon SIGKILL to
      be the first signal dequeued, first test if the signal group
      has already been marked for death.  This guarantees that
      nothing in the signal queue can prevent a process that needs
      to exit from exiting.
      
      Cc: stable@vger.kernel.org
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Ref: ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4")
      History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gitSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f681f268
  3. 14 11月, 2018 3 次提交
  4. 23 8月, 2018 17 次提交
  5. 10 8月, 2018 1 次提交
    • E
      signal: Don't restart fork when signals come in. · c3ad2c3b
      Eric W. Biederman 提交于
      Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
      report that a periodic signal received during fork can cause fork to
      continually restart preventing an application from making progress.
      
      The code was being overly pessimistic.  Fork needs to guarantee that a
      signal sent to multiple processes is logically delivered before the
      fork and just to the forking process or logically delivered after the
      fork to both the forking process and it's newly spawned child.  For
      signals like periodic timers that are always delivered to a single
      process fork can safely complete and let them appear to logically
      delivered after the fork().
      
      While examining this issue I also discovered that fork today will miss
      signals delivered to multiple processes during the fork and handled by
      another thread.  Similarly the current code will also miss blocked
      signals that are delivered to multiple process, as those signals will
      not appear pending during fork.
      
      Add a list of each thread that is currently forking, and keep on that
      list a signal set that records all of the signals sent to multiple
      processes.  When fork completes initialize the new processes
      shared_pending signal set with it.  The calculate_sigpending function
      will see those signals and set TIF_SIGPENDING causing the new task to
      take the slow path to userspace to handle those signals.  Making it
      appear as if those signals were received immediately after the fork.
      
      It is not possible to send real time signals to multiple processes and
      exceptions don't go to multiple processes, which means that that are
      no signals sent to multiple processes that require siginfo.  This
      means it is safe to not bother collecting siginfo on signals sent
      during fork.
      
      The sigaction of a child of fork is initially the same as the
      sigaction of the parent process.  So a signal the parent ignores the
      child will also initially ignore.  Therefore it is safe to ignore
      signals sent to multiple processes and ignored by the forking process.
      
      Signals sent to only a single process or only a single thread and delivered
      during fork are treated as if they are received after the fork, and generally
      not dealt with.  They won't cause any problems.
      
      V2: Added removal from the multiprocess list on failure.
      V3: Use -ERESTARTNOINTR directly
      V4: - Don't queue both SIGCONT and SIGSTOP
          - Initialize signal_struct.multiprocess in init_task
          - Move setting of shared_pending to before the new task
            is visible to signals.  This prevents signals from comming
            in before shared_pending.signal is set to delayed.signal
            and being lost.
      V5: - rework list add and delete to account for idle threads
      v6: - Use sigdelsetmask when removing stop signals
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
      Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
      Reported-by: Nmajiang <ma.jiang@zte.com.cn>
      Fixes: 4a2c7a78 ("[PATCH] make fork() atomic wrt pgrp/session signals")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      c3ad2c3b
  6. 04 8月, 2018 2 次提交
    • E
      fork: Have new threads join on-going signal group stops · 924de3b8
      Eric W. Biederman 提交于
      There are only two signals that are delivered to every member of a
      signal group: SIGSTOP and SIGKILL.  Signal delivery requires every
      signal appear to be delivered either before or after a clone syscall.
      SIGKILL terminates the clone so does not need to be considered.  Which
      leaves only SIGSTOP that needs to be considered when creating new
      threads.
      
      Today in the event of a group stop TIF_SIGPENDING will get set and the
      fork will restart ensuring the fork syscall participates in the group
      stop.
      
      A fork (especially of a process with a lot of memory) is one of the
      most expensive system so we really only want to restart a fork when
      necessary.
      
      It is easy so check to see if a SIGSTOP is ongoing and have the new
      thread join it immediate after the clone completes.  Making it appear
      the clone completed happened just before the SIGSTOP.
      
      The calculate_sigpending function will see the bits set in jobctl and
      set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.
      
      V2: The call to task_join_group_stop was moved before the new task is
          added to the thread group list.  This should not matter as
          sighand->siglock is held over both the addition of the threads,
          the call to task_join_group_stop and do_signal_stop.  But the change
          is trivial and it is one less thing to worry about when reading
          the code.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      924de3b8
    • E
      signal: Add calculate_sigpending() · 088fe47c
      Eric W. Biederman 提交于
      Add a function calculate_sigpending to test to see if any signals are
      pending for a new task immediately following fork.  Signals have to
      happen either before or after fork.  Today our practice is to push
      all of the signals to before the fork, but that has the downside that
      frequent or periodic signals can make fork take much much longer than
      normal or prevent fork from completing entirely.
      
      So we need move signals that we can after the fork to prevent that.
      
      This updates the code to set TIF_SIGPENDING on a new task if there
      are signals or other activities that have moved so that they appear
      to happen after the fork.
      
      As the code today restarts if it sees any such activity this won't
      immediately have an effect, as there will be no reason for it
      to set TIF_SIGPENDING immediately after the fork.
      
      Adding calculate_sigpending means the code in fork can safely be
      changed to not always restart if a signal is pending.
      
      The new calculate_sigpending function sets sigpending if there
      are pending bits in jobctl, pending signals, the freezer needs
      to freeze the new task or the live kernel patching framework
      need the new thread to take the slow path to userspace.
      
      I have verified that setting TIF_SIGPENDING does make a new process
      take the slow path to userspace before it executes it's first userspace
      instruction.
      
      I have looked at the callers of signal_wake_up and the code paths
      setting TIF_SIGPENDING and I don't see anything else that needs to be
      handled.  The code probably doesn't need to set TIF_SIGPENDING for the
      kernel live patching as it uses a separate thread flag as well.  But
      at this point it seems safer reuse the recalc_sigpending logic and get
      the kernel live patching folks to sort out their story later.
      
      V2: I have moved the test into schedule_tail where siglock can
          be grabbed and recalc_sigpending can be reused directly.
          Further as the last action of setting up a new task this
          guarantees that TIF_SIGPENDING will be properly set in the
          new process.
      
          The helper calculate_sigpending takes the siglock and
          uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending
          clear TIF_SIGPENDING if it is unnecessary.  This allows reusing
          the existing code and keeps maintenance of the conditions simple.
      
          Oleg Nesterov <oleg@redhat.com>  suggested the movement
          and pointed out the need to take siglock if this code
          was going to be called while the new task is discoverable.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      088fe47c
  7. 22 7月, 2018 5 次提交
  8. 21 7月, 2018 1 次提交
    • E
      signal: Pass pid and pid type into send_sigqueue · 24122c7f
      Eric W. Biederman 提交于
      Make the code more maintainable by performing more of the signal
      related work in send_sigqueue.
      
      A quick inspection of do_timer_create will show that this code path
      does not lookup a thread group by a thread's pid.  Making it safe
      to find the task pointed to by it_pid with "pid_task(it_pid, type)";
      
      This supports the changes needed in fork to tell if a signal was sent
      to a single process or a group of processes.
      
      Having the pid to task transition in signal.c will also make it easier
      to sort out races with de_thread and and the thread group leader
      exiting when it comes time to address that.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      24122c7f
  9. 10 6月, 2018 1 次提交
  10. 04 5月, 2018 1 次提交
    • P
      sched/core: Introduce set_special_state() · b5bf9a90
      Peter Zijlstra 提交于
      Gaurav reported a perceived problem with TASK_PARKED, which turned out
      to be a broken wait-loop pattern in __kthread_parkme(), but the
      reported issue can (and does) in fact happen for states that do not do
      condition based sleeps.
      
      When the 'current->state = TASK_RUNNING' store of a previous
      (concurrent) try_to_wake_up() collides with the setting of a 'special'
      sleep state, we can loose the sleep state.
      
      Normal condition based wait-loops are immune to this problem, but for
      sleep states that are not condition based are subject to this problem.
      
      There already is a fix for TASK_DEAD. Abstract that and also apply it
      to TASK_STOPPED and TASK_TRACED, both of which are also without
      condition based wait-loop.
      Reported-by: NGaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b5bf9a90
  11. 27 4月, 2018 2 次提交
    • E
      signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} · 31931c93
      Eric W. Biederman 提交于
      Update the siginfo_layout function and enum siginfo_layout to represent
      all of the possible field layouts of struct siginfo.
      
      This allows the uses of siginfo_layout in um and arm64 where they are testing
      for SIL_FAULT to be more accurate as this rules out the other cases.
      
      Further this allows the switch statements on siginfo_layout to be simpler
      if perhaps a little more wordy.  Making it easier to understand what is
      actually going on.
      
      As SIL_FAULT_BNDERR and SIL_FAULT_PKUERR are never expected to appear
      in signalfd just treat them as SIL_FAULT.  To include them would take
      20 extra bytes an pretty much fill up what is left of
      signalfd_siginfo.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      31931c93
    • E
      signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code · 36a4ca3d
      Eric W. Biederman 提交于
      The only architecture that does not support SEGV_PKUERR is ia64 and
      ia64 has not had 32bit support since some time in 2008.  Therefore
      copy_siginfo_to_user32 and copy_siginfo_from_user32 do not need to
      include support for a missing SEGV_PKUERR.
      
      Compile test on ia64.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      36a4ca3d
  12. 25 4月, 2018 4 次提交
    • E
      signal: Remove ifdefs for BUS_MCEERR_AR and BUS_MCEERR_AO · 4181d225
      Eric W. Biederman 提交于
      With the recent architecture cleanups these si_codes are always
      defined so there is no need to test for them.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      4181d225
    • E
      signal: Remove SEGV_BNDERR ifdefs · 3a11ab14
      Eric W. Biederman 提交于
      After the last round of cleanups to siginfo.h SEGV_BNDERR is defined
      on all architectures so testing to see if it is defined is unnecessary.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      3a11ab14
    • E
      signal: Stop special casing TRAP_FIXME and FPE_FIXME in siginfo_layout · 0c362f96
      Eric W. Biederman 提交于
      After more experience with the cases where no one the si_code of 0
      is used both as a signal specific si_code, and as SI_USER it appears
      that no one cares about the signal specific si_code case and the
      good solution is to just fix the architectures by using
      a different si_code.
      
      In none of the conversations has anyone even suggested that
      anything depends on the signal specific redefinition of SI_USER.
      
      There are at least test cases that care when si_code as 0 does
      not work as si_user.
      
      So make things simple and keep the generic code from introducing
      problems by removing the special casing of TRAP_FIXME and FPE_FIXME.
      This will ensure the generic case of sending a signal with
      kill will always set SI_USER and work.
      
      The architecture specific, and signal specific overloads that
      set si_code to 0 will now have problems with signalfd and
      the 32bit compat versions of siginfo copying.   At least
      until they are fixed.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      0c362f96
    • E
      signal: Reduce copy_siginfo_to_user to just copy_to_user · c999b933
      Eric W. Biederman 提交于
      Now that every instance of struct siginfo is now initialized it is no
      longer necessary to copy struct siginfo piece by piece to userspace
      but instead the entire structure can be copied.
      
      As well as making the code simpler and more efficient this means that
      copy_sinfo_to_user no longer cares which union member of struct
      siginfo is in use.
      
      In practice this means that all 32bit architectures that define
      FPE_FIXME will handle properly send SI_USER when kill(SIGFPE) is sent.
      While still performing their historic architectural brokenness when 0
      is used a floating pointer signal.  This matches the current behavior
      of 64bit architectures that define FPE_FIXME who get lucky and an
      overloaded SI_USER has continuted to work through copy_siginfo_to_user
      because the 8 byte si_addr occupies the same bytes in struct siginfo
      as the 4 byte si_pid and the 4 byte si_uid.
      
      Problematic architectures still need to fix their ABI so that signalfd
      and 32bit compat code will work properly.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      c999b933