1. 14 11月, 2018 1 次提交
  2. 23 8月, 2018 17 次提交
  3. 10 8月, 2018 1 次提交
    • E
      signal: Don't restart fork when signals come in. · c3ad2c3b
      Eric W. Biederman 提交于
      Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
      report that a periodic signal received during fork can cause fork to
      continually restart preventing an application from making progress.
      
      The code was being overly pessimistic.  Fork needs to guarantee that a
      signal sent to multiple processes is logically delivered before the
      fork and just to the forking process or logically delivered after the
      fork to both the forking process and it's newly spawned child.  For
      signals like periodic timers that are always delivered to a single
      process fork can safely complete and let them appear to logically
      delivered after the fork().
      
      While examining this issue I also discovered that fork today will miss
      signals delivered to multiple processes during the fork and handled by
      another thread.  Similarly the current code will also miss blocked
      signals that are delivered to multiple process, as those signals will
      not appear pending during fork.
      
      Add a list of each thread that is currently forking, and keep on that
      list a signal set that records all of the signals sent to multiple
      processes.  When fork completes initialize the new processes
      shared_pending signal set with it.  The calculate_sigpending function
      will see those signals and set TIF_SIGPENDING causing the new task to
      take the slow path to userspace to handle those signals.  Making it
      appear as if those signals were received immediately after the fork.
      
      It is not possible to send real time signals to multiple processes and
      exceptions don't go to multiple processes, which means that that are
      no signals sent to multiple processes that require siginfo.  This
      means it is safe to not bother collecting siginfo on signals sent
      during fork.
      
      The sigaction of a child of fork is initially the same as the
      sigaction of the parent process.  So a signal the parent ignores the
      child will also initially ignore.  Therefore it is safe to ignore
      signals sent to multiple processes and ignored by the forking process.
      
      Signals sent to only a single process or only a single thread and delivered
      during fork are treated as if they are received after the fork, and generally
      not dealt with.  They won't cause any problems.
      
      V2: Added removal from the multiprocess list on failure.
      V3: Use -ERESTARTNOINTR directly
      V4: - Don't queue both SIGCONT and SIGSTOP
          - Initialize signal_struct.multiprocess in init_task
          - Move setting of shared_pending to before the new task
            is visible to signals.  This prevents signals from comming
            in before shared_pending.signal is set to delayed.signal
            and being lost.
      V5: - rework list add and delete to account for idle threads
      v6: - Use sigdelsetmask when removing stop signals
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
      Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
      Reported-by: Nmajiang <ma.jiang@zte.com.cn>
      Fixes: 4a2c7a78 ("[PATCH] make fork() atomic wrt pgrp/session signals")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      c3ad2c3b
  4. 04 8月, 2018 2 次提交
    • E
      fork: Have new threads join on-going signal group stops · 924de3b8
      Eric W. Biederman 提交于
      There are only two signals that are delivered to every member of a
      signal group: SIGSTOP and SIGKILL.  Signal delivery requires every
      signal appear to be delivered either before or after a clone syscall.
      SIGKILL terminates the clone so does not need to be considered.  Which
      leaves only SIGSTOP that needs to be considered when creating new
      threads.
      
      Today in the event of a group stop TIF_SIGPENDING will get set and the
      fork will restart ensuring the fork syscall participates in the group
      stop.
      
      A fork (especially of a process with a lot of memory) is one of the
      most expensive system so we really only want to restart a fork when
      necessary.
      
      It is easy so check to see if a SIGSTOP is ongoing and have the new
      thread join it immediate after the clone completes.  Making it appear
      the clone completed happened just before the SIGSTOP.
      
      The calculate_sigpending function will see the bits set in jobctl and
      set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.
      
      V2: The call to task_join_group_stop was moved before the new task is
          added to the thread group list.  This should not matter as
          sighand->siglock is held over both the addition of the threads,
          the call to task_join_group_stop and do_signal_stop.  But the change
          is trivial and it is one less thing to worry about when reading
          the code.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      924de3b8
    • E
      signal: Add calculate_sigpending() · 088fe47c
      Eric W. Biederman 提交于
      Add a function calculate_sigpending to test to see if any signals are
      pending for a new task immediately following fork.  Signals have to
      happen either before or after fork.  Today our practice is to push
      all of the signals to before the fork, but that has the downside that
      frequent or periodic signals can make fork take much much longer than
      normal or prevent fork from completing entirely.
      
      So we need move signals that we can after the fork to prevent that.
      
      This updates the code to set TIF_SIGPENDING on a new task if there
      are signals or other activities that have moved so that they appear
      to happen after the fork.
      
      As the code today restarts if it sees any such activity this won't
      immediately have an effect, as there will be no reason for it
      to set TIF_SIGPENDING immediately after the fork.
      
      Adding calculate_sigpending means the code in fork can safely be
      changed to not always restart if a signal is pending.
      
      The new calculate_sigpending function sets sigpending if there
      are pending bits in jobctl, pending signals, the freezer needs
      to freeze the new task or the live kernel patching framework
      need the new thread to take the slow path to userspace.
      
      I have verified that setting TIF_SIGPENDING does make a new process
      take the slow path to userspace before it executes it's first userspace
      instruction.
      
      I have looked at the callers of signal_wake_up and the code paths
      setting TIF_SIGPENDING and I don't see anything else that needs to be
      handled.  The code probably doesn't need to set TIF_SIGPENDING for the
      kernel live patching as it uses a separate thread flag as well.  But
      at this point it seems safer reuse the recalc_sigpending logic and get
      the kernel live patching folks to sort out their story later.
      
      V2: I have moved the test into schedule_tail where siglock can
          be grabbed and recalc_sigpending can be reused directly.
          Further as the last action of setting up a new task this
          guarantees that TIF_SIGPENDING will be properly set in the
          new process.
      
          The helper calculate_sigpending takes the siglock and
          uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending
          clear TIF_SIGPENDING if it is unnecessary.  This allows reusing
          the existing code and keeps maintenance of the conditions simple.
      
          Oleg Nesterov <oleg@redhat.com>  suggested the movement
          and pointed out the need to take siglock if this code
          was going to be called while the new task is discoverable.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      088fe47c
  5. 22 7月, 2018 5 次提交
  6. 21 7月, 2018 1 次提交
    • E
      signal: Pass pid and pid type into send_sigqueue · 24122c7f
      Eric W. Biederman 提交于
      Make the code more maintainable by performing more of the signal
      related work in send_sigqueue.
      
      A quick inspection of do_timer_create will show that this code path
      does not lookup a thread group by a thread's pid.  Making it safe
      to find the task pointed to by it_pid with "pid_task(it_pid, type)";
      
      This supports the changes needed in fork to tell if a signal was sent
      to a single process or a group of processes.
      
      Having the pid to task transition in signal.c will also make it easier
      to sort out races with de_thread and and the thread group leader
      exiting when it comes time to address that.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      24122c7f
  7. 10 6月, 2018 1 次提交
  8. 04 5月, 2018 1 次提交
    • P
      sched/core: Introduce set_special_state() · b5bf9a90
      Peter Zijlstra 提交于
      Gaurav reported a perceived problem with TASK_PARKED, which turned out
      to be a broken wait-loop pattern in __kthread_parkme(), but the
      reported issue can (and does) in fact happen for states that do not do
      condition based sleeps.
      
      When the 'current->state = TASK_RUNNING' store of a previous
      (concurrent) try_to_wake_up() collides with the setting of a 'special'
      sleep state, we can loose the sleep state.
      
      Normal condition based wait-loops are immune to this problem, but for
      sleep states that are not condition based are subject to this problem.
      
      There already is a fix for TASK_DEAD. Abstract that and also apply it
      to TASK_STOPPED and TASK_TRACED, both of which are also without
      condition based wait-loop.
      Reported-by: NGaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b5bf9a90
  9. 27 4月, 2018 2 次提交
    • E
      signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} · 31931c93
      Eric W. Biederman 提交于
      Update the siginfo_layout function and enum siginfo_layout to represent
      all of the possible field layouts of struct siginfo.
      
      This allows the uses of siginfo_layout in um and arm64 where they are testing
      for SIL_FAULT to be more accurate as this rules out the other cases.
      
      Further this allows the switch statements on siginfo_layout to be simpler
      if perhaps a little more wordy.  Making it easier to understand what is
      actually going on.
      
      As SIL_FAULT_BNDERR and SIL_FAULT_PKUERR are never expected to appear
      in signalfd just treat them as SIL_FAULT.  To include them would take
      20 extra bytes an pretty much fill up what is left of
      signalfd_siginfo.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      31931c93
    • E
      signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code · 36a4ca3d
      Eric W. Biederman 提交于
      The only architecture that does not support SEGV_PKUERR is ia64 and
      ia64 has not had 32bit support since some time in 2008.  Therefore
      copy_siginfo_to_user32 and copy_siginfo_from_user32 do not need to
      include support for a missing SEGV_PKUERR.
      
      Compile test on ia64.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      36a4ca3d
  10. 25 4月, 2018 4 次提交
    • E
      signal: Remove ifdefs for BUS_MCEERR_AR and BUS_MCEERR_AO · 4181d225
      Eric W. Biederman 提交于
      With the recent architecture cleanups these si_codes are always
      defined so there is no need to test for them.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      4181d225
    • E
      signal: Remove SEGV_BNDERR ifdefs · 3a11ab14
      Eric W. Biederman 提交于
      After the last round of cleanups to siginfo.h SEGV_BNDERR is defined
      on all architectures so testing to see if it is defined is unnecessary.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      3a11ab14
    • E
      signal: Stop special casing TRAP_FIXME and FPE_FIXME in siginfo_layout · 0c362f96
      Eric W. Biederman 提交于
      After more experience with the cases where no one the si_code of 0
      is used both as a signal specific si_code, and as SI_USER it appears
      that no one cares about the signal specific si_code case and the
      good solution is to just fix the architectures by using
      a different si_code.
      
      In none of the conversations has anyone even suggested that
      anything depends on the signal specific redefinition of SI_USER.
      
      There are at least test cases that care when si_code as 0 does
      not work as si_user.
      
      So make things simple and keep the generic code from introducing
      problems by removing the special casing of TRAP_FIXME and FPE_FIXME.
      This will ensure the generic case of sending a signal with
      kill will always set SI_USER and work.
      
      The architecture specific, and signal specific overloads that
      set si_code to 0 will now have problems with signalfd and
      the 32bit compat versions of siginfo copying.   At least
      until they are fixed.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      0c362f96
    • E
      signal: Reduce copy_siginfo_to_user to just copy_to_user · c999b933
      Eric W. Biederman 提交于
      Now that every instance of struct siginfo is now initialized it is no
      longer necessary to copy struct siginfo piece by piece to userspace
      but instead the entire structure can be copied.
      
      As well as making the code simpler and more efficient this means that
      copy_sinfo_to_user no longer cares which union member of struct
      siginfo is in use.
      
      In practice this means that all 32bit architectures that define
      FPE_FIXME will handle properly send SI_USER when kill(SIGFPE) is sent.
      While still performing their historic architectural brokenness when 0
      is used a floating pointer signal.  This matches the current behavior
      of 64bit architectures that define FPE_FIXME who get lucky and an
      overloaded SI_USER has continuted to work through copy_siginfo_to_user
      because the 8 byte si_addr occupies the same bytes in struct siginfo
      as the 4 byte si_pid and the 4 byte si_uid.
      
      Problematic architectures still need to fix their ABI so that signalfd
      and 32bit compat code will work properly.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      c999b933
  11. 03 4月, 2018 2 次提交
  12. 09 3月, 2018 1 次提交
    • D
      arm64: signal: Ensure si_code is valid for all fault signals · af40ff68
      Dave Martin 提交于
      Currently, as reported by Eric, an invalid si_code value 0 is
      passed in many signals delivered to userspace in response to faults
      and other kernel errors.  Typically 0 is passed when the fault is
      insufficiently diagnosable or when there does not appear to be any
      sensible alternative value to choose.
      
      This appears to violate POSIX, and is intuitively wrong for at
      least two reasons arising from the fact that 0 == SI_USER:
      
       1) si_code is a union selector, and SI_USER (and si_code <= 0 in
          general) implies the existence of a different set of fields
          (siginfo._kill) from that which exists for a fault signal
          (siginfo._sigfault).  However, the code raising the signal
          typically writes only the _sigfault fields, and the _kill
          fields make no sense in this case.
      
          Thus when userspace sees si_code == 0 (SI_USER) it may
          legitimately inspect fields in the inactive union member _kill
          and obtain garbage as a result.
      
          There appears to be software in the wild relying on this,
          albeit generally only for printing diagnostic messages.
      
       2) Software that wants to be robust against spurious signals may
          discard signals where si_code == SI_USER (or <= 0), or may
          filter such signals based on the si_uid and si_pid fields of
          siginfo._sigkill.  In the case of fault signals, this means
          that important (and usually fatal) error conditions may be
          silently ignored.
      
      In practice, many of the faults for which arm64 passes si_code == 0
      are undiagnosable conditions such as exceptions with syndrome
      values in ESR_ELx to which the architecture does not yet assign any
      meaning, or conditions indicative of a bug or error in the kernel
      or system and thus that are unrecoverable and should never occur in
      normal operation.
      
      The approach taken in this patch is to translate all such
      undiagnosable or "impossible" synchronous fault conditions to
      SIGKILL, since these are at least probably localisable to a single
      process.  Some of these conditions should really result in a kernel
      panic, but due to the lack of diagnostic information it is
      difficult to be certain: this patch does not add any calls to
      panic(), but this could change later if justified.
      
      Although si_code will not reach userspace in the case of SIGKILL,
      it is still desirable to pass a nonzero value so that the common
      siginfo handling code can detect incorrect use of si_code == 0
      without false positives.  In this case the si_code dependent
      siginfo fields will not be correctly initialised, but since they
      are not passed to userspace I deem this not to matter.
      
      A few faults can reasonably occur in realistic userspace scenarios,
      and _should_ raise a regular, handleable (but perhaps not
      ignorable/blockable) signal: for these, this patch attempts to
      choose a suitable standard si_code value for the raised signal in
      each case instead of 0.
      
      arm64 was the only arch to define a BUS_FIXME code, so after this
      patch nobody defines it.  This patch therefore also removes the
      relevant code from siginfo_layout().
      
      Cc: James Morse <james.morse@arm.com>
      Reported-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      af40ff68
  13. 07 3月, 2018 1 次提交
  14. 23 1月, 2018 1 次提交
    • E
      signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed · f71dd7dc
      Eric W. Biederman 提交于
      There are so many places that build struct siginfo by hand that at
      least one of them is bound to get it wrong.  A handful of cases in the
      kernel arguably did just that when using the errno field of siginfo to
      pass no errno values to userspace.  The usage is limited to a single
      si_code so at least does not mess up anything else.
      
      Encapsulate this questionable pattern in a helper function so
      that the userspace ABI is preserved.
      
      Update all of the places that use this pattern to use the new helper
      function.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      f71dd7dc