1. 17 Jul, 2019 2 commits
    • signal: simplify set_user_sigmask/restore_user_sigmask · b772434b
      Oleg Nesterov committed
      task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
      syscall paths.  This means that set_user_sigmask() can save ->blocked in
      ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
      was modified.
      
      This way the callers do not need two sigset_t's passed to set/restore, and
      restore_user_sigmask(), renamed to restore_saved_sigmask_unless(), turns
      into a trivial helper that just calls restore_saved_sigmask().
      
      Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Eric Wong <e@80x24.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
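The save-and-flag idea described in this message can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct toy_task` and the `toy_*` functions are hypothetical stand-ins, signal sets are reduced to plain bitmasks, and none of this is the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task state named in the commit message. */
struct toy_task {
	unsigned long blocked;       /* currently blocked signals (bitmask) */
	unsigned long saved_sigmask; /* mask to put back on return to userspace */
	bool restore_sigmask;        /* set when ->blocked was modified */
};

/* Save ->blocked, flag it for restoration, then install the caller's mask. */
static void toy_set_user_sigmask(struct toy_task *t, const unsigned long *umask)
{
	assert(t != NULL);
	if (!umask)
		return; /* no temporary mask requested, nothing to save */
	t->saved_sigmask = t->blocked;
	t->restore_sigmask = true;
	t->blocked = *umask;
}

/* Put the saved mask back if, and only if, one was saved. */
static void toy_restore_saved_sigmask(struct toy_task *t)
{
	if (t->restore_sigmask) {
		t->restore_sigmask = false;
		t->blocked = t->saved_sigmask;
	}
}

/*
 * Model assumption: when the call was interrupted by a signal, the flag is
 * left set so that a later step can restore the mask after the handler is
 * set up; otherwise the mask is restored right away.
 */
static void toy_restore_saved_sigmask_unless(struct toy_task *t, bool interrupted)
{
	if (!interrupted)
		toy_restore_saved_sigmask(t);
}
```

In this model a caller needs no `sigset_t` of its own: it calls `toy_set_user_sigmask()` before waiting and `toy_restore_saved_sigmask_unless()` afterwards, and all bookkeeping lives in the task.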
    • signal: reorder struct sighand_struct · e2d9018e
      Alexey Dobriyan committed
      The struct sighand_struct::siglock field is by far the most used field;
      put it first so that it can be accessed without IMM8 or IMM32 encoding on
      x86_64.
      
      Space savings (on trimmed down VM test config):
      
      add/remove: 0/0 grow/shrink: 8/68 up/down: 49/-1147 (-1098)
      Function                                     old     new   delta
      complete_signal                              512     533     +21
      do_signalfd4                                 335     346     +11
      __cleanup_sighand                             39      43      +4
      unhandled_signal                              49      52      +3
      prepare_signal                               692     695      +3
      ignore_signals                                37      40      +3
      __tty_check_change.part                      248     251      +3
      ksys_unshare                                 780     781      +1
      sighand_ctor                                  33      29      -4
      ptrace_trap_notify                            60      56      -4
      sigqueue_free                                 98      91      -7
      run_posix_cpu_timers                        1389    1382      -7
      proc_pid_status                             2448    2441      -7
      proc_pid_limits                              344     337      -7
      posix_cpu_timer_rearm                        222     215      -7
      posix_cpu_timer_get                          249     242      -7
      kill_pid_info_as_cred                        243     236      -7
      freeze_task                                  197     190      -7
      flush_old_exec                              1873    1866      -7
      do_task_stat                                3363    3356      -7
      do_send_sig_info                              98      91      -7
      do_group_exit                                147     140      -7
      init_sighand                                2088    2080      -8
      do_notify_parent_cldstop                     399     391      -8
      signalfd_cleanup                              50      41      -9
      do_notify_parent                             557     545     -12
      __send_signal                               1029    1017     -12
      ptrace_stop                                  590     577     -13
      get_signal                                  1576    1563     -13
      __lock_task_sighand                          112      99     -13
      zap_pid_ns_processes                         391     377     -14
      update_rlimit_cpu                             78      64     -14
      tty_signal_session_leader                    413     399     -14
      tty_open_proc_set_tty                        149     135     -14
      tty_jobctrl_ioctl                            936     922     -14
      set_cpu_itimer                               339     325     -14
      ptrace_resume                                226     212     -14
      ptrace_notify                                110      96     -14
      proc_clear_tty                                81      67     -14
      posix_cpu_timer_del                          229     215     -14
      kernel_sigaction                             156     142     -14
      getrusage                                    977     963     -14
      get_current_tty                               98      84     -14
      force_sigsegv                                 89      75     -14
      force_sig_info                               205     191     -14
      flush_signals                                 83      69     -14
      flush_itimer_signals                          85      71     -14
      do_timer_create                             1120    1106     -14
      do_sigpending                                 88      74     -14
      do_signal_stop                               537     523     -14
      cgroup_init_fs_context                       644     630     -14
      call_usermodehelper_exec_async               402     388     -14
      calculate_sigpending                          58      44     -14
      __x64_sys_timer_delete                       248     234     -14
      __set_current_blocked                         80      66     -14
      __ptrace_unlink                              310     296     -14
      __ptrace_detach.part                         187     173     -14
      send_sigqueue                                362     347     -15
      get_cpu_itimer                               214     199     -15
      signalfd_poll                                175     159     -16
      dequeue_signal                               340     323     -17
      do_getitimer                                 192     174     -18
      release_task.part                           1060    1040     -20
      ptrace_peek_siginfo                          408     387     -21
      posix_cpu_timer_set                          827     806     -21
      exit_signals                                 437     416     -21
      do_sigaction                                 541     520     -21
      do_setitimer                                 485     464     -21
      disassociate_ctty.part                       545     517     -28
      __x64_sys_rt_sigtimedwait                    721     679     -42
      __x64_sys_ptrace                            1319    1277     -42
      ptrace_request                              1828    1782     -46
      signalfd_read                                507     459     -48
      wait_consider_task                          2027    1971     -56
      do_coredump                                 3672    3616     -56
      copy_process.part                           6936    6871     -65
      
      Link: http://lkml.kernel.org/r/20190503192800.GA18004@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
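The effect of moving a hot field to the front of a structure can be illustrated with `offsetof`. The two layouts below are hypothetical toys, not the kernel's real `struct sighand_struct`; the only point is that a field at offset 0 can typically be addressed through the base register alone, while a field at a nonzero offset needs a 1-byte or 4-byte displacement in the instruction encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock type; the real type is different. */
struct toy_lock { int raw; };

/* Toy "before" layout: the lock sits behind a large array. */
struct toy_sighand_before {
	int count;
	char action[64 * 32]; /* big member pushes later fields past offset 127 */
	struct toy_lock siglock;
};

/* Toy "after" layout: the most used field comes first. */
struct toy_sighand_after {
	struct toy_lock siglock; /* offset 0: no displacement needed */
	int count;
	char action[64 * 32];
};

static size_t toy_siglock_offset_before(void)
{
	return offsetof(struct toy_sighand_before, siglock);
}

static size_t toy_siglock_offset_after(void)
{
	return offsetof(struct toy_sighand_after, siglock);
}
```

Any offset above 127 no longer fits a signed 8-bit displacement, which is why every access to the lock in the "before" layout would cost extra instruction bytes.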
  2. 29 May, 2019 3 commits
    • signal: Remove the signal number and task parameters from force_sig_info · a89e9b8a
      Eric W. Biederman committed
      force_sig_info always delivers to the current task and the signal
      parameter always matches info.si_signo.  So remove those parameters to
      make it a simpler, less error-prone interface, and to make it clear
      that none of the callers are doing anything clever.
      
      This guarantees that force_sig_info will not grow any new buggy
      callers that attempt to call force_sig on a non-current task, or that
      pass a signal number that does not match info.si_signo.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • signal: Remove the task parameter from force_sig_fault · 2e1661d2
      Eric W. Biederman committed
      As synchronous exceptions really only make sense against the current
      task (otherwise how are you synchronous?) remove the task parameter
      from force_sig_fault to make it explicit that is what is going
      on.
      
      The two known exceptions that deliver a synchronous exception to a
      stopped ptraced task have already been changed to
      force_sig_fault_to_task.
      
      The callers have been changed with the following emacs regular expression
      (with obvious variations on the architectures that take more arguments)
      to avoid typos:
      
      force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
      ->
      force_sig_fault(\1,\2,\3)
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • signal: Use force_sig_fault_to_task for the two calls that don't deliver to current · 91ca180d
      Eric W. Biederman committed
      In preparation for removing the task parameter from force_sig_fault
      introduce force_sig_fault_to_task and use it for the two cases where
      it matters.
      
      On mips force_fcr31_sig calls force_sig_fault and is called on either
      the current task, or a task that is suspended and is being switched to
      by the scheduler.  This is safe because the task being switched to by
      the scheduler is guaranteed to be suspended.  This ensures that
      task->sighand is stable while the signal is delivered to it.
      
      On parisc user_enable_single_step calls force_sig_fault and is in turn
      called by ptrace_request.  The function ptrace_request always calls
      user_enable_single_step on a child that is stopped for tracing.  The
      child being traced and not reaped ensures that child->sighand is not
      NULL, and that the child will not change child->sighand.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  3. 27 May, 2019 3 commits
  4. 23 May, 2019 1 commit
    • signal/usb: Replace kill_pid_info_as_cred with kill_pid_usb_asyncio · 70f1b0d3
      Eric W. Biederman committed
      The usb support for asyncio encoded one of its values in the wrong
      field.  It should have used si_value but instead used si_addr which is
      not present in the _rt union member of struct siginfo.
      
      The practical result of this is that on a 64bit big endian kernel
      when delivering a signal to a 32bit process the si_addr field
      is set to NULL, instead of the expected pointer value.
      
      This issue cannot be fixed in copy_siginfo_to_user32 as the usb
      usage of the _sigfault (aka si_addr) member of the siginfo
      union when SI_ASYNCIO is set is incompatible with the POSIX and
      glibc usage of the _rt member of the siginfo union.
      
      Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
      dedicated function for this one specific case.  There are no other
      users of kill_pid_info_as_cred so this specialization should have no
      impact on the amount of code in the kernel.  Instead of a siginfo_t,
      which is difficult and error prone, have kill_pid_usb_asyncio take 3
      arguments: a signal number, an errno value, and an address encoded as
      a sigval_t.  The encoding of the address as a sigval_t allows the
      code that reads the userspace request for a signal to handle this
      compat issue along with all of the other compat issues.
      
      Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
      the pointer value in si_pid (instead of si_addr).  That is, the
      code now verifies that si_pid and si_addr always occur at the same
      location.  Further, the code verifies that for native structures a value
      placed in si_pid and spilling into si_uid will appear in userspace in
      si_addr (on a byte by byte copy of siginfo or a field by field copy of
      siginfo).  The code also verifies that for a 64bit kernel and a 32bit
      userspace the 32bit pointer will fit in si_pid.
      
      I have used the usbsig.c program below written by Alan Stern and
      slightly tweaked by me to run on a big endian machine to verify the
      issue exists (on sparc64) and to confirm the patch below fixes the issue.
      
       /* usbsig.c -- test USB async signal delivery */
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <fcntl.h>
       #include <signal.h>
       #include <string.h>
       #include <sys/ioctl.h>
       #include <unistd.h>
       #include <endian.h>
       #include <linux/usb/ch9.h>
       #include <linux/usbdevice_fs.h>
      
       static struct usbdevfs_urb urb;
       static struct usbdevfs_disconnectsignal ds;
       static volatile sig_atomic_t done = 0;
      
       void urb_handler(int sig, siginfo_t *info , void *ucontext)
       {
       	printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
       	       sig, info->si_signo, info->si_errno, info->si_code,
       	       info->si_addr, &urb);
      
       	printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
       }
      
       void ds_handler(int sig, siginfo_t *info , void *ucontext)
       {
       	printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
       	       sig, info->si_signo, info->si_errno, info->si_code,
       	       info->si_addr, &ds);
      
       	printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
       	done = 1;
       }
      
       int main(int argc, char **argv)
       {
       	char *devfilename;
       	int fd;
       	int rc;
       	struct sigaction act;
       	struct usb_ctrlrequest *req;
       	void *ptr;
       	char buf[80];
      
       	if (argc != 2) {
       		fprintf(stderr, "Usage: usbsig device-file-name\n");
       		return 1;
       	}
      
       	devfilename = argv[1];
       	fd = open(devfilename, O_RDWR);
       	if (fd == -1) {
       		perror("Error opening device file");
       		return 1;
       	}
      
       	act.sa_sigaction = urb_handler;
       	sigemptyset(&act.sa_mask);
       	act.sa_flags = SA_SIGINFO;
      
       	rc = sigaction(SIGUSR1, &act, NULL);
       	if (rc == -1) {
       		perror("Error in sigaction");
       		return 1;
       	}
      
       	act.sa_sigaction = ds_handler;
       	sigemptyset(&act.sa_mask);
       	act.sa_flags = SA_SIGINFO;
      
       	rc = sigaction(SIGUSR2, &act, NULL);
       	if (rc == -1) {
       		perror("Error in sigaction");
       		return 1;
       	}
      
       	memset(&urb, 0, sizeof(urb));
       	urb.type = USBDEVFS_URB_TYPE_CONTROL;
       	urb.endpoint = USB_DIR_IN | 0;
       	urb.buffer = buf;
       	urb.buffer_length = sizeof(buf);
       	urb.signr = SIGUSR1;
      
       	req = (struct usb_ctrlrequest *) buf;
       	req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
       	req->bRequest = USB_REQ_GET_DESCRIPTOR;
       	req->wValue = htole16(USB_DT_DEVICE << 8);
       	req->wIndex = htole16(0);
       	req->wLength = htole16(sizeof(buf) - sizeof(*req));
      
       	rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
       	if (rc == -1) {
       		perror("Error in SUBMITURB ioctl");
       		return 1;
       	}
      
       	rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
       	if (rc == -1) {
       		perror("Error in REAPURB ioctl");
       		return 1;
       	}
      
       	memset(&ds, 0, sizeof(ds));
       	ds.signr = SIGUSR2;
       	ds.context = &ds;
       	rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
       	if (rc == -1) {
       		perror("Error in DISCSIGNAL ioctl");
       		return 1;
       	}
      
       	printf("Waiting for usb disconnect\n");
       	while (!done) {
       		sleep(1);
       	}
      
       	close(fd);
       	return 0;
       }
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-usb@vger.kernel.org
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Oliver Neukum <oneukum@suse.com>
      Fixes: v2.3.39
      Cc: stable@vger.kernel.org
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
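The layout checks described in this message can be imitated in userspace with compile-time assertions. The sketch below uses a deliberately simplified, hypothetical `struct toy_siginfo` (it is not the real `struct siginfo`), with `_Static_assert` standing in for the kernel's BUILD_BUG_ON. It shows why a pointer stored through the `_rt`-style member lands where `si_addr` is read, while `si_value` lives at a different offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for sigval_t and struct siginfo. */
union toy_sigval {
	int sival_int;
	void *sival_ptr;
};

struct toy_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
	union {
		struct { int pid; unsigned int uid; } kill;
		struct { int pid; unsigned int uid; union toy_sigval value; } rt;
		struct { void *addr; } sigfault;
	} f;
};

/* si_pid and si_addr must start at the same location. */
_Static_assert(offsetof(struct toy_siginfo, f.rt.pid) ==
	       offsetof(struct toy_siginfo, f.sigfault.addr),
	       "pid and addr must overlap");

/* A native pointer written at pid may spill into uid but no further. */
_Static_assert(sizeof(void *) <= sizeof(int) + sizeof(unsigned int),
	       "pointer must fit in pid plus uid");

static size_t toy_off_pid(void)
{
	return offsetof(struct toy_siginfo, f.rt.pid);
}

static size_t toy_off_addr(void)
{
	return offsetof(struct toy_siginfo, f.sigfault.addr);
}

static size_t toy_off_value(void)
{
	return offsetof(struct toy_siginfo, f.rt.value);
}
```

Because the value member sits behind pid and uid in this toy layout, code that fills in one union member and reads another only works when such offset relationships are checked explicitly.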
  5. 15 May, 2019 1 commit
  6. 30 Mar, 2019 1 commit
  7. 04 Feb, 2019 2 commits
    • sched/core: Convert signal_struct.sigcnt to refcount_t · 60d4de3f
      Elena Reshetova committed
      atomic_t variables are currently used to implement reference
      counters with the following properties:
      
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situations and be exploitable.
      
      The variable signal_struct.sigcnt is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      ** Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57 and will hopefully soon be
      in a state to be merged into the documentation tree.
      
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For signal_struct.sigcnt it might make a difference
      in the following places:
      
       - put_signal_struct(): decrement in refcount_dec_and_test() only
         provides RELEASE ordering and control dependency on success
         vs. fully ordered atomic counterpart
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: David Windsor <dwindsor@gmail.com>
      Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
      Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: viro@zeniv.linux.org.uk
      Link: https://lkml.kernel.org/r/1547814450-18902-3-git-send-email-elena.reshetova@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
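The reference-counting properties listed in this message (start at 1, free at zero, no increments after zero, protection against overflow and underflow) can be sketched in userspace with C11 atomics. Everything named `toy_*` is hypothetical; this is a rough model of the idea and not the kernel's refcount_t implementation, which handles more cases and also warns when misuse is detected.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy counter; UINT_MAX is treated as "saturated". */
struct toy_refcount {
	atomic_uint refs;
};

static void toy_refcount_init(struct toy_refcount *r, unsigned int n)
{
	atomic_init(&r->refs, n);
}

/* Take a reference unless the object is already dead (count of zero). */
static bool toy_refcount_inc_not_zero(struct toy_refcount *r)
{
	unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

	do {
		if (old == 0)
			return false; /* never resurrect a freed object */
		if (old == UINT_MAX)
			return true;  /* saturated: stay pinned, do not wrap */
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old + 1,
			memory_order_relaxed, memory_order_relaxed));
	return true;
}

/*
 * Drop a reference. Returns true when the caller dropped the last one and
 * must free the object. The decrement uses release ordering only, which is
 * the weaker guarantee the commit message asks maintainers to double check.
 */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

	do {
		if (old == 0 || old == UINT_MAX)
			return false; /* refuse to underflow or to free a saturated count */
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old - 1,
			memory_order_release, memory_order_relaxed));
	return old == 1;
}
```

With a plain atomic counter, an extra decrement would wrap the count around and an extra increment after zero would hand out a reference to freed memory; the checks in the loops above are what rule both out in this model.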
    • sched/core: Convert sighand_struct.count to refcount_t · d036bda7
      Elena Reshetova committed
      atomic_t variables are currently used to implement reference
      counters with the following properties:
      
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situations and be exploitable.
      
      The variable sighand_struct.count is used as pure reference counter.
      Convert it to refcount_t and fix up the operations.
      
      ** Important note for maintainers:
      
      Some functions from refcount_t API defined in lib/refcount.c
      have different memory ordering guarantees than their atomic
      counterparts.
      
      The full comparison can be seen in
      https://lkml.org/lkml/2017/11/15/57 and will hopefully soon be
      in a state to be merged into the documentation tree.
      
      Normally the differences should not matter since refcount_t provides
      enough guarantees to satisfy the refcounting use cases, but in
      some rare cases it might matter.
      
      Please double check that you don't have some undocumented
      memory guarantees for this variable usage.
      
      For sighand_struct.count it might make a difference
      in the following places:
      
       - __cleanup_sighand: decrement in refcount_dec_and_test() only
         provides RELEASE ordering and control dependency on success
         vs. fully ordered atomic counterpart
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: David Windsor <dwindsor@gmail.com>
      Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
      Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: viro@zeniv.linux.org.uk
      Link: https://lkml.kernel.org/r/1547814450-18902-2-git-send-email-elena.reshetova@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 03 Oct, 2018 1 commit
    • signal: Distinguish between kernel_siginfo and siginfo · ae7795bc
      Eric W. Biederman committed
      Linus recently observed that if we did not worry about the padding
      member in struct siginfo it is only about 48 bytes, and 48 bytes is
      much nicer than 128 bytes for allocating on the stack and copying
      around in the kernel.
      
      The obvious thing of only adding the padding when userspace is
      including siginfo.h won't work as there are sigframe definitions in
      the kernel that embed struct siginfo.
      
      So split siginfo in two: kernel_siginfo and siginfo.  The userspace
      definition keeps the traditional name, while the version that is used
      internally in the kernel, and ultimately will not be padded to
      128 bytes, is called kernel_siginfo.
      
      I have put the definition of struct kernel_siginfo in include/linux/signal_types.h.
      
      A set of buildtime checks has been added to verify the two structures have
      the same field offsets.
      
      To make it easy to verify the change kernel_siginfo retains the same
      size as siginfo.  The reduction in size comes in a following change.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
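The split can be pictured with a small userspace model in which both structures share one field list and only the userspace-facing one is padded. The names `toy_kernel_siginfo`, `toy_siginfo`, `TOY_SIGINFO_FIELDS` and the `si_payload` member are hypothetical. The sketch shows the eventual unpadded form that the message describes as a follow-up, not the state right after this commit, where both structures still have the same size.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SI_MAX_SIZE 128

/* Hypothetical shared field list, so both structures cannot drift apart. */
#define TOY_SIGINFO_FIELDS \
	int si_signo;      \
	int si_errno;      \
	int si_code;       \
	long si_payload[4];

/* Internal variant: only the fields that carry information. */
struct toy_kernel_siginfo {
	TOY_SIGINFO_FIELDS
};

/* Userspace-facing variant: same fields, padded out to a fixed size. */
struct toy_siginfo {
	union {
		struct { TOY_SIGINFO_FIELDS };
		int pad[TOY_SI_MAX_SIZE / sizeof(int)];
	};
};

/* Build-time checks that the two structures agree on field offsets. */
_Static_assert(offsetof(struct toy_kernel_siginfo, si_signo) ==
	       offsetof(struct toy_siginfo, si_signo), "si_signo offset");
_Static_assert(offsetof(struct toy_kernel_siginfo, si_code) ==
	       offsetof(struct toy_siginfo, si_code), "si_code offset");
_Static_assert(offsetof(struct toy_kernel_siginfo, si_payload) ==
	       offsetof(struct toy_siginfo, si_payload), "si_payload offset");
_Static_assert(sizeof(struct toy_siginfo) == TOY_SI_MAX_SIZE, "padded size");
```

Since the field offsets match, a copy of the smaller structure into the start of the padded one preserves every field, which is what makes the smaller internal form cheap to keep on the stack.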
  9. 12 Sep, 2018 2 commits
  10. 23 Aug, 2018 1 commit
  11. 10 Aug, 2018 1 commit
    • signal: Don't restart fork when signals come in. · c3ad2c3b
      Eric W. Biederman committed
      Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
      report that a periodic signal received during fork can cause fork to
      continually restart preventing an application from making progress.
      
      The code was being overly pessimistic.  Fork needs to guarantee that a
      signal sent to multiple processes is logically delivered before the
      fork and just to the forking process or logically delivered after the
      fork to both the forking process and its newly spawned child.  For
      signals like periodic timers that are always delivered to a single
      process fork can safely complete and let them appear to be logically
      delivered after the fork().
      
      While examining this issue I also discovered that fork today will miss
      signals delivered to multiple processes during the fork and handled by
      another thread.  Similarly the current code will also miss blocked
      signals that are delivered to multiple processes, as those signals will
      not appear pending during fork.
      
      Add a list of each thread that is currently forking, and keep on that
      list a signal set that records all of the signals sent to multiple
      processes.  When fork completes initialize the new process's
      shared_pending signal set with it.  The calculate_sigpending function
      will see those signals and set TIF_SIGPENDING causing the new task to
      take the slow path to userspace to handle those signals, making it
      appear as if those signals were received immediately after the fork.
      
      It is not possible to send real time signals to multiple processes and
      exceptions don't go to multiple processes, which means that there are
      no signals sent to multiple processes that require siginfo.  This
      means it is safe to not bother collecting siginfo on signals sent
      during fork.
      
      The sigaction of a child of fork is initially the same as the
      sigaction of the parent process.  So a signal the parent ignores the
      child will also initially ignore.  Therefore it is safe to ignore
      signals sent to multiple processes and ignored by the forking process.
      
      Signals sent to only a single process or only a single thread and delivered
      during fork are treated as if they are received after the fork, and generally
      not dealt with.  They won't cause any problems.
      
      V2: Added removal from the multiprocess list on failure.
      V3: Use -ERESTARTNOINTR directly
      V4: - Don't queue both SIGCONT and SIGSTOP
          - Initialize signal_struct.multiprocess in init_task
          - Move setting of shared_pending to before the new task
            is visible to signals.  This prevents signals from coming
            in before shared_pending.signal is set to delayed.signal
            and being lost.
      V5: - rework list add and delete to account for idle threads
      v6: - Use sigdelsetmask when removing stop signals
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
      Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
      Reported-by: majiang <ma.jiang@zte.com.cn>
      Fixes: 4a2c7a78 ("[PATCH] make fork() atomic wrt pgrp/session signals")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
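The bookkeeping described in this message, recording multi-process signals that arrive during a fork and handing them to the child when the fork completes, can be modelled in a few lines of userspace C. The `toy_*` names are hypothetical, signal sets are reduced to bitmasks, and locking and the real list handling are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-forker record; a model, not the kernel's data structure. */
struct toy_forker {
	bool in_fork;          /* this thread is currently inside fork */
	unsigned long ignored; /* signals the forking process ignores */
	unsigned long delayed; /* multi-process signals seen during the fork */
};

static unsigned long toy_sigbit(int sig)
{
	assert(sig >= 1 && sig <= 32);
	return 1UL << (sig - 1);
}

/* A signal sent to multiple processes: remember it for every active forker. */
static void toy_send_multiprocess(struct toy_forker *f, size_t n, int sig)
{
	for (size_t i = 0; i < n; i++) {
		/* The child would ignore it too, so there is nothing to record. */
		if (!f[i].in_fork || (f[i].ignored & toy_sigbit(sig)))
			continue;
		f[i].delayed |= toy_sigbit(sig);
	}
}

/* Fork finished: the recorded set becomes the child's shared pending set. */
static unsigned long toy_fork_complete(struct toy_forker *f)
{
	unsigned long child_shared_pending = f->delayed;

	f->delayed = 0;
	f->in_fork = false;
	return child_shared_pending;
}
```

Only a bitmask is recorded here, which matches the observation in the message that no signal sent to multiple processes needs its siginfo preserved.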
  12. 04 Aug, 2018 2 commits
    • fork: Have new threads join on-going signal group stops · 924de3b8
      Eric W. Biederman committed
      There are only two signals that are delivered to every member of a
      signal group: SIGSTOP and SIGKILL.  Signal delivery requires every
      signal appear to be delivered either before or after a clone syscall.
      SIGKILL terminates the clone so does not need to be considered.  Which
      leaves only SIGSTOP that needs to be considered when creating new
      threads.
      
      Today in the event of a group stop TIF_SIGPENDING will get set and the
      fork will restart ensuring the fork syscall participates in the group
      stop.
      
      A fork (especially of a process with a lot of memory) is one of the
      most expensive system calls so we really only want to restart a fork when
      necessary.
      
      It is easy to check whether a SIGSTOP is ongoing and have the new
      thread join it immediately after the clone completes, making it appear
      that the clone completed just before the SIGSTOP.
      
      The calculate_sigpending function will see the bits set in jobctl and
      set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.
      
      V2: The call to task_join_group_stop was moved before the new task is
          added to the thread group list.  This should not matter as
          sighand->siglock is held over both the addition of the threads,
          the call to task_join_group_stop and do_signal_stop.  But the change
          is trivial and it is one less thing to worry about when reading
          the code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • signal: Add calculate_sigpending() · 088fe47c
      Eric W. Biederman committed
      Add a function calculate_sigpending to test to see if any signals are
      pending for a new task immediately following fork.  Signals have to
      happen either before or after fork.  Today our practice is to push
      all of the signals to before the fork, but that has the downside that
      frequent or periodic signals can make fork take much much longer than
      normal or prevent fork from completing entirely.
      
      So we need to move the signals that we can to after the fork to prevent that.
      
      This updates the code to set TIF_SIGPENDING on a new task if there
      are signals or other activities that have moved so that they appear
      to happen after the fork.
      
      As the code today restarts if it sees any such activity this won't
      immediately have an effect, as there will be no reason for it
      to set TIF_SIGPENDING immediately after the fork.
      
      Adding calculate_sigpending means the code in fork can safely be
      changed to not always restart if a signal is pending.
      
      The new calculate_sigpending function sets sigpending if there
      are pending bits in jobctl, pending signals, the freezer needs
      to freeze the new task or the live kernel patching framework
      needs the new thread to take the slow path to userspace.
      
      I have verified that setting TIF_SIGPENDING does make a new process
      take the slow path to userspace before it executes its first userspace
      instruction.
      
      I have looked at the callers of signal_wake_up and the code paths
      setting TIF_SIGPENDING and I don't see anything else that needs to be
      handled.  The code probably doesn't need to set TIF_SIGPENDING for the
      kernel live patching as it uses a separate thread flag as well.  But
      at this point it seems safer to reuse the recalc_sigpending logic and get
      the kernel live patching folks to sort out their story later.
      
      V2: I have moved the test into schedule_tail where siglock can
          be grabbed and recalc_sigpending can be reused directly.
          Further as the last action of setting up a new task this
          guarantees that TIF_SIGPENDING will be properly set in the
          new process.
      
          The helper calculate_sigpending takes the siglock and
          unconditionally sets TIF_SIGPENDING and lets recalc_sigpending
          clear TIF_SIGPENDING if it is unnecessary.  This allows reusing
          the existing code and keeps maintenance of the conditions simple.
      
          Oleg Nesterov <oleg@redhat.com>  suggested the movement
          and pointed out the need to take siglock if this code
          was going to be called while the new task is discoverable.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
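The "set the flag unconditionally, then let the recalculation clear it" pattern from the V2 notes can be sketched as a userspace model. `struct toy_new_task` and the `toy_*` functions are hypothetical, the conditions are simplified to those named in the message, and the siglock that the real helper takes is only mentioned in a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state that decides whether the slow path is needed. */
struct toy_new_task {
	unsigned long pending;        /* signals pending for this thread */
	unsigned long shared_pending; /* signals pending for the whole process */
	unsigned long blocked;        /* signals currently blocked */
	unsigned long jobctl_pending; /* pending job control bits */
	bool freezing;                /* the freezer wants this task */
	bool klp_pending;             /* live patching wants the slow path */
	bool sigpending_flag;         /* stands in for TIF_SIGPENDING */
};

/* Any reason to take the slow path to userspace? */
static bool toy_has_work(const struct toy_new_task *t)
{
	return t->jobctl_pending != 0 ||
	       ((t->pending | t->shared_pending) & ~t->blocked) != 0 ||
	       t->freezing || t->klp_pending;
}

/* Clear the flag only when nothing needs it; never set it here. */
static void toy_recalc_sigpending(struct toy_new_task *t)
{
	if (!toy_has_work(t))
		t->sigpending_flag = false;
}

/* Set first, then reuse the recalculation to undo it if it was unnecessary. */
static void toy_calculate_sigpending(struct toy_new_task *t)
{
	/* The real helper holds siglock around these two steps. */
	t->sigpending_flag = true;
	toy_recalc_sigpending(t);
}
```

Keeping every condition inside the recalculation step means the new helper adds no second copy of the "is anything pending" logic to maintain.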
  13. 21 Jul, 2018 5 commits
    • signal: Pass pid and pid type into send_sigqueue · 24122c7f
      Eric W. Biederman committed
      Make the code more maintainable by performing more of the signal
      related work in send_sigqueue.
      
      A quick inspection of do_timer_create will show that this code path
      does not look up a thread group by a thread's pid, making it safe
      to find the task pointed to by it_pid with "pid_task(it_pid, type)".
      
      This supports the changes needed in fork to tell if a signal was sent
      to a single process or a group of processes.
      
      Having the pid to task transition in signal.c will also make it easier
      to sort out races with de_thread and the thread group leader
      exiting when it comes time to address that.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      24122c7f
    • E
      pid: Implement PIDTYPE_TGID · 6883f81a
      Eric W. Biederman committed
      Everywhere except in the pid array we distinguish between a task's pid and
      a task's tgid (thread group id).  Even in the enumeration we want that
      distinction sometimes so we have added __PIDTYPE_TGID.  With leader_pid
      we almost have an implementation of PIDTYPE_TGID in struct signal_struct.
      
      Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
      into the pids array.  Then remove the __PIDTYPE_TGID special case and the
      leader_pid in signal_struct.
      
      The net size increase is just an extra pointer added to struct pid and
      an extra pair of pointers of an hlist_node added to task_struct.
      
      The effect on code maintenance is the removal of a number of special
      cases today and the potential to remove many more special cases as
      PIDTYPE_TGID gets used to its fullest.  The long term potential
      is allowing zombie thread group leaders to exit, which will remove
      a lot more special cases in the code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      6883f81a
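As a rough illustration of the commit above, the following userspace sketch treats the thread group id as an ordinary member of the pid type enumeration and stores the group-wide ids in a shared per-group array. All names here (`pid_type_model`, `signal_model`, `thread_model`, `thread_pid_of_type`) are hypothetical and simplified; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the enumeration with TGID as a first class member. */
enum pid_type_model {
    PIDTYPE_MODEL_PID,
    PIDTYPE_MODEL_TGID,
    PIDTYPE_MODEL_PGID,
    PIDTYPE_MODEL_SID,
    PIDTYPE_MODEL_MAX
};

struct pid_model {
    int nr;
};

/* Shared by every thread in the group; the PID slot is unused in this sketch. */
struct signal_model {
    struct pid_model *pids[PIDTYPE_MODEL_MAX];
};

struct thread_model {
    struct pid_model *thread_pid; /* per-thread id */
    struct signal_model *signal;  /* group-wide state */
};

/* One lookup path for every type, with no special case for the tgid. */
static struct pid_model *thread_pid_of_type(const struct thread_model *t,
                                            enum pid_type_model type)
{
    assert(t != NULL && t->signal != NULL);
    assert(type < PIDTYPE_MODEL_MAX);
    if (type == PIDTYPE_MODEL_PID)
        return t->thread_pid;
    return t->signal->pids[type];
}
```

In this model the lookup never follows a group leader pointer, which mirrors the motivation given in the neighbouring commits: the shared structure does not change when the leader does.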
    • E
      pids: Move the pgrp and session pid pointers from task_struct to signal_struct · 2c470475
      Eric W. Biederman committed
      To access these fields the code always has to go to the group leader so
      going to signal struct is no loss and is actually a fundamental simplification.
      
      This saves a little bit of memory by only allocating the pid pointer array
      once instead of once for every thread, and even better this removes a
      few potential races caused by the fact that group_leader can be changed
      by de_thread, while signal_struct can not.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      2c470475
    • E
      pids: Compute task_tgid using signal->leader_pid · 7a36094d
      Eric W. Biederman committed
      The cost is the same and this removes the need
      to worry about complications that come from de_thread
      and group_leader changing.
      
      __task_pid_nr_ns has been updated to take advantage of this change.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      7a36094d
    • E
      pids: Move task_pid_type into sched/signal.h · 1fb53567
      Eric W. Biederman committed
      The function is general and inline so there is no need
      to hide it inside of exit.c
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      1fb53567
  14. 04 May, 2018 1 commit
    • P
      sched/core: Introduce set_special_state() · b5bf9a90
      Peter Zijlstra committed
      Gaurav reported a perceived problem with TASK_PARKED, which turned out
      to be a broken wait-loop pattern in __kthread_parkme(), but the
      reported issue can (and does) in fact happen for states that do not do
      condition based sleeps.
      
      When the 'current->state = TASK_RUNNING' store of a previous
      (concurrent) try_to_wake_up() collides with the setting of a 'special'
      sleep state, we can lose the sleep state.
      
      Normal condition based wait-loops are immune to this problem, but
      sleep states that are not condition based are subject to this problem.
      
      There already is a fix for TASK_DEAD. Abstract that and also apply it
      to TASK_STOPPED and TASK_TRACED, both of which are also without
      a condition based wait-loop.
      Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b5bf9a90
  15. 07 Mar, 2018 1 commit
  16. 23 Jan, 2018 3 commits
    • E
      signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed · f71dd7dc
      Eric W. Biederman committed
      There are so many places that build struct siginfo by hand that at
      least one of them is bound to get it wrong.  A handful of cases in the
      kernel arguably did just that when using the errno field of siginfo to
      pass non-errno values to userspace.  The usage is limited to a single
      si_code, so at least it does not mess up anything else.
      
      Encapsulate this questionable pattern in a helper function so
      that the userspace ABI is preserved.
      
      Update all of the places that use this pattern to use the new helper
      function.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      f71dd7dc
    • E
      signal: Helpers for faults with specialized siginfo layouts · 38246735
      Eric W. Biederman committed
      The helpers added are:
      send_sig_mceerr
      force_sig_mceerr
      force_sig_bnderr
      force_sig_pkuerr
      
      Filling out siginfo properly can get tricky.  Especially for these
      specialized cases where the temptation is to share code with other
      cases which use a different subset of siginfo fields.  Unfortunately
      that code sharing frequently results in bugs with the wrong siginfo
      fields filled in, and makes it harder to verify that the siginfo
      structure was properly initialized.
      
      Provide these helpers instead that get all of the details right, and
      guarantee that siginfo is properly initialized.
      
      send_sig_mceerr and force_sig_mceerr are a little special as two si
      codes BUS_MCEERR_AO and BUS_MCEERR_AR both use the same extended
      siginfo layout.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      38246735
    • E
      signal: Add send_sig_fault and force_sig_fault · f8ec6601
      Eric W. Biederman committed
      The vast majority of signals sent from architecture specific code are
      simple faults.  Encapsulate this reality with two helper functions so
      that the nit-picky implementation of preparing a siginfo does not need
      to be repeated many times on each architecture.
      
      As only some architectures support the trapno field, make the trapno
      argument only present on those architectures.
      
      Similarly, as ia64 has three fields (imm, flags, and isr) that
      are specific to it, have those arguments always present on ia64
      and nowhere else.
      
      This ensures the architecture specific code always remembers which
      fields it needs to pass into the siginfo structure.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      f8ec6601
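The idea behind the fault helpers in the commit above, filling in every siginfo field for a simple fault in one place, can be sketched with the userspace `siginfo_t`. This is an illustration only: `fill_fault_siginfo` is a hypothetical name, it prepares the structure without sending anything, and it ignores the architecture-specific arguments (trapno, and the ia64 imm, flags, and isr fields) that the commit message mentions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/*
 * Hypothetical helper: prepare a siginfo for a simple fault.
 * Zeroing first guarantees that no stale field leaks through,
 * then only the fields a plain fault uses are filled in.
 */
static void fill_fault_siginfo(siginfo_t *info, int sig, int code, void *addr)
{
    assert(info != NULL);
    memset(info, 0, sizeof(*info));
    info->si_signo = sig;
    info->si_errno = 0;
    info->si_code = code;
    info->si_addr = addr;
}
```

A caller would then write something like `fill_fault_siginfo(&info, SIGSEGV, SEGV_MAPERR, fault_address)` instead of initializing each field by hand at every call site.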
  17. 02 Nov, 2017 1 commit
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  The result of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
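The commit above notes that header and source .c files need different comment types for the SPDX tag. A minimal sketch of that distinction follows, assuming the kernel convention of a `//` comment in C sources and a `/* */` comment in headers. The function names are hypothetical and this is not the script that was actually used to generate the patches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero when `s` ends with `suffix`. */
static int has_suffix(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/*
 * Hypothetical formatter: write the SPDX tag line for `path` into `buf`.
 * Headers get a block comment, everything else gets a line comment.
 * Returns the snprintf result (length that was or would be written).
 */
static int format_spdx_line(char *buf, size_t len,
                            const char *path, const char *license)
{
    assert(buf != NULL && path != NULL && license != NULL);
    if (has_suffix(path, ".h"))
        return snprintf(buf, len, "/* SPDX-License-Identifier: %s */\n", license);
    return snprintf(buf, len, "// SPDX-License-Identifier: %s\n", license);
}
```

A real tool would also have to handle Makefiles, assembly, and scripts, which use other comment syntaxes; this sketch only covers the two cases named in the message.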
  18. 01 Jul, 2017 1 commit
    • K
      randstruct: Mark various structs for randomization · 3859a271
      Kees Cook committed
      This marks many critical kernel structures for randomization. These are
      structures that have been targeted in the past in security exploits, or
      contain function pointers, pointers to function pointer tables, lists,
      workqueues, ref-counters, credentials, permissions, or are otherwise
      sensitive. This initial list was extracted from Brad Spengler/PaX Team's
      code in the last public patch of grsecurity/PaX based on my understanding
      of the code. Changes or omissions from the original code are mine and
      don't reflect the original grsecurity/PaX code.
      
      Left out of this list is task_struct, which requires special handling
      and will be covered in a subsequent patch.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      3859a271
  19. 22 Apr, 2017 1 commit
  20. 03 Mar, 2017 4 commits
    • I
      sched/headers: Move cputime functionality from <linux/sched.h> and... · 1050b27c
      Ingo Molnar committed
      sched/headers: Move cputime functionality from <linux/sched.h> and <linux/cputime.h> into <linux/sched/cputime.h>
      
      Move cputime related functionality out of <linux/sched.h>, as most code
      that includes <linux/sched.h> does not use that functionality.
      
      Move data types that are not included in task_struct directly to
      the signal definitions, into <linux/sched/signal.h>.
      
      Also merge the (small) existing <linux/cputime.h> header into <linux/sched/cputime.h>.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1050b27c
    • I
      sched/headers: Move signal wakeup & sigpending methods from <linux/sched.h>... · 2a1f062a
      Ingo Molnar committed
      sched/headers: Move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
      
      This reduces the size of <linux/sched.h>.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2a1f062a
    • I
      sched/headers: Move 'struct pacct_struct' and 'struct cpu_itimer' from... · 8d88460e
      Ingo Molnar committed
      sched/headers: Move 'struct pacct_struct' and 'struct cpu_itimer' from <linux/sched.h> to <linux/sched/signal.h>
      
      These structures are actually part of 'struct signal', so move them to <linux/sched/signal.h>
      where they belong.
      
      This further decreases the size and complexity of <linux/sched.h>.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8d88460e
    • I
      sched/headers: Move task_struct::signal and task_struct::sighand types and... · c3edc401
      Ingo Molnar committed
      sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h>
      
      task_struct::signal and task_struct::sighand are pointers, which would normally make it
      straightforward to not define those types in sched.h.
      
      That is not so, because the types are accompanied by a myriad of APIs (macros and inline
      functions) that dereference them.
      
      Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.
      
      With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore;
      trying to put accessors into sched.h as a test fails the following way:
      
        ./include/linux/sched.h: In function ‘test_signal_types’:
        ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
                          ^
      
      This reduces the size and complexity of sched.h significantly.
      
      Update all headers and .c code that relied on getting the signal handling
      functionality from <linux/sched.h> to include <linux/sched/signal.h>.
      
      The list of affected files in the preparatory patch was partly generated by
      grepping for the APIs, and partly by doing coverage build testing, both
      all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
      cross-architecture builds.
      
      Nevertheless some (trivial) build breakage is still expected related to rare
      Kconfig combinations and in-flight patches to various kernel code, but most
      of it should be handled by this patch.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c3edc401
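The header split in the commit above relies on a basic C rule: a pointer to an incomplete struct type can be declared and stored, but it cannot be dereferenced until the full definition is visible. The sketch below folds both "headers" into one file to show the ordering. `signal_model`, `task_model`, and `get_nr_threads_model` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Part 1: what a slimmed-down "sched.h" could contain. */
struct signal_model;             /* forward declaration, incomplete type */

struct task_model {
    struct signal_model *signal; /* a pointer to an incomplete type is fine */
};

/*
 * An accessor placed here that evaluates task->signal->nr_threads would
 * not compile: the compiler reports a dereference of a pointer to an
 * incomplete type, as in the error quoted in the commit message.
 */

/* Part 2: what the separate "sched/signal.h" could contain. */
struct signal_model {
    int nr_threads;
};

/* With the full definition visible, the accessor compiles. */
static inline int get_nr_threads_model(const struct task_model *task)
{
    assert(task != NULL && task->signal != NULL);
    return task->signal->nr_threads;
}
```

Code that only passes the pointer around needs just the first part, which is why moving the accessors out lets most includers of the smaller header avoid the full type.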
  21. 02 Mar, 2017 3 commits