1. 26 10月, 2021 1 次提交
    • T
      signal: Add an optional check for altstack size · 1bdda24c
      Thomas Gleixner 提交于
      New x86 FPU features will be very large, requiring ~10k of stack in
      signal handlers.  These new features require a new approach called
      "dynamic features".
      
      The kernel currently tries to ensure that altstacks are reasonably
      sized. Right now, on x86, sys_sigaltstack() requires a size of >=2k.
      However, that 2k is a constant. Simply raising that 2k requirement
      to >10k for the new features would break existing apps which have a
      compiled-in size of 2k.
      
      Instead of universally enforcing a larger stack, prohibit a process from
      using dynamic features without properly-sized altstacks. This must be
      enforced in two places:
      
       * A dynamic feature can not be enabled without an large-enough altstack
         for each process thread.
       * Once a dynamic feature is enabled, any request to install a too-small
         altstack will be rejected
      
      The dynamic feature enabling code must examine each thread in a
      process to ensure that the altstacks are large enough. Add a new lock
      (sigaltstack_lock()) to ensure that threads can not race and change
      their altstack after being examined.
      
      Add the infrastructure in form of a config option and provide empty
      stubs for architectures which do not need dynamic altstack size checks.
      
      This implementation will be fleshed out for x86 in a future patch called
      
        x86/arch_prctl: Add controls for dynamic XSTATE components
      
        [dhansen: commit message. ]
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20211021225527.10184-2-chang.seok.bae@intel.com
      1bdda24c
  2. 24 7月, 2021 1 次提交
  3. 02 7月, 2021 1 次提交
  4. 28 6月, 2021 1 次提交
    • L
      Revert "signal: Allow tasks to cache one sigqueue struct" · b4b27b9e
      Linus Torvalds 提交于
      This reverts commits 4bad58eb (and
      399f8dd9, which tried to fix it).
      
      I do not believe these are correct, and I'm about to release 5.13, so am
      reverting them out of an abundance of caution.
      
      The locking is odd, and appears broken.
      
      On the allocation side (in __sigqueue_alloc()), the locking is somewhat
      straightforward: it depends on sighand->siglock.  Since one caller
      doesn't hold that lock, it further then tests 'sigqueue_flags' to avoid
      the case with no locks held.
      
      On the freeing side (in sigqueue_cache_or_free()), there is no locking
      at all, and the logic instead depends on 'current' being a single
      thread, and not able to race with itself.
      
      To make things more exciting, there's also the data race between freeing
      a signal and allocating one, which is handled by using WRITE_ONCE() and
      READ_ONCE(), and being mutually exclusive wrt the initial state (ie
      freeing will only free if the old state was NULL, while allocating will
      obviously only use the value if it was non-NULL, so only one or the
      other will actually act on the value).
      
      However, while the free->alloc paths do seem mutually exclusive thanks
      to just the data value dependency, it's not clear what the memory
      ordering constraints are on it.  Could writes from the previous
      allocation possibly be delayed and seen by the new allocation later,
      causing logical inconsistencies?
      
      So it's all very exciting and unusual.
      
      And in particular, it seems that the freeing side is incorrect in
      depending on "current" being single-threaded.  Yes, 'current' is a
      single thread, but in the presense of asynchronous events even a single
      thread can have data races.
      
      And such asynchronous events can and do happen, with interrupts causing
      signals to be flushed and thus free'd (for example - sending a
      SIGCONT/SIGSTOP can happen from interrupt context, and can flush
      previously queued process control signals).
      
      So regardless of all the other questions about the memory ordering and
      locking for this new cached allocation, the sigqueue_cache_or_free()
      assumptions seem to be fundamentally incorrect.
      
      It may be that people will show me the errors of my ways, and tell me
      why this is all safe after all.  We can reinstate it if so.  But my
      current belief is that the WRITE_ONCE() that sets the cached entry needs
      to be a smp_store_release(), and the READ_ONCE() that finds a cached
      entry needs to be a smp_load_acquire() to handle memory ordering
      correctly.
      
      And the sequence in sigqueue_cache_or_free() would need to either use a
      lock or at least be interrupt-safe some way (perhaps by using something
      like the percpu 'cmpxchg': it doesn't need to be SMP-safe, but like the
      percpu operations it needs to be interrupt-safe).
      
      Fixes: 399f8dd9 ("signal: Prevent sigqueue caching after task got released")
      Fixes: 4bad58eb ("signal: Allow tasks to cache one sigqueue struct")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4b27b9e
  5. 19 5月, 2021 1 次提交
  6. 16 4月, 2021 1 次提交
  7. 15 4月, 2021 1 次提交
    • T
      signal: Allow tasks to cache one sigqueue struct · 4bad58eb
      Thomas Gleixner 提交于
      The idea for this originates from the real time tree to make signal
      delivery for realtime applications more efficient. In quite some of these
      application scenarios a control tasks signals workers to start their
      computations. There is usually only one signal per worker on flight.  This
      works nicely as long as the kmem cache allocations do not hit the slow path
      and cause latencies.
      
      To cure this an optimistic caching was introduced (limited to RT tasks)
      which allows a task to cache a single sigqueue in a pointer in task_struct
      instead of handing it back to the kmem cache after consuming a signal. When
      the next signal is sent to the task then the cached sigqueue is used
      instead of allocating a new one. This solved the problem for this set of
      application scenarios nicely.
      
      The task cache is not preallocated so the first signal sent to a task goes
      always to the cache allocator. The cached sigqueue stays around until the
      task exits and is freed when task::sighand is dropped.
      
      After posting this solution for mainline the discussion came up whether
      this would be useful in general and should not be limited to realtime
      tasks: https://lore.kernel.org/r/m11rcu7nbr.fsf@fess.ebiederm.org
      
      One concern leading to the original limitation was to avoid a large amount
      of pointlessly cached sigqueues in alive tasks. The other concern was
      vs. RLIMIT_SIGPENDING as these cached sigqueues are not accounted for.
      
      The accounting problem is real, but on the other hand slightly academic.
      After gathering some statistics it turned out that after boot of a regular
      distro install there are less than 10 sigqueues cached in ~1500 tasks.
      
      In case of a 'mass fork and fire signal to child' scenario the extra 80
      bytes of memory per task are well in the noise of the overall memory
      consumption of the fork bomb.
      
      If this should be limited then this would need an extra counter in struct
      user, more atomic instructions and a seperate rlimit. Yet another tunable
      which is mostly unused.
      
      The caching is actually used. After boot and a full kernel compile on a
      64CPU machine with make -j128 the number of 'allocations' looks like this:
      
        From slab:	   23996
        From task cache: 52223
      
      I.e. it reduces the number of slab cache operations by ~68%.
      
      A typical pattern there is:
      
      <...>-58490 __sigqueue_alloc:  for 58488 from slab ffff8881132df460
      <...>-58488 __sigqueue_free:   cache ffff8881132df460
      <...>-58488 __sigqueue_alloc:  for 1149 from cache ffff8881103dc550
        bash-1149 exit_task_sighand: free ffff8881132df460
        bash-1149 __sigqueue_free:   cache ffff8881103dc550
      
      The interesting sequence is that the exiting task 58488 grabs the sigqueue
      from bash's task cache to signal exit and bash sticks it back into it's own
      cache. Lather, rinse and repeat.
      
      The caching is probably not noticable for the general use case, but the
      benefit for latency sensitive applications is clear. While kmem caches are
      usually just serving from the fast path the slab merging (default) can
      depending on the usage pattern of the merged slabs cause occasional slow
      path allocations.
      
      The time spared per cached entry is a few micro seconds per signal which is
      not relevant for e.g. a kernel build, but for signal heavy workloads it's
      measurable.
      
      As there is no real downside of this caching mechanism making it
      unconditionally available is preferred over more conditional code or new
      magic tunables.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Link: https://lkml.kernel.org/r/87sg4lbmxo.fsf@nanos.tec.linutronix.de
      4bad58eb
  8. 24 11月, 2020 1 次提交
  9. 30 10月, 2020 1 次提交
    • G
      include: jhash/signal: Fix fall-through warnings for Clang · 4169e889
      Gustavo A. R. Silva 提交于
      In preparation to enable -Wimplicit-fallthrough for Clang, explicitly
      add break statements instead of letting the code fall through to the
      next case.
      
      This patch adds four break statements that, together, fix almost 40,000
      warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change
      reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang,
      such change[1] is meant to be reverted at some point. So, this patch helps
      to move in that direction.
      
      Something important to mention is that there is currently a discrepancy
      between GCC and Clang when dealing with switch fall-through to empty case
      statements or to cases that only contain a break/continue/return
      statement[2][3][4].
      
      Now that the -Wimplicit-fallthrough option has been globally enabled[5],
      any compiler should really warn on missing either a fallthrough annotation
      or any of the other case-terminating statements (break/continue/return/
      goto) when falling through to the next case statement. Making exceptions
      to this introduces variation in case handling which may continue to lead
      to bugs, misunderstandings, and a general lack of robustness. The point
      of enabling options like -Wimplicit-fallthrough is to prevent human error
      and aid developers in spotting bugs before their code is even built/
      submitted/committed, therefore eliminating classes of bugs. So, in order
      to really accomplish this, we should, and can, move in the direction of
      addressing any error-prone scenarios and get rid of the unintentional
      fallthrough bug-class in the kernel, entirely, even if there is some minor
      redundancy. Better to have explicit case-ending statements than continue to
      have exceptions where one must guess as to the right result. The compiler
      will eliminate any actual redundancy.
      
      [1] commit e2079e93 ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
      [2] https://github.com/ClangBuiltLinux/linux/issues/636
      [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
      [4] https://godbolt.org/z/xgkvIh
      [5] commit a035d552 ("Makefile: Globally enable fall-through warning")
      Co-developed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
      4169e889
  10. 24 8月, 2020 1 次提交
  11. 06 5月, 2020 1 次提交
    • E
      binfmt_elf: remove the set_fs in fill_siginfo_note · fa4751f4
      Eric W. Biederman 提交于
      The code in binfmt_elf.c is differnt from the rest of the code that
      processes siginfo, as it sends siginfo from a kernel buffer to a file
      rather than from kernel memory to userspace buffers.  To remove it's
      use of set_fs the code needs some different siginfo helpers.
      
      Add the helper copy_siginfo_to_external to copy from the kernel's
      internal siginfo layout to a buffer in the siginfo layout that
      userspace expects.
      
      Modify fill_siginfo_note to use copy_siginfo_to_external instead of
      set_fs and copy_siginfo_to_user.
      
      Update compat_binfmt_elf.c to use the previously added
      copy_siginfo_to_external32 to handle the compat case.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fa4751f4
  12. 27 3月, 2020 1 次提交
  13. 19 8月, 2019 1 次提交
    • E
      signal: Allow cifs and drbd to receive their terminating signals · 33da8e7c
      Eric W. Biederman 提交于
      My recent to change to only use force_sig for a synchronous events
      wound up breaking signal reception cifs and drbd.  I had overlooked
      the fact that by default kthreads start out with all signals set to
      SIG_IGN.  So a change I thought was safe turned out to have made it
      impossible for those kernel thread to catch their signals.
      
      Reverting the work on force_sig is a bad idea because what the code
      was doing was very much a misuse of force_sig.  As the way force_sig
      ultimately allowed the signal to happen was to change the signal
      handler to SIG_DFL.  Which after the first signal will allow userspace
      to send signals to these kernel threads.  At least for
      wake_ack_receiver in drbd that does not appear actively wrong.
      
      So correct this problem by adding allow_kernel_signal that will allow
      signals whose siginfo reports they were sent by the kernel through,
      but will not allow userspace generated signals, and update cifs and
      drbd to call allow_kernel_signal in an appropriate place so that their
      thread can receive this signal.
      
      Fixing things this way ensures that userspace won't be able to send
      signals and cause problems, that it is clear which signals the
      threads are expecting to receive, and it guarantees that nothing
      else in the system will be affected.
      
      This change was partly inspired by similar cifs and drbd patches that
      added allow_signal.
      Reported-by: Nronnie sahlberg <ronniesahlberg@gmail.com>
      Reported-by: NChristoph Böhmwalder <christoph.boehmwalder@linbit.com>
      Tested-by: NChristoph Böhmwalder <christoph.boehmwalder@linbit.com>
      Cc: Steve French <smfrench@gmail.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Fixes: 247bc947 ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
      Fixes: 72abe3bc ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
      Fixes: fee10990 ("signal/drbd: Use send_sig not force_sig")
      Fixes: 3cf5d076 ("signal: Remove task parameter from force_sig")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      33da8e7c
  14. 17 7月, 2019 1 次提交
  15. 29 6月, 2019 1 次提交
    • O
      signal: remove the wrong signal_pending() check in restore_user_sigmask() · 97abc889
      Oleg Nesterov 提交于
      This is the minimal fix for stable, I'll send cleanups later.
      
      Commit 854a6ed5 ("signal: Add restore_user_sigmask()") introduced
      the visible change which breaks user-space: a signal temporary unblocked
      by set_user_sigmask() can be delivered even if the caller returns
      success or timeout.
      
      Change restore_user_sigmask() to accept the additional "interrupted"
      argument which should be used instead of signal_pending() check, and
      update the callers.
      
      Eric said:
      
      : For clarity.  I don't think this is required by posix, or fundamentally to
      : remove the races in select.  It is what linux has always done and we have
      : applications who care so I agree this fix is needed.
      :
      : Further in any case where the semantic change that this patch rolls back
      : (aka where allowing a signal to be delivered and the select like call to
      : complete) would be advantage we can do as well if not better by using
      : signalfd.
      :
      : Michael is there any chance we can get this guarantee of the linux
      : implementation of pselect and friends clearly documented.  The guarantee
      : that if the system call completes successfully we are guaranteed that no
      : signal that is unblocked by using sigmask will be delivered?
      
      Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
      Fixes: 854a6ed5 ("signal: Add restore_user_sigmask()")
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NEric Wong <e@80x24.org>
      Tested-by: NEric Wong <e@80x24.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: <stable@vger.kernel.org>	[5.0+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97abc889
  16. 13 1月, 2019 1 次提交
    • E
      signal: Make siginmask safe when passed a signal of 0 · ee17e5d6
      Eric W. Biederman 提交于
      Eric Biggers reported:
      > The following commit, which went into v4.20, introduced undefined behavior when
      > sys_rt_sigqueueinfo() is called with sig=0:
      >
      > commit 4ce5f9c9
      > Author: Eric W. Biederman <ebiederm@xmission.com>
      > Date:   Tue Sep 25 12:59:31 2018 +0200
      >
      >     signal: Use a smaller struct siginfo in the kernel
      >
      > In sig_specific_sicodes(), used from known_siginfo_layout(), the expression
      > '1ULL << ((sig)-1)' is undefined as it evaluates to 1ULL << 4294967295.
      >
      > Reproducer:
      >
      > #include <signal.h>
      > #include <sys/syscall.h>
      > #include <unistd.h>
      >
      > int main(void)
      > {
      > 	siginfo_t si = { .si_code = 1 };
      > 	syscall(__NR_rt_sigqueueinfo, 0, 0, &si);
      > }
      >
      > UBSAN report for v5.0-rc1:
      >
      > UBSAN: Undefined behaviour in kernel/signal.c:2946:7
      > shift exponent 4294967295 is too large for 64-bit type 'long unsigned int'
      > CPU: 2 PID: 346 Comm: syz_signal Not tainted 5.0.0-rc1 #25
      > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      > Call Trace:
      >  __dump_stack lib/dump_stack.c:77 [inline]
      >  dump_stack+0x70/0xa5 lib/dump_stack.c:113
      >  ubsan_epilogue+0xd/0x40 lib/ubsan.c:159
      >  __ubsan_handle_shift_out_of_bounds+0x12c/0x170 lib/ubsan.c:425
      >  known_siginfo_layout+0xae/0xe0 kernel/signal.c:2946
      >  post_copy_siginfo_from_user kernel/signal.c:3009 [inline]
      >  __copy_siginfo_from_user+0x35/0x60 kernel/signal.c:3035
      >  __do_sys_rt_sigqueueinfo kernel/signal.c:3553 [inline]
      >  __se_sys_rt_sigqueueinfo kernel/signal.c:3549 [inline]
      >  __x64_sys_rt_sigqueueinfo+0x31/0x70 kernel/signal.c:3549
      >  do_syscall_64+0x4c/0x1b0 arch/x86/entry/common.c:290
      >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      > RIP: 0033:0x433639
      > Code: c4 18 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 27 00 00 c3 66 2e 0f 1f 84 00 00 00 00
      > RSP: 002b:00007fffcb289fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000081
      > RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000433639
      > RDX: 00007fffcb289fd0 RSI: 0000000000000000 RDI: 0000000000000000
      > RBP: 00000000006b2018 R08: 000000000000004d R09: 0000000000000000
      > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401560
      > R13: 00000000004015f0 R14: 0000000000000000 R15: 0000000000000000
      
      I have looked at the other callers of siginmask and they all appear to
      in locations where sig can not be zero.
      
      I have looked at the code generation of adding an extra test against
      zero and gcc was able with a simple decrement instruction to combine
      the two tests together. So the at most adding this test cost a single
      cpu cycle.  In practice that decrement instruction was already present
      as part of the mask comparison, so the only change was when the
      instruction was executed.
      
      So given that it is cheap, and obviously correct to update siginmask
      to verify the signal is not zero.  Fix this issue there to avoid any
      future problems.
      Reported-by: NEric Biggers <ebiggers@kernel.org>
      Fixes: 4ce5f9c9 ("signal: Use a smaller struct siginfo in the kernel")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      ee17e5d6
  17. 07 12月, 2018 2 次提交
    • D
      signal: Add restore_user_sigmask() · 854a6ed5
      Deepa Dinamani 提交于
      Refactor the logic to restore the sigmask before the syscall
      returns into an api.
      This is useful for versions of syscalls that pass in the
      sigmask and expect the current->sigmask to be changed during
      the execution and restored after the execution of the syscall.
      
      With the advent of new y2038 syscalls in the subsequent patches,
      we add two more new versions of the syscalls (for pselect, ppoll
      and io_pgetevents) in addition to the existing native and compat
      versions. Adding such an api reduces the logic that would need to
      be replicated otherwise.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      854a6ed5
    • D
      signal: Add set_user_sigmask() · ded653cc
      Deepa Dinamani 提交于
      Refactor reading sigset from userspace and updating sigmask
      into an api.
      
      This is useful for versions of syscalls that pass in the
      sigmask and expect the current->sigmask to be changed during,
      and restored after, the execution of the syscall.
      
      With the advent of new y2038 syscalls in the subsequent patches,
      we add two more new versions of the syscalls (for pselect, ppoll,
      and io_pgetevents) in addition to the existing native and compat
      versions. Adding such an api reduces the logic that would need to
      be replicated otherwise.
      
      Note that the calls to sigprocmask() ignored the return value
      from the api as the function only returns an error on an invalid
      first argument that is hardcoded at these call sites.
      The updated logic uses set_current_blocked() instead.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      ded653cc
  18. 31 10月, 2018 1 次提交
  19. 11 10月, 2018 1 次提交
    • E
      signal: Guard against negative signal numbers in copy_siginfo_from_user32 · a3670058
      Eric W. Biederman 提交于
      While fixing an out of bounds array access in known_siginfo_layout
      reported by the kernel test robot it became apparent that the same bug
      exists in siginfo_layout and affects copy_siginfo_from_user32.
      
      The straight forward fix that makes guards against making this mistake
      in the future and should keep the code size small is to just take an
      unsigned signal number instead of a signed signal number, as I did to
      fix known_siginfo_layout.
      
      Cc: stable@vger.kernel.org
      Fixes: cc731525 ("signal: Remove kernel interal si_code magic")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      a3670058
  20. 03 10月, 2018 3 次提交
    • E
      signal: Use a smaller struct siginfo in the kernel · 4ce5f9c9
      Eric W. Biederman 提交于
      We reserve 128 bytes for struct siginfo but only use about 48 bytes on
      64bit and 32 bytes on 32bit.  Someday we might use more but it is unlikely
      to be anytime soon.
      
      Userspace seems content with just enough bytes of siginfo to implement
      sigqueue.  Or in the case of checkpoint/restart reinjecting signals
      the kernel has sent.
      
      Reducing the stack footprint and the work to copy siginfo around from
      2 cachelines to 1 cachelines seems worth doing even if I don't have
      benchmarks to show a performance difference.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      4ce5f9c9
    • E
      signal: Distinguish between kernel_siginfo and siginfo · ae7795bc
      Eric W. Biederman 提交于
      Linus recently observed that if we did not worry about the padding
      member in struct siginfo it is only about 48 bytes, and 48 bytes is
      much nicer than 128 bytes for allocating on the stack and copying
      around in the kernel.
      
      The obvious thing of only adding the padding when userspace is
      including siginfo.h won't work as there are sigframe definitions in
      the kernel that embed struct siginfo.
      
      So split siginfo in two; kernel_siginfo and siginfo.  Keeping the
      traditional name for the userspace definition.  While the version that
      is used internally to the kernel and ultimately will not be padded to
      128 bytes is called kernel_siginfo.
      
      The definition of struct kernel_siginfo I have put in include/signal_types.h
      
      A set of buildtime checks has been added to verify the two structures have
      the same field offsets.
      
      To make it easy to verify the change kernel_siginfo retains the same
      size as siginfo.  The reduction in size comes in a following change.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      ae7795bc
    • E
      signal: Introduce copy_siginfo_from_user and use it's return value · 4cd2e0e7
      Eric W. Biederman 提交于
      In preparation for using a smaller version of siginfo in the kernel
      introduce copy_siginfo_from_user and use it when siginfo is copied from
      userspace.
      
      Make the pattern for using copy_siginfo_from_user and
      copy_siginfo_from_user32 to capture the return value and return that
      value on error.
      
      This is a necessary prerequisite for using a smaller siginfo
      in the kernel than the kernel exports to userspace.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      4cd2e0e7
  21. 23 8月, 2018 2 次提交
  22. 22 7月, 2018 2 次提交
  23. 27 4月, 2018 1 次提交
    • E
      signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} · 31931c93
      Eric W. Biederman 提交于
      Update the siginfo_layout function and enum siginfo_layout to represent
      all of the possible field layouts of struct siginfo.
      
      This allows the uses of siginfo_layout in um and arm64 where they are testing
      for SIL_FAULT to be more accurate as this rules out the other cases.
      
      Further this allows the switch statements on siginfo_layout to be simpler
      if perhaps a little more wordy.  Making it easier to understand what is
      actually going on.
      
      As SIL_FAULT_BNDERR and SIL_FAULT_PKUERR are never expected to appear
      in signalfd just treat them as SIL_FAULT.  To include them would take
      20 extra bytes an pretty much fill up what is left of
      signalfd_siginfo.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      31931c93
  24. 13 1月, 2018 3 次提交
    • E
      signal: Remove unnecessary ifdefs now that there is only one struct siginfo · 0326e7ef
      Eric W. Biederman 提交于
      Remove HAVE_ARCH_SIGINFO_T
      Remove __ARCH_SIGSYS
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      0326e7ef
    • E
      signal: Introduce clear_siginfo · 8c5dbf2a
      Eric W. Biederman 提交于
      Unfortunately struct siginfo has holes both in the common part of the
      structure, in the union members, and in the lack of padding of the
      union members.  The result of those wholes is that the C standard does
      not guarantee those bits will be initialized.  As struct siginfo is
      for communication between the kernel and userspace that is a problem.
      
      Add the helper function clear_siginfo that is guaranteed to clear all of
      the bits in struct siginfo so when the structure is copied there is no danger
      of copying old kernel data and causing a leak of information from kernel
      space to userspace.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      8c5dbf2a
    • E
      signal: Reduce copy_siginfo to just a memcpy · 8c36fdf5
      Eric W. Biederman 提交于
      The savings for copying just part of struct siginfo appears to be in the
      noise on modern machines.  So remove this ``optimization'' and simplify the code.
      
      At the same time mark the second parameter as constant so there is no confusion
      as to which direction the copy will go.
      
      This ensures that a fully initialized siginfo that is sent ends up as
      a fully initialized siginfo on the signal queue.  This full initialization
      ensures even confused code won't copy unitialized data to userspace, and
      it prepares for turning copy_siginfo_to_user into a simple copy_to_user.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      8c36fdf5
  25. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  26. 25 7月, 2017 2 次提交
    • E
      signal: Remove kernel interal si_code magic · cc731525
      Eric W. Biederman 提交于
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause the
        the kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user.  As si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of union
      members.
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Escept where not all of the cases are handled remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to copy_siginfo_from_user32 implementations has the
      interesting property that several of them perviously should never have
      worked as the __SI_ values they depended up where kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cc731525
    • E
      fcntl: Don't use ambiguous SIG_POLL si_codes · d08477aa
      Eric W. Biederman 提交于
      We have a weird and problematic intersection of features that when
      they all come together result in ambiguous siginfo values, that
      we can not support properly.
      
      - Supporting fcntl(F_SETSIG,...) with arbitrary valid signals.
      
      - Using positive values for POLL_IN, POLL_OUT, POLL_MSG, ..., etc
        that imply they are signal specific si_codes and using the
        aforementioned arbitrary signal to deliver them.
      
      - Supporting injection of arbitrary siginfo values for debugging and
        checkpoint/restore.
      
      The result is that just looking at siginfo si_codes of 1 to 6 are
      ambigious.  It could either be a signal specific si_code or it could
      be a generic si_code.
      
      For most of the kernel this is a non-issue but for sending signals
      with siginfo it is impossible to play back the kernel signals and
      get the same result.
      
      Strictly speaking when the si_code was changed from SI_SIGIO to
      POLL_IN and friends between 2.2 and 2.4 this functionality was not
      ambiguous, as only real time signals were supported.  Before 2.4 was
      released the kernel began supporting siginfo with non realtime signals
      so they could give details of why the signal was sent.
      
      The result is that if F_SETSIG is set to one of the signals with signal
      specific si_codes then user space can not know why the signal was sent.
      
      I grepped through a bunch of userspace programs using debian code
      search to get a feel for how often people choose a signal that results
      in an ambiguous si_code.  I only found one program doing so and it was
      using SIGCHLD to test the F_SETSIG functionality, and did not appear
      to be a real world usage.
      
      Therefore the ambiguity does not appears to be a real world problem in
      practice.  Remove the ambiguity while introducing the smallest chance
      of breakage by changing the si_code to SI_SIGIO when signals with
      signal specific si_codes are targeted.
      
      Fixes: v2.3.40 -- Added support for queueing non-rt signals
      Fixes: v2.3.21 -- Changed the si_code from SI_SIGIO
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      d08477aa
  27. 10 6月, 2017 1 次提交
  28. 04 6月, 2017 2 次提交
  29. 18 4月, 2017 1 次提交
  30. 03 3月, 2017 1 次提交
  31. 02 3月, 2017 1 次提交