- 26 8月, 2021 1 次提交
-
-
由 Eric W. Biederman 提交于
Factor out force_sig_seccomp from the seccomp signal generation and place it in kernel/signal.c. The function force_sig_seccomp takes a parameter force_coredump to indicate that the sigaction field should be reset to SIGDFL so that a coredump will be generated when the signal is delivered. force_sig_seccomp is then used to replace both seccomp_send_sigsys and seccomp_init_siginfo. force_sig_info_to_task gains an extra parameter to force using the default signal action. With this change seccomp is no longer a special case and there becomes exactly one place do_coredump is called from. Further it no longer becomes necessary for __seccomp_filter to call do_group_exit. Acked-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 10 8月, 2021 1 次提交
-
-
由 Frederic Weisbecker 提交于
Starting the process wide cputime counter needs to be done in the same sighand locking sequence than actually arming the related timer otherwise this races against concurrent timers setting/expiring in the same threadgroup. Detecting that the cputime counter is started without holding the sighand lock is a first step toward debugging such situations. Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NFrederic Weisbecker <frederic@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-2-frederic@kernel.org
-
- 24 7月, 2021 3 次提交
-
-
由 Eric W. Biederman 提交于
Now that __ARCH_SI_TRAPNO is no longer set by any architecture remove all of the code it enabled from the kernel. On alpha and sparc a more explict approach of using send_sig_fault_trapno or force_sig_fault_trapno in the very limited circumstances where si_trapno was set to a non-zero value. The generic support that is being removed always set si_trapno on all fault signals. With only SIGILL ILL_ILLTRAP on sparc and SIGFPE and SIGTRAP TRAP_UNK on alpla providing si_trapno values asking all senders of fault signals to provide an si_trapno value does not make sense. Making si_trapno an ordinary extension of the fault siginfo layout has enabled the architecture generic implementation of SIGTRAP TRAP_PERF, and enables other faulting signals to grow architecture generic senders as well. v1: https://lkml.kernel.org/r/m18s4zs7nu.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-8-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87bl73xx6x.fsf_-_@disp2133Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
While reviewing the signal handlers on alpha it became clear that si_trapno is only set to a non-zero value when sending SIGFPE and when sending SITGRAP with si_code TRAP_UNK. Add send_sig_fault_trapno and send SIGTRAP TRAP_UNK, and SIGFPE with it. Remove the define of __ARCH_SI_TRAPNO and remove the always zero si_trapno parameter from send_sig_fault and force_sig_fault. v1: https://lkml.kernel.org/r/m1eeers7q7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87h7gvxx7l.fsf_-_@disp2133Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
While reviewing the signal handlers on sparc it became clear that si_trapno is only set to a non-zero value when sending SIGILL with si_code ILL_ILLTRP. Add force_sig_fault_trapno and send SIGILL ILL_ILLTRP with it. Remove the define of __ARCH_SI_TRAPNO and remove the always zero si_trapno parameter from send_sig_fault and force_sig_fault. v1: https://lkml.kernel.org/r/m1eeers7q7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87mtqnxx89.fsf_-_@disp2133Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 18 6月, 2021 1 次提交
-
-
由 Peter Zijlstra 提交于
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Acked-by: NWill Deacon <will@kernel.org> Acked-by: NDaniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
-
- 19 5月, 2021 2 次提交
-
-
由 Chang S. Bae 提交于
The kernel pushes context on to the userspace stack to prepare for the user's signal handler. When the user has supplied an alternate signal stack, via sigaltstack(2), it is easy for the kernel to verify that the stack size is sufficient for the current hardware context. Check if writing the hardware context to the alternate stack will exceed it's size. If yes, then instead of corrupting user-data and proceeding with the original signal handler, an immediate SIGSEGV signal is delivered. Refactor the stack pointer check code from on_sig_stack() and use the new helper. While the kernel allows new source code to discover and use a sufficient alternate signal stack size, this check is still necessary to protect binaries with insufficient alternate signal stack size from data corruption. Fixes: c2bc11f1 ("x86, AVX-512: Enable AVX-512 States Context Switch") Reported-by: NFlorian Weimer <fweimer@redhat.com> Suggested-by: NJann Horn <jannh@google.com> Suggested-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NLen Brown <len.brown@intel.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210518200320.17239-6-chang.seok.bae@intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=153531
-
由 Eric W. Biederman 提交于
Separate filling in siginfo for TRAP_PERF from deciding that siginal needs to be sent. There are enough little details that need to be correct when properly filling in siginfo_t that it is easy to make mistakes if filling in the siginfo_t is in the same function with other logic. So factor out force_sig_perf to reduce the cognative load of on reviewers, maintainers and implementors. v1: https://lkml.kernel.org/r/m17dkjqqxz.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-10-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-3-ebiederm@xmission.comReviewed-by: NMarco Elver <elver@google.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 13 12月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
It's available everywhere now, no need to check or add dummy defines. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 12月, 2020 1 次提交
-
-
由 Eric W. Biederman 提交于
Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writes) sendfile (sb_writes -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occured to me that all of the users and possible users involved only wanted to state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing. Cc: Jann Horn <jannh@google.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Fixes: eea96732 ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.orgSigned-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 29 10月, 2020 2 次提交
-
-
由 Jens Axboe 提交于
Add TIF_NOTIFY_SIGNAL handling in the generic entry code, which if set, will return true if signal_pending() is used in a wait loop. That causes an exit of the loop so that notify_signal tracehooks can be run. If the wait loop is currently inside a system call, the system call is restarted once task_work has been processed. In preparation for only having arch_do_signal() handle syscall restarts if _TIF_SIGPENDING isn't set, rename it to arch_do_signal_or_restart(). Pass in a boolean that tells the architecture specific signal handler if it should attempt to get a signal, or just process a potential syscall restart. For !CONFIG_GENERIC_ENTRY archs, add the TIF_NOTIFY_SIGNAL handling to get_signal(). This is done to minimize the needed architecture changes to support this feature. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20201026203230.386348-3-axboe@kernel.dk
-
由 Jens Axboe 提交于
This is in preparation for maintaining signal_pending() as the decider of whether or not a schedule() loop should be broken, or continue sleeping. This is different than the core signal use cases, which really need to know whether an actual signal is pending or not. task_sigpending() returns non-zero if TIF_SIGPENDING is set. Only core kernel use cases should care about the distinction between the two, make sure those use the task_sigpending() helper. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20201026203230.386348-2-axboe@kernel.dk
-
- 08 7月, 2020 1 次提交
-
-
由 Eric W. Biederman 提交于
Create an independent helper thread_group_exited which returns true when all threads have passed exit_notify in do_exit. AKA all of the threads are at least zombies and might be dead or completely gone. Create this helper by taking the logic out of pidfd_poll where it is already tested, and adding a READ_ONCE on the read of task->exit_state. I will be changing the user mode driver code to use this same logic to know when a user mode driver needs to be restarted. Place the new helper thread_group_exited in kernel/exit.c and EXPORT it so it can be used by modules. Link: https://lkml.kernel.org/r/20200702164140.4468-13-ebiederm@xmission.comAcked-by: NChristian Brauner <christian.brauner@ubuntu.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Tested-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 29 4月, 2020 1 次提交
-
-
由 Eric W. Biederman 提交于
After the introduction of exchange_tids has_group_leader_pid is equivalent to thread_group_leader. After the last couple of cleanups has_group_leader_pid has no more callers. So remove the now unused and redundant has_group_leader_pid. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 03 4月, 2020 2 次提交
-
-
由 Peter Xu 提交于
The idea comes from the upstream discussion between Linus and Andrea: https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ A summary to the issue: there was a special path in handle_userfault() in the past that we'll return a VM_FAULT_NOPAGE when we detected non-fatal signals when waiting for userfault handling. We did that by reacquiring the mmap_sem before returning. However that brings a risk in that the vmas might have changed when we retake the mmap_sem and even we could be holding an invalid vma structure. This patch is a preparation of removing that special path by allowing the page fault to return even faster if we were interrupted by a non-fatal signal during a user-mode page fault handling routine. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Suggested-by: NAndrea Arcangeli <aarcange@redhat.com> Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NBrian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160230.9598-1-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Xu 提交于
For most architectures, we've got a quick path to detect fatal signal after a handle_mm_fault(). Introduce a helper for that quick path. It cleans the current codes a bit so we don't need to duplicate the same check across archs. More importantly, this will be an unified place that we handle the signal immediately right after an interrupted page fault, so it'll be much easier for us if we want to change the behavior of handling signals later on for all the archs. Note that currently only part of the archs are using this new helper, because some archs have their own way to handle signals. In the follow up patches, we'll try to apply this helper to all the rest of archs. Another note is that the "regs" parameter in the new helper is not used yet. It'll be used very soon. Now we kept it in this patch only to avoid touching all the archs again in the follow up patches. [peterx@redhat.com: fix sparse warnings] Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NBrian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 3月, 2020 1 次提交
-
-
由 Eric W. Biederman 提交于
The cred_guard_mutex is problematic as it is held over possibly indefinite waits for userspace. The possible indefinite waits for userspace that I have identified are: The cred_guard_mutex is held in PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The cred_guard_mutex is held over "get_user(futex_offset, ...") in exit_robust_list. The cred_guard_mutex held over copy_strings. The functions get_user and put_user can trigger a page fault which can potentially wait indefinitely in the case of userfaultfd or if userspace implements part of the page fault path. In any of those cases the userspace process that the kernel is waiting for might make a different system call that winds up taking the cred_guard_mutex and result in deadlock. Holding a mutex over any of those possibly indefinite waits for userspace does not appear necessary. Add exec_update_mutex that will just cover updating the process during exec where the permissions and the objects pointed to by the task struct may be out of sync. The plan is to switch the users of cred_guard_mutex to exec_update_mutex one by one. This lets us move forward while still being careful and not introducing any regressions. Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NBernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 28 8月, 2019 3 次提交
-
-
由 Thomas Gleixner 提交于
Put it where it belongs and clean up the ifdeffery in fork completely. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de
-
由 Thomas Gleixner 提交于
The expiry cache belongs into the posix_cputimers container where the other cpu timers information is. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.014444012@linutronix.de
-
由 Thomas Gleixner 提交于
Per task/process data of posix CPU timers is all over the place which makes the code hard to follow and requires ifdeffery. Create a container to hold all this information in one place, so data is consolidated and the ifdeffery can be confined to the posix timer header file and removed from places like fork. As a first step, move the cpu_timers list head array into the new struct and clean up the initializers and simplify fork. The remaining #ifdef in fork will be removed later. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192920.819418976@linutronix.de
-
- 17 7月, 2019 2 次提交
-
-
由 Oleg Nesterov 提交于
task->saved_sigmask and ->restore_sigmask are only used in the ret-from- syscall paths. This means that set_user_sigmask() can save ->blocked in ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked was modified. This way the callers do not need 2 sigset_t's passed to set/restore and restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns into the trivial helper which just calls restore_saved_sigmask(). Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.comSigned-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Eric Wong <e@80x24.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David Laight <David.Laight@aculab.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
struct sighand_struct::siglock field is the most used field by far, put it first so that is can be accessed without IMM8 or IMM32 encoding on x86_64. Space savings (on trimmed down VM test config): add/remove: 0/0 grow/shrink: 8/68 up/down: 49/-1147 (-1098) Function old new delta complete_signal 512 533 +21 do_signalfd4 335 346 +11 __cleanup_sighand 39 43 +4 unhandled_signal 49 52 +3 prepare_signal 692 695 +3 ignore_signals 37 40 +3 __tty_check_change.part 248 251 +3 ksys_unshare 780 781 +1 sighand_ctor 33 29 -4 ptrace_trap_notify 60 56 -4 sigqueue_free 98 91 -7 run_posix_cpu_timers 1389 1382 -7 proc_pid_status 2448 2441 -7 proc_pid_limits 344 337 -7 posix_cpu_timer_rearm 222 215 -7 posix_cpu_timer_get 249 242 -7 kill_pid_info_as_cred 243 236 -7 freeze_task 197 190 -7 flush_old_exec 1873 1866 -7 do_task_stat 3363 3356 -7 do_send_sig_info 98 91 -7 do_group_exit 147 140 -7 init_sighand 2088 2080 -8 do_notify_parent_cldstop 399 391 -8 signalfd_cleanup 50 41 -9 do_notify_parent 557 545 -12 __send_signal 1029 1017 -12 ptrace_stop 590 577 -13 get_signal 1576 1563 -13 __lock_task_sighand 112 99 -13 zap_pid_ns_processes 391 377 -14 update_rlimit_cpu 78 64 -14 tty_signal_session_leader 413 399 -14 tty_open_proc_set_tty 149 135 -14 tty_jobctrl_ioctl 936 922 -14 set_cpu_itimer 339 325 -14 ptrace_resume 226 212 -14 ptrace_notify 110 96 -14 proc_clear_tty 81 67 -14 posix_cpu_timer_del 229 215 -14 kernel_sigaction 156 142 -14 getrusage 977 963 -14 get_current_tty 98 84 -14 force_sigsegv 89 75 -14 force_sig_info 205 191 -14 flush_signals 83 69 -14 flush_itimer_signals 85 71 -14 do_timer_create 1120 1106 -14 do_sigpending 88 74 -14 do_signal_stop 537 523 -14 cgroup_init_fs_context 644 630 -14 call_usermodehelper_exec_async 402 388 -14 calculate_sigpending 58 44 -14 __x64_sys_timer_delete 248 234 -14 __set_current_blocked 80 66 -14 __ptrace_unlink 310 296 -14 __ptrace_detach.part 187 173 -14 send_sigqueue 362 347 -15 get_cpu_itimer 214 199 -15 signalfd_poll 175 159 -16 dequeue_signal 340 323 -17 do_getitimer 192 174 -18 release_task.part 1060 1040 -20 ptrace_peek_siginfo 408 387 -21 posix_cpu_timer_set 827 806 -21 exit_signals 437 416 -21 do_sigaction 541 520 -21 do_setitimer 485 464 -21 disassociate_ctty.part 545 517 -28 __x64_sys_rt_sigtimedwait 721 679 -42 __x64_sys_ptrace 1319 1277 -42 ptrace_request 1828 1782 -46 signalfd_read 507 459 -48 wait_consider_task 2027 1971 -56 do_coredump 3672 3616 -56 copy_process.part 6936 6871 -65 Link: http://lkml.kernel.org/r/20190503192800.GA18004@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 5月, 2019 3 次提交
-
-
由 Eric W. Biederman 提交于
force_sig_info always delivers to the current task and the signal parameter always matches info.si_signo. So remove those parameters to make it a simpler less error prone interface, and to make it clear that none of the callers are doing anything clever. This guarantees that force_sig_info will not grow any new buggy callers that attempt to call force_sig on a non-current task, or that pass an signal number that does not match info.si_signo. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
As synchronous exceptions really only make sense against the current task (otherwise how are you synchronous) remove the task parameter from from force_sig_fault to make it explicit that is what is going on. The two known exceptions that deliver a synchronous exception to a stopped ptraced task have already been changed to force_sig_fault_to_task. The callers have been changed with the following emacs regular expression (with obvious variations on the architectures that take more arguments) to avoid typos: force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)] -> force_sig_fault(\1,\2,\3) Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
In preparation for removing the task parameter from force_sig_fault introduce force_sig_fault_to_task and use it for the two cases where it matters. On mips force_fcr31_sig calls force_sig_fault and is called on either the current task, or a task that is suspended and is being switched to by the scheduler. This is safe because the task being switched to by the scheduler is guaranteed to be suspended. This ensures that task->sighand is stable while the signal is delivered to it. On parisc user_enable_single_step calls force_sig_fault and is in turn called by ptrace_request. The function ptrace_request always calls user_enable_single_step on a child that is stopped for tracing. The child being traced and not reaped ensures that child->sighand is not NULL, and that the child will not change child->sighand. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 27 5月, 2019 3 次提交
-
-
由 Eric W. Biederman 提交于
All of the callers pass current into force_sig_mceer so remove the task parameter to make this obvious. This also makes it clear that force_sig_mceerr passes current into force_sig_info. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
All of the remaining callers pass current into force_sig so remove the task parameter to make this obvious and to make misuse more difficult in the future. This also makes it clear force_sig passes current into force_sig_info. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
The function force_sigsegv is always called on the current task so passing in current is redundant and not passing in current makes this fact obvious. This also makes it clear force_sigsegv always calls force_sig on the current task. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 23 5月, 2019 1 次提交
-
-
由 Eric W. Biederman 提交于
The usb support for asyncio encoded one of it's values in the wrong field. It should have used si_value but instead used si_addr which is not present in the _rt union member of struct siginfo. The practical result of this is that on a 64bit big endian kernel when delivering a signal to a 32bit process the si_addr field is set to NULL, instead of the expected pointer value. This issue can not be fixed in copy_siginfo_to_user32 as the usb usage of the the _sigfault (aka si_addr) member of the siginfo union when SI_ASYNCIO is set is incompatible with the POSIX and glibc usage of the _rt member of the siginfo union. Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio a dedicated function for this one specific case. There are no other users of kill_pid_info_as_cred so this specialization should have no impact on the amount of code in the kernel. Have kill_pid_usb_asyncio take instead of a siginfo_t which is difficult and error prone, 3 arguments, a signal number, an errno value, and an address enconded as a sigval_t. The encoding of the address as a sigval_t allows the code that reads the userspace request for a signal to handle this compat issue along with all of the other compat issues. Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place the pointer value at the in si_pid (instead of si_addr). That is the code now verifies that si_pid and si_addr always occur at the same location. Further the code veries that for native structures a value placed in si_pid and spilling into si_uid will appear in userspace in si_addr (on a byte by byte copy of siginfo or a field by field copy of siginfo). The code also verifies that for a 64bit kernel and a 32bit userspace the 32bit pointer will fit in si_pid. I have used the usbsig.c program below written by Alan Stern and slightly tweaked by me to run on a big endian machine to verify the issue exists (on sparc64) and to confirm the patch below fixes the issue. /* usbsig.c -- test USB async signal delivery */ #define _GNU_SOURCE #include <stdio.h> #include <fcntl.h> #include <signal.h> #include <string.h> #include <sys/ioctl.h> #include <unistd.h> #include <endian.h> #include <linux/usb/ch9.h> #include <linux/usbdevice_fs.h> static struct usbdevfs_urb urb; static struct usbdevfs_disconnectsignal ds; static volatile sig_atomic_t done = 0; void urb_handler(int sig, siginfo_t *info , void *ucontext) { printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n", sig, info->si_signo, info->si_errno, info->si_code, info->si_addr, &urb); printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad"); } void ds_handler(int sig, siginfo_t *info , void *ucontext) { printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n", sig, info->si_signo, info->si_errno, info->si_code, info->si_addr, &ds); printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad"); done = 1; } int main(int argc, char **argv) { char *devfilename; int fd; int rc; struct sigaction act; struct usb_ctrlrequest *req; void *ptr; char buf[80]; if (argc != 2) { fprintf(stderr, "Usage: usbsig device-file-name\n"); return 1; } devfilename = argv[1]; fd = open(devfilename, O_RDWR); if (fd == -1) { perror("Error opening device file"); return 1; } act.sa_sigaction = urb_handler; sigemptyset(&act.sa_mask); act.sa_flags = SA_SIGINFO; rc = sigaction(SIGUSR1, &act, NULL); if (rc == -1) { perror("Error in sigaction"); return 1; } act.sa_sigaction = ds_handler; sigemptyset(&act.sa_mask); act.sa_flags = SA_SIGINFO; rc = sigaction(SIGUSR2, &act, NULL); if (rc == -1) { perror("Error in sigaction"); return 1; } memset(&urb, 0, sizeof(urb)); urb.type = USBDEVFS_URB_TYPE_CONTROL; urb.endpoint = USB_DIR_IN | 0; urb.buffer = buf; urb.buffer_length = sizeof(buf); urb.signr = SIGUSR1; req = (struct usb_ctrlrequest *) buf; req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE; req->bRequest = USB_REQ_GET_DESCRIPTOR; req->wValue = htole16(USB_DT_DEVICE << 8); req->wIndex = htole16(0); req->wLength = htole16(sizeof(buf) - sizeof(*req)); rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb); if (rc == -1) { perror("Error in SUBMITURB ioctl"); return 1; } rc = ioctl(fd, USBDEVFS_REAPURB, &ptr); if (rc == -1) { perror("Error in REAPURB ioctl"); return 1; } memset(&ds, 0, sizeof(ds)); ds.signr = SIGUSR2; ds.context = &ds; rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds); if (rc == -1) { perror("Error in DISCSIGNAL ioctl"); return 1; } printf("Waiting for usb disconnect\n"); while (!done) { sleep(1); } close(fd); return 0; } Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-usb@vger.kernel.org Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Oliver Neukum <oneukum@suse.com> Fixes: v2.3.39 Cc: stable@vger.kernel.org Acked-by: NAlan Stern <stern@rowland.harvard.edu> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 15 5月, 2019 1 次提交
-
-
由 Andrei Vagin 提交于
This file uses "task" 85 times and "tsk" 25 times. It is better to be consistent. Link: http://lkml.kernel.org/r/20181129180547.15976-1-avagin@gmail.comSigned-off-by: NAndrei Vagin <avagin@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 3月, 2019 1 次提交
-
-
由 Andrei Vagin 提交于
There are a few system calls (pselect, ppoll, etc) which replace a task sigmask while they are running in a kernel-space When a task calls one of these syscalls, the kernel saves a current sigmask in task->saved_sigmask and sets a syscall sigmask. On syscall-exit-stop, ptrace traps a task before restoring the saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by saved_sigmask, when the task returns to user-space. This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask if it's set. PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag. Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com Fixes: 29000cae ("ptrace: add ability to get/set signal-blocked mask") Signed-off-by: NAndrei Vagin <avagin@gmail.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 2月, 2019 2 次提交
-
-
由 Elena Reshetova 提交于
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable signal_struct.sigcnt is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the signal_struct.sigcnt it might make a difference in following places: - put_signal_struct(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: NKees Cook <keescook@chromium.org> Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDavid Windsor <dwindsor@gmail.com> Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com> Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1547814450-18902-3-git-send-email-elena.reshetova@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Elena Reshetova 提交于
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable sighand_struct.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the sighand_struct.count it might make a difference in following places: - __cleanup_sighand: decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: NKees Cook <keescook@chromium.org> Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDavid Windsor <dwindsor@gmail.com> Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com> Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1547814450-18902-2-git-send-email-elena.reshetova@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 10月, 2018 1 次提交
-
-
由 Eric W. Biederman 提交于
Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 12 9月, 2018 2 次提交
-
-
由 Eric W. Biederman 提交于
There are no more users of SEND_SIG_FORCED so it may be safely removed. Remove the definition of SEND_SIG_FORCED, it's use in is_si_special, it's use in TP_STORE_SIGINFO, and it's use in __send_signal as without any users the uses of SEND_SIG_FORCED are now unncessary. This makes the code simpler, easier to understand and use. Users of signal sending functions now no longer need to ask themselves do I need to use SEND_SIG_FORCED. Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
None of the callers use the it so remove it. Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 23 8月, 2018 1 次提交
-
-
由 Christian Brauner 提交于
Patch series "signal: refactor some functions", v3. This series refactors a bunch of functions in signal.c to simplify parts of the code. The greatest single change is declaring the static do_sigpending() helper as void which makes it possible to remove a bunch of unnecessary checks in the syscalls later on. This patch (of 17): force_sigsegv() returned 0 unconditionally so it doesn't make sense to have it return at all. In addition, there are no callers that check force_sigsegv()'s return value. Link: http://lkml.kernel.org/r/20180602103653.18181-2-christian@brauner.ioSigned-off-by: NChristian Brauner <christian@brauner.io> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 8月, 2018 1 次提交
-
-
由 Eric W. Biederman 提交于
Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn> report that a periodic signal received during fork can cause fork to continually restart preventing an application from making progress. The code was being overly pessimistic. Fork needs to guarantee that a signal sent to multiple processes is logically delivered before the fork and just to the forking process or logically delivered after the fork to both the forking process and it's newly spawned child. For signals like periodic timers that are always delivered to a single process fork can safely complete and let them appear to logically delivered after the fork(). While examining this issue I also discovered that fork today will miss signals delivered to multiple processes during the fork and handled by another thread. Similarly the current code will also miss blocked signals that are delivered to multiple process, as those signals will not appear pending during fork. Add a list of each thread that is currently forking, and keep on that list a signal set that records all of the signals sent to multiple processes. When fork completes initialize the new processes shared_pending signal set with it. The calculate_sigpending function will see those signals and set TIF_SIGPENDING causing the new task to take the slow path to userspace to handle those signals. Making it appear as if those signals were received immediately after the fork. It is not possible to send real time signals to multiple processes and exceptions don't go to multiple processes, which means that that are no signals sent to multiple processes that require siginfo. This means it is safe to not bother collecting siginfo on signals sent during fork. The sigaction of a child of fork is initially the same as the sigaction of the parent process. So a signal the parent ignores the child will also initially ignore. Therefore it is safe to ignore signals sent to multiple processes and ignored by the forking process. Signals sent to only a single process or only a single thread and delivered during fork are treated as if they are received after the fork, and generally not dealt with. They won't cause any problems. V2: Added removal from the multiprocess list on failure. V3: Use -ERESTARTNOINTR directly V4: - Don't queue both SIGCONT and SIGSTOP - Initialize signal_struct.multiprocess in init_task - Move setting of shared_pending to before the new task is visible to signals. This prevents signals from comming in before shared_pending.signal is set to delayed.signal and being lost. V5: - rework list add and delete to account for idle threads v6: - Use sigdelsetmask when removing stop signals Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447 Reported-by: Wen Yang <wen.yang99@zte.com.cn> and Reported-by: Nmajiang <ma.jiang@zte.com.cn> Fixes: 4a2c7a78 ("[PATCH] make fork() atomic wrt pgrp/session signals") Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 04 8月, 2018 2 次提交
-
-
由 Eric W. Biederman 提交于
There are only two signals that are delivered to every member of a signal group: SIGSTOP and SIGKILL. Signal delivery requires every signal appear to be delivered either before or after a clone syscall. SIGKILL terminates the clone so does not need to be considered. Which leaves only SIGSTOP that needs to be considered when creating new threads. Today in the event of a group stop TIF_SIGPENDING will get set and the fork will restart ensuring the fork syscall participates in the group stop. A fork (especially of a process with a lot of memory) is one of the most expensive system so we really only want to restart a fork when necessary. It is easy so check to see if a SIGSTOP is ongoing and have the new thread join it immediate after the clone completes. Making it appear the clone completed happened just before the SIGSTOP. The calculate_sigpending function will see the bits set in jobctl and set TIF_SIGPENDING to ensure the new task takes the slow path to userspace. V2: The call to task_join_group_stop was moved before the new task is added to the thread group list. This should not matter as sighand->siglock is held over both the addition of the threads, the call to task_join_group_stop and do_signal_stop. But the change is trivial and it is one less thing to worry about when reading the code. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Add a function calculate_sigpending to test to see if any signals are pending for a new task immediately following fork. Signals have to happen either before or after fork. Today our practice is to push all of the signals to before the fork, but that has the downside that frequent or periodic signals can make fork take much much longer than normal or prevent fork from completing entirely. So we need move signals that we can after the fork to prevent that. This updates the code to set TIF_SIGPENDING on a new task if there are signals or other activities that have moved so that they appear to happen after the fork. As the code today restarts if it sees any such activity this won't immediately have an effect, as there will be no reason for it to set TIF_SIGPENDING immediately after the fork. Adding calculate_sigpending means the code in fork can safely be changed to not always restart if a signal is pending. The new calculate_sigpending function sets sigpending if there are pending bits in jobctl, pending signals, the freezer needs to freeze the new task or the live kernel patching framework need the new thread to take the slow path to userspace. I have verified that setting TIF_SIGPENDING does make a new process take the slow path to userspace before it executes it's first userspace instruction. I have looked at the callers of signal_wake_up and the code paths setting TIF_SIGPENDING and I don't see anything else that needs to be handled. The code probably doesn't need to set TIF_SIGPENDING for the kernel live patching as it uses a separate thread flag as well. But at this point it seems safer reuse the recalc_sigpending logic and get the kernel live patching folks to sort out their story later. V2: I have moved the test into schedule_tail where siglock can be grabbed and recalc_sigpending can be reused directly. Further as the last action of setting up a new task this guarantees that TIF_SIGPENDING will be properly set in the new process. The helper calculate_sigpending takes the siglock and uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending clear TIF_SIGPENDING if it is unnecessary. This allows reusing the existing code and keeps maintenance of the conditions simple. Oleg Nesterov <oleg@redhat.com> suggested the movement and pointed out the need to take siglock if this code was going to be called while the new task is discoverable. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-