1. 13 1月, 2018 4 次提交
    • E
      signal: Clear si_sys_private before copying siginfo to userspace · 9943d3ac
      Eric W. Biederman 提交于
      In preparation for unconditionally copying the whole of siginfo
      to userspace clear si_sys_private.  So this kernel internal
      value is guaranteed not to make it to userspace.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      9943d3ac
    • E
    • E
      signal: Ensure generic siginfos the kernel sends have all bits initialized · faf1f22b
      Eric W. Biederman 提交于
      Call clear_siginfo to ensure stack allocated siginfos are fully
      initialized before being passed to the signal sending functions.
      
      This ensures that if there is the kind of confusion documented by
      TRAP_FIXME, FPE_FIXME, or BUS_FIXME the kernel won't send unitialized
      data to userspace when the kernel generates a signal with SI_USER but
      the copy to userspace assumes it is a different kind of signal, and
      different fields are initialized.
      
      This also prepares the way for turning copy_siginfo_to_user
      into a copy_to_user, by removing the need in many cases to perform
      a field by field copy simply to skip the uninitialized fields.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      faf1f22b
    • E
      signal/arm64: Document conflicts with SI_USER and SIGFPE,SIGTRAP,SIGBUS · 526c3ddb
      Eric W. Biederman 提交于
      Setting si_code to 0 results in a userspace seeing an si_code of 0.
      This is the same si_code as SI_USER.  Posix and common sense requires
      that SI_USER not be a signal specific si_code.  As such this use of 0
      for the si_code is a pretty horribly broken ABI.
      
      Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
      value of __SI_KILL and now sees a value of SIL_KILL with the result
      that uid and pid fields are copied and which might copying the si_addr
      field by accident but certainly not by design.  Making this a very
      flakey implementation.
      
      Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
      SIL_FAULT and the appropriate fields will be reliably copied.
      
      But folks this is a new and unique kind of bad.  This is massively
      untested code bad.  This is inventing new and unique was to get
      siginfo wrong bad.  This is don't even think about Posix or what
      siginfo means bad.  This is lots of eyeballs all missing the fact
      that the code does the wrong thing bad.  This is getting stuck
      and keep making the same mistake bad.
      
      I really hope we can find a non userspace breaking fix for this on a
      port as new as arm64.
      
      Possible ABI fixes include:
      - Send the signal without siginfo
      - Don't generate a signal
      - Possibly assign and use an appropriate si_code
      - Don't handle cases which can't happen
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tyler Baicar <tbaicar@codeaurora.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Ref: 53631b54 ("arm64: Floating point and SIMD")
      Ref: 32015c23 ("arm64: exception: handle Synchronous External Abort")
      Ref: 1d18c47c ("arm64: MMU fault handling and page table management")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      526c3ddb
  2. 04 1月, 2018 1 次提交
    • E
      signal: Simplify and fix kdb_send_sig · 0b44bf9a
      Eric W. Biederman 提交于
      - Rename from kdb_send_sig_info to kdb_send_sig
        As there is no meaningful siginfo sent
      
      - Use SEND_SIG_PRIV instead of generating a siginfo for a kdb
        signal.  The generated siginfo had a bogus rationale and was
        not correct in the face of pid namespaces.  SEND_SIG_PRIV
        is simpler and actually correct.
      
      - As the code grabs siglock just send the signal with siglock
        held instead of dropping siglock and attempting to grab it again.
      
      - Move the sig_valid test into kdb_kill where it can generate
        a good error message.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      0b44bf9a
  3. 18 11月, 2017 3 次提交
  4. 16 11月, 2017 1 次提交
  5. 02 11月, 2017 1 次提交
  6. 20 9月, 2017 4 次提交
  7. 19 8月, 2017 1 次提交
  8. 07 8月, 2017 1 次提交
  9. 25 7月, 2017 1 次提交
    • E
      signal: Remove kernel interal si_code magic · cc731525
      Eric W. Biederman 提交于
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause the
        the kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user.  As si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of union
      members.
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Escept where not all of the cases are handled remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to copy_siginfo_from_user32 implementations has the
      interesting property that several of them perviously should never have
      worked as the __SI_ values they depended up where kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cc731525
  10. 11 7月, 2017 1 次提交
  11. 20 6月, 2017 1 次提交
  12. 18 6月, 2017 1 次提交
    • E
      signal: Only reschedule timers on signals timers have sent · 57db7e4a
      Eric W. Biederman 提交于
      Thomas Gleixner  wrote:
      > The CRIU support added a 'feature' which allows a user space task to send
      > arbitrary (kernel) signals to itself. The changelog says:
      >
      >   The kernel prevents sending of siginfo with positive si_code, because
      >   these codes are reserved for kernel.  I think we can allow a task to
      >   send such a siginfo to itself.  This operation should not be dangerous.
      >
      > Quite contrary to that claim, it turns out that it is outright dangerous
      > for signals with info->si_code == SI_TIMER. The following code sequence in
      > a user space task allows to crash the kernel:
      >
      >    id = timer_create(CLOCK_XXX, ..... signo = SIGX);
      >    timer_set(id, ....);
      >    info->si_signo = SIGX;
      >    info->si_code = SI_TIMER:
      >    info->_sifields._timer._tid = id;
      >    info->_sifields._timer._sys_private = 2;
      >    rt_[tg]sigqueueinfo(..., SIGX, info);
      >    sigemptyset(&sigset);
      >    sigaddset(&sigset, SIGX);
      >    rt_sigtimedwait(sigset, info);
      >
      > For timers based on CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID this
      > results in a kernel crash because sigwait() dequeues the signal and the
      > dequeue code observes:
      >
      >   info->si_code == SI_TIMER && info->_sifields._timer._sys_private != 0
      >
      > which triggers the following callchain:
      >
      >  do_schedule_next_timer() -> posix_cpu_timer_schedule() -> arm_timer()
      >
      > arm_timer() executes a list_add() on the timer, which is already armed via
      > the timer_set() syscall. That's a double list add which corrupts the posix
      > cpu timer list. As a consequence the kernel crashes on the next operation
      > touching the posix cpu timer list.
      >
      > Posix clocks which are internally implemented based on hrtimers are not
      > affected by this because hrtimer_start() can handle already armed timers
      > nicely, but it's a reliable way to trigger the WARN_ON() in
      > hrtimer_forward(), which complains about calling that function on an
      > already armed timer.
      
      This problem has existed since the posix timer code was merged into
      2.5.63. A few releases earlier in 2.5.60 ptrace gained the ability to
      inject not just a signal (which linux has supported since 1.0) but the
      full siginfo of a signal.
      
      The core problem is that the code will reschedule in response to
      signals getting dequeued not just for signals the timers sent but
      for other signals that happen to a si_code of SI_TIMER.
      
      Avoid this confusion by testing to see if the queued signal was
      preallocated as all timer signals are preallocated, and so far
      only the timer code preallocates signals.
      
      Move the check for if a timer needs to be rescheduled up into
      collect_signal where the preallocation check must be performed,
      and pass the result back to dequeue_signal where the code reschedules
      timers.   This makes it clear why the code cares about preallocated
      timers.
      
      Cc: stable@vger.kernel.org
      Reported-by: NThomas Gleixner <tglx@linutronix.de>
      History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Reference: 66dd34ad ("signal: allow to send any siginfo to itself")
      Reference: 1669ce53e2ff ("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO")
      Fixes: db8b50ba75f2 ("[PATCH] POSIX clocks & timers")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      57db7e4a
  13. 10 6月, 2017 2 次提交
  14. 04 6月, 2017 2 次提交
  15. 28 5月, 2017 1 次提交
  16. 22 4月, 2017 1 次提交
  17. 19 4月, 2017 1 次提交
    • P
      mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU · 5f0d5a3a
      Paul E. McKenney 提交于
      A group of Linux kernel hackers reported chasing a bug that resulted
      from their assumption that SLAB_DESTROY_BY_RCU provided an existence
      guarantee, that is, that no block from such a slab would be reallocated
      during an RCU read-side critical section.  Of course, that is not the
      case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
      slab of blocks.
      
      However, there is a phrase for this, namely "type safety".  This commit
      therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
      to avoid future instances of this sort of confusion.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      [ paulmck: Add comments mentioning the old name, as requested by Eric
        Dumazet, in order to help people familiar with the old name find
        the new one. ]
      Acked-by: NDavid Rientjes <rientjes@google.com>
      5f0d5a3a
  18. 02 3月, 2017 7 次提交
  19. 28 2月, 2017 1 次提交
  20. 01 2月, 2017 3 次提交
    • F
      signal: Convert obsolete cputime type to nsecs · bde8285e
      Frederic Weisbecker 提交于
      Use the new nsec based cputime accessors as part of the whole cputime
      conversion from cputime_t to nsecs.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-16-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bde8285e
    • F
      sched/cputime: Convert task/group cputime to nsecs · 5613fda9
      Frederic Weisbecker 提交于
      Now that most cputime readers use the transition API which return the
      task cputime in old style cputime_t, we can safely store the cputime in
      nsecs. This will eventually make cputime statistics less opaque and more
      granular. Back and forth convertions between cputime_t and nsecs in order
      to deal with cputime_t random granularity won't be needed anymore.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5613fda9
    • F
      sched/cputime: Introduce special task_cputime_t() API to return old-typed cputime · a1cecf2b
      Frederic Weisbecker 提交于
      This API returns a task's cputime in cputime_t in order to ease the
      conversion of cputime internals to use nsecs units instead. Blindly
      converting all cputime readers to use this API now will later let us
      convert more smoothly and step by step all these places to use the
      new nsec based cputime.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a1cecf2b
  21. 11 1月, 2017 1 次提交
  22. 26 12月, 2016 1 次提交
    • T
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner 提交于
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
      variant for 32bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      become completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855