1. 24 Jan, 2018 1 commit
  2. 15 Jan, 2018 2 commits
    • futex: Prevent overflow by strengthen input validation · fbe0e839
      Li Jinyue committed
      UBSAN reports signed integer overflow in kernel/futex.c:
      
       UBSAN: Undefined behaviour in kernel/futex.c:2041:18
       signed integer overflow:
       0 - -2147483648 cannot be represented in type 'int'
      
      Add a sanity check to catch negative values of nr_wake and nr_requeue.
      Signed-off-by: Li Jinyue <lijinyue@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Cc: dvhart@infradead.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com
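The sanity check described in this commit can be sketched in userspace C. This is only an illustration of the rule "reject negative nr_wake and nr_requeue"; the function name `check_requeue_counts` is hypothetical and is not taken from the kernel source.

```c
#include <errno.h>

/*
 * Hypothetical helper: reject negative counts up front, so that later
 * arithmetic such as "0 - nr_wake" is never evaluated with the most
 * negative int, which cannot be negated without signed overflow.
 */
int check_requeue_counts(int nr_wake, int nr_requeue)
{
	if (nr_wake < 0 || nr_requeue < 0)
		return -EINVAL;
	return 0;
}
```

Validating at the entry point keeps the later arithmetic free of special cases, which is why the check is cheap to backport.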
    • futex: Avoid violating the 10th rule of futex · c1e2f0ea
      Peter Zijlstra committed
      Julia reported futex state corruption in the following scenario:
      
         waiter                                  waker                                            stealer (prio > waiter)
      
         futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
               timeout=[N ms])
            futex_wait_requeue_pi()
               futex_wait_queue_me()
                  freezable_schedule()
                  <scheduled out>
                                                 futex(LOCK_PI, uaddr2)
                                                 futex(CMP_REQUEUE_PI, uaddr,
                                                       uaddr2, 1, 0)
                                                    /* requeues waiter to uaddr2 */
                                                 futex(UNLOCK_PI, uaddr2)
                                                       wake_futex_pi()
                                                          cmp_futex_value_locked(uaddr2, waiter)
                                                          wake_up_q()
                 <woken by waker>
                 <hrtimer_wakeup() fires,
                  clears sleeper->task>
                                                                                                 futex(LOCK_PI, uaddr2)
                                                                                                    __rt_mutex_start_proxy_lock()
                                                                                                       try_to_take_rt_mutex() /* steals lock */
                                                                                                          rt_mutex_set_owner(lock, stealer)
                                                                                                    <preempted>
               <scheduled in>
               rt_mutex_wait_proxy_lock()
                  __rt_mutex_slowlock()
                     try_to_take_rt_mutex() /* fails, lock held by stealer */
                     if (timeout && !timeout->task)
                        return -ETIMEDOUT;
                  fixup_owner()
                     /* lock wasn't acquired, so,
                        fixup_pi_state_owner skipped */
      
         return -ETIMEDOUT;
      
         /* At this point, we've returned -ETIMEDOUT to userspace, but the
          * futex word shows waiter to be the owner, and the pi_mutex has
          * stealer as the owner */
      
         futex_lock(LOCK_PI, uaddr2)
           -> bails with EDEADLK, futex word says we're owner.
      
      And suggested that what commit:
      
        73d786bd ("futex: Rework inconsistent rt_mutex/futex_q state")
      
      removes from fixup_owner() looks to be just what is needed. And indeed
      it is -- I completely missed that requeue_pi could also result in this
      case. So we need to restore that, except that subsequent patches, like
      commit:
      
        16ffa12d ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")
      
      changed all the locking rules. Even without that, the sequence:
      
      -               if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
      -                       locked = 1;
      -                       goto out;
      -               }
      
      -               raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
      -               owner = rt_mutex_owner(&q->pi_state->pi_mutex);
      -               if (!owner)
      -                       owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
      -               raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
      -               ret = fixup_pi_state_owner(uaddr, q, owner);
      
      already suggests there were races; otherwise we'd never have to look
      at next_owner.
      
      So instead of doing 3 consecutive wait_lock sections with who knows
      what races, we do it all in a single section. Additionally, the usage
      of pi_state->owner in fixup_owner() was only safe because only the
      rt_mutex owner would modify it, which this additional case wrecks.
      
      Luckily the values can only change away from, and not to, the value we're
      testing. This means we can do a speculative test and double check once
      we have the wait_lock.
      
      Fixes: 73d786bd ("futex: Rework inconsistent rt_mutex/futex_q state")
      Reported-by: Julia Cartwright <julia@ni.com>
      Reported-by: Gratian Crisan <gratian.crisan@ni.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Julia Cartwright <julia@ni.com>
      Tested-by: Gratian Crisan <gratian.crisan@ni.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming.kicks-ass.net
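The "speculative test, then double check under wait_lock" idea from this commit can be sketched with C11 atomics. This is a userspace illustration under stated assumptions, not the kernel code: `struct pi_state_sketch`, the spin-lock helpers, and `fixup_if_still_owner()` are hypothetical names, and task identities are modelled as plain integers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the futex pi_state and its rt_mutex wait_lock. */
struct pi_state_sketch {
	atomic_flag wait_lock;
	atomic_int  owner;      /* id of the recorded owner */
};

void pi_state_sketch_init(struct pi_state_sketch *ps, int owner)
{
	atomic_flag_clear(&ps->wait_lock);
	atomic_init(&ps->owner, owner);
}

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Returns true if this call changed the owner from self to new_owner.
 * The unlocked test is safe only because the owner can change away
 * from self but never back to it behind our back.
 */
bool fixup_if_still_owner(struct pi_state_sketch *ps, int self, int new_owner)
{
	/* Speculative test without the lock. */
	if (atomic_load_explicit(&ps->owner, memory_order_relaxed) != self)
		return false;

	sketch_lock(&ps->wait_lock);
	/* Double check now that the lock serializes all writers. */
	if (atomic_load_explicit(&ps->owner, memory_order_relaxed) != self) {
		sketch_unlock(&ps->wait_lock);
		return false;
	}
	atomic_store_explicit(&ps->owner, new_owner, memory_order_relaxed);
	sketch_unlock(&ps->wait_lock);
	return true;
}
```

The first test only filters out the common case cheaply; correctness rests entirely on the second test made with the lock held.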
  3. 11 Dec, 2017 1 commit
  4. 02 Nov, 2017 1 commit
    • futex: futex_wake_op, do not fail on invalid op · e78c38f6
      Jiri Slaby committed
      In commit 30d6e0a4 ("futex: Remove duplicated code and fix undefined
      behaviour"), I made FUTEX_WAKE_OP fail on an invalid op, namely when oparg
      should be treated as a shift and the shift is out of range (< 0 or > 31).
      
      But strace's test suite does this madness:
      
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee);
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xbadfaced);
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xffffffff);
      
      When I pick the first 0xa0caffee, it decodes as:
      
        0x80000000 & 0xa0caffee: oparg is shift
        0x70000000 & 0xa0caffee: op is FUTEX_OP_OR
        0x0f000000 & 0xa0caffee: cmp is FUTEX_OP_CMP_EQ
        0x00fff000 & 0xa0caffee: oparg is sign-extended 0xcaf = -849
        0x00000fff & 0xa0caffee: cmparg is sign-extended 0xfee = -18
      
      That means the op tries to do this:
      
        (futex |= (1 << (-849))) == -18
      
      which is completely bogus. The new check of op in the code is:
      
              if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
                      if (oparg < 0 || oparg > 31)
                              return -EINVAL;
                      oparg = 1 << oparg;
              }
      
      which results obviously in the "Invalid argument" errno:
      
        FAIL: futex
        ===========
      
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee) = -1: Invalid argument
        futex.test: failed test: ../futex failed with code 1
      
      So let us soften the failure to print only a (ratelimited) message, crop
      the value and continue as if it were right.  When userspace keeps up, we
      can switch this to return -EINVAL again.
      
      [v2] Do not return 0 immediately, proceed with the cropped value.
      
      Fixes: 30d6e0a4 ("futex: Remove duplicated code and fix undefined behaviour")
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <dvhart@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
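The field decoding walked through in this commit message can be reproduced with a short userspace sketch. The bit masks follow the message; the struct, the function names, and the choice of "keep the low five bits" as the cropping rule are assumptions made for illustration, since the message only says the value is cropped.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of a FUTEX_WAKE_OP op. */
struct futex_op_fields {
	int shift_flag; /* bit 31: oparg is a shift count */
	int op;         /* bits 28-30 */
	int cmp;        /* bits 24-27 */
	int oparg;      /* bits 12-23, sign-extended */
	int cmparg;     /* bits 0-11, sign-extended */
};

/* Sign-extend the low 12 bits of field to an int. */
static int sext12(uint32_t field)
{
	int v = (int)(field & 0xfffu);

	return (v ^ 0x800) - 0x800;
}

struct futex_op_fields decode_futex_op(uint32_t encoded_op)
{
	struct futex_op_fields f;

	f.shift_flag = (int)((encoded_op >> 31) & 0x1u);
	f.op         = (int)((encoded_op >> 28) & 0x7u);
	f.cmp        = (int)((encoded_op >> 24) & 0xfu);
	f.oparg      = sext12(encoded_op >> 12);
	f.cmparg     = sext12(encoded_op);
	return f;
}

/* Assumed cropping rule: keep only the low five bits of the shift count. */
int crop_shift(int oparg)
{
	return (int)((unsigned int)oparg & 31u);
}
```

Decoding 0xa0caffee with this sketch gives op 2 (FUTEX_OP_OR), cmp 0 (FUTEX_OP_CMP_EQ), oparg -849 and cmparg -18, matching the values listed in the message.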
  5. 01 Nov, 2017 1 commit
    • futex: Fix more put_pi_state() vs. exit_pi_state_list() races · 153fbd12
      Peter Zijlstra committed
      Dmitry (through syzbot) reported being able to trigger the WARN in
      get_pi_state() and a use-after-free on:
      
      	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
      
      Both are due to this race:
      
        exit_pi_state_list()				put_pi_state()
      
        lock(&curr->pi_lock)
        while() {
      	pi_state = list_first_entry(head);
      	hb = hash_futex(&pi_state->key);
      	unlock(&curr->pi_lock);
      
      						dec_and_test(&pi_state->refcount);
      
      	lock(&hb->lock)
      	lock(&pi_state->pi_mutex.wait_lock)	// uaf if pi_state free'd
      	lock(&curr->pi_lock);
      
      	....
      
      	unlock(&curr->pi_lock);
      	get_pi_state();				// WARN; refcount==0
      
      The problem is that we take the reference count too late, and
      get_pi_state() does not allow it to be 0. Fix it by using inc_not_zero()
      and simply retrying the loop when we fail to get a refcount. In that case
      put_pi_state() should remove the entry from the list.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Gratian Crisan <gratian.crisan@ni.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: dvhart@infradead.org
      Cc: syzbot <bot+2af19c9e1ffe4d4ee1d16c56ae7580feaee75765@syzkaller.appspotmail.com>
      Cc: syzkaller-bugs@googlegroups.com
      Cc: <stable@vger.kernel.org>
      Fixes: c74aef2d ("futex: Fix pi_state->owner serialization")
      Link: http://lkml.kernel.org/r/20171031101853.xpfh72y643kdfhjs@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
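The inc_not_zero() idea used by this fix can be sketched with C11 atomics: take a reference only if the object has not already dropped to zero. This is a userspace illustration with a hypothetical function name, not the kernel's refcount implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to take a reference. Fails (returns false) when the count is
 * already zero, meaning the object is on its way to being freed and
 * the caller must retry its lookup instead of touching the object.
 */
bool ref_inc_not_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

A caller that gets false back would drop its locks and restart the loop, which matches the "simply retrying" behaviour the message describes.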
  6. 25 Sep, 2017 1 commit
  7. 26 Aug, 2017 1 commit
    • futex: Remove duplicated code and fix undefined behaviour · 30d6e0a4
      Jiri Slaby committed
      There is code duplicated over all architecture's headers for
      futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
      and comparison of the result.
      
      Remove this duplication and leave up to the arches only the needed
      assembly which is now in arch_futex_atomic_op_inuser.
      
      This effectively distributes Will Deacon's arm64 fix for undefined
      behaviour reported by UBSAN to all architectures. The fix was done in
      commit 5f16a046 (arm64: futex: Fix undefined behaviour with
      FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
      
      And as suggested by Thomas, check for negative oparg too, because it was
      also reported to trigger an undefined behaviour report.
      
      Note that s390 removed access_ok check in d12a2970 ("s390/uaccess:
      remove pointless access_ok() checks") as access_ok there returns true.
      We introduce it back to the helper for the sake of simplicity (it gets
      optimized away anyway).
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
      Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
      Cc: linux-mips@linux-mips.org
      Cc: Rich Felker <dalias@libc.org>
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: sparclinux@vger.kernel.org
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: linux-s390@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: linux-hexagon@vger.kernel.org
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: linux-snps-arc@lists.infradead.org
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-xtensa@linux-xtensa.org
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: openrisc@lists.librecores.org
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-parisc@vger.kernel.org
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-alpha@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
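The split this commit describes, a generic part that decodes and compares plus a small per-architecture operation, can be sketched in userspace C. This is an assumption-laden illustration: the names are hypothetical, the "arch" step works on ordinary memory without atomicity, and the access_ok() check has no userspace counterpart and is left out. The numeric op and cmp codes mirror the FUTEX_OP_* values as I understand them.

```c
#include <errno.h>

enum { OP_SET = 0, OP_ADD = 1, OP_OR = 2, OP_ANDN = 3, OP_XOR = 4 };
enum { CMP_EQ = 0, CMP_NE = 1, CMP_LT = 2, CMP_LE = 3, CMP_GT = 4, CMP_GE = 5 };

/* Stand-in for the per-architecture step: apply op, hand back the old value. */
int sketch_arch_op(int op, int oparg, int *oldval, int *word)
{
	unsigned int cur = (unsigned int)*word;
	unsigned int arg = (unsigned int)oparg;
	unsigned int next;

	switch (op) {
	case OP_SET:  next = arg;        break;
	case OP_ADD:  next = cur + arg;  break; /* unsigned: wraps, no UB */
	case OP_OR:   next = cur | arg;  break;
	case OP_ANDN: next = cur & ~arg; break;
	case OP_XOR:  next = cur ^ arg;  break;
	default:
		return -ENOSYS;
	}
	*oldval = *word;
	*word = (int)next;
	return 0;
}

/* Generic step shared by all architectures: run the op, compare the old value. */
int sketch_futex_op(int op, int cmp, int oparg, int cmparg, int *word)
{
	int oldval;
	int ret = sketch_arch_op(op, oparg, &oldval, word);

	if (ret)
		return ret;

	switch (cmp) {
	case CMP_EQ: return oldval == cmparg;
	case CMP_NE: return oldval != cmparg;
	case CMP_LT: return oldval <  cmparg;
	case CMP_LE: return oldval <= cmparg;
	case CMP_GT: return oldval >  cmparg;
	case CMP_GE: return oldval >= cmparg;
	default:     return -ENOSYS;
	}
}
```

Keeping the comparison in one shared function is what lets a single range check on oparg cover every architecture at once.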
  8. 10 Aug, 2017 1 commit
    • futex: Remove unnecessary warning from get_futex_key · 48fb6f4d
      Mel Gorman committed
      Commit 65d8fc77 ("futex: Remove requirement for lock_page() in
      get_futex_key()") removed an unnecessary lock_page() with the
      side-effect that page->mapping needed to be treated very carefully.
      
      Two defensive warnings were added in case any assumption was missed and
      the first warning assumed a correct application would not alter a
      mapping backing a futex key.  Since merging, it has not triggered for
      any unexpected case but Mark Rutland reported the following bug
      triggering due to the first warning.
      
        kernel BUG at kernel/futex.c:679!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
        Hardware name: linux,dummy-virt (DT)
        task: ffff80001e271780 task.stack: ffff000010908000
        PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
      
      The fact that it's a bug instead of a warning was due to an unrelated
      arm64 problem, but the warning itself triggered because the underlying
      mapping changed.
      
      This is an application issue but from a kernel perspective it's a
      recoverable situation and the warning is unnecessary so this patch
      removes the warning.  The warning may potentially be triggered with the
      following test program from Mark although it may be necessary to adjust
      NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
      system.
      
          #include <linux/futex.h>
          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <sys/time.h>
          #include <unistd.h>
      
          #define NR_FUTEX_THREADS 16
          pthread_t threads[NR_FUTEX_THREADS];
      
          void *mem;
      
          #define MEM_PROT  (PROT_READ | PROT_WRITE)
          #define MEM_SIZE  65536
      
          static int futex_wrapper(int *uaddr, int op, int val,
                                   const struct timespec *timeout,
                                   int *uaddr2, int val3)
          {
              /* Thin wrapper around the raw futex system call. */
              return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
          }
      
          void *poll_futex(void *unused)
          {
              for (;;) {
                  futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
              }
          }
      
          int main(int argc, char *argv[])
          {
              int i;
      
              mem = mmap(NULL, MEM_SIZE, MEM_PROT,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
              printf("Mapping @ %p\n", mem);
      
              printf("Creating futex threads...\n");
      
              for (i = 0; i < NR_FUTEX_THREADS; i++)
                  pthread_create(&threads[i], NULL, poll_futex, NULL);
      
              printf("Flipping mapping...\n");
              for (;;) {
                  mmap(mem, MEM_SIZE, MEM_PROT,
                       MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
              }
      
              return 0;
          }
      Reported-and-tested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 01 Aug, 2017 1 commit
  10. 01 Jul, 2017 1 commit
    • randstruct: Mark various structs for randomization · 3859a271
      Kees Cook committed
      This marks many critical kernel structures for randomization. These are
      structures that have been targeted in the past in security exploits, or
      contain function pointers, pointers to function pointer tables, lists,
      workqueues, ref-counters, credentials, permissions, or are otherwise
      sensitive. This initial list was extracted from Brad Spengler/PaX Team's
      code in the last public patch of grsecurity/PaX based on my understanding
      of the code. Changes or omissions from the original code are mine and
      don't reflect the original grsecurity/PaX code.
      
      Left out of this list is task_struct, which requires special handling
      and will be covered in a subsequent patch.
      Signed-off-by: Kees Cook <keescook@chromium.org>
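What "marking a structure for randomization" looks like can be sketched in plain C. The macro `SKETCH_RANDOMIZE_LAYOUT` is a hypothetical stand-in for the kernel's annotation and expands to nothing here, so the layout is not actually shuffled; the struct and function names are likewise invented for illustration.

```c
/* Hypothetical stand-in for the kernel's layout-randomization annotation. */
#define SKETCH_RANDOMIZE_LAYOUT /* expands to nothing in this sketch */

/* A table of function pointers: the kind of structure the commit targets. */
struct sketch_ops {
	int (*open)(int id);
	int (*close)(int id);
	unsigned int flags;
} SKETCH_RANDOMIZE_LAYOUT;

static int sketch_open(int id)  { return id + 1; }
static int sketch_close(int id) { return id - 1; }

/*
 * Designated initializers name each member, so the initialization stays
 * correct even if a compiler were to reorder the members.
 */
const struct sketch_ops sketch_default_ops = {
	.open  = sketch_open,
	.close = sketch_close,
	.flags = 0x1u,
};
```

Positional initializers would silently depend on member order, which is the main coding habit that has to change once layouts may be randomized.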
  11. 20 Jun, 2017 1 commit
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar committed
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 16 May, 2017 1 commit
    • mutex, futex: adjust kernel-doc markups to generate ReST · 7b4ff1ad
      Mauro Carvalho Chehab committed
      A few issues in some kernel-doc markups were causing trouble with
      kernel-doc output in ReST format:
      
      ./kernel/futex.c:492: WARNING: Inline emphasis start-string without end-string.
      ./kernel/futex.c:1264: WARNING: Block quote ends without a blank line; unexpected unindent.
      ./kernel/futex.c:1721: WARNING: Block quote ends without a blank line; unexpected unindent.
      ./kernel/futex.c:2338: WARNING: Block quote ends without a blank line; unexpected unindent.
      ./kernel/futex.c:2426: WARNING: Block quote ends without a blank line; unexpected unindent.
      ./kernel/futex.c:2899: WARNING: Block quote ends without a blank line; unexpected unindent.
      ./kernel/futex.c:2972: WARNING: Block quote ends without a blank line; unexpected unindent.
      
      Fix them.
      
      No functional changes.
      Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
  13. 15 Apr, 2017 1 commit
  14. 14 Apr, 2017 2 commits
  15. 04 Apr, 2017 2 commits
  16. 24 Mar, 2017 12 commits
    • futex: Drop hb->lock before enqueueing on the rtmutex · 56222b21
      Peter Zijlstra committed
      When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution the PI
      chain code will (falsely) report a deadlock and BUG.
      
      The problem is that it holds hb->lock (now an rt_mutex) while doing
      task_blocks_on_rt_mutex on the futex's pi_state::rtmutex. This, when
      interleaved just right with futex_unlock_pi(), leads it to believe it sees an
      AB-BA deadlock.
      
        Task1 (holds rt_mutex,	Task2 (does FUTEX_LOCK_PI)
               does FUTEX_UNLOCK_PI)
      
      				lock hb->lock
      				lock rt_mutex (as per start_proxy)
        lock hb->lock
      
      Which is a trivial AB-BA.
      
      It is not an actual deadlock, because it won't be holding hb->lock by the
      time it actually blocks on the rt_mutex, but the chainwalk code doesn't
      know that and it would be a nightmare to handle this gracefully.
      
      To avoid this problem, do the same as in futex_unlock_pi() and drop
      hb->lock after acquiring wait_lock. This still fully serializes against
      futex_unlock_pi(), since adding to the wait_list does the very same lock
      dance, and removing it holds both locks.
      
      Aside from solving the RT problem, this makes the lock and unlock mechanism
      symmetric and reduces the hb->lock hold time.
      Reported-and-tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.161341537@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
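The lock hand-over described in this commit (take wait_lock while hb->lock is still held, then drop hb->lock before enqueueing) can be sketched with two toy spin locks. All names are hypothetical, the locks are simple C11 atomic flags, and the "enqueue" is reduced to incrementing a counter.

```c
#include <stdatomic.h>

struct sketch_bucket  { atomic_flag lock; };                      /* stands in for hb->lock */
struct sketch_rtmutex { atomic_flag wait_lock; int nr_waiters; }; /* stands in for the rt_mutex */

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

void sketch_init(struct sketch_bucket *hb, struct sketch_rtmutex *m)
{
	atomic_flag_clear(&hb->lock);
	atomic_flag_clear(&m->wait_lock);
	m->nr_waiters = 0;
}

void sketch_enqueue_waiter(struct sketch_bucket *hb, struct sketch_rtmutex *m)
{
	sketch_lock(&hb->lock);
	/* ... futex state would be looked up here, under hb->lock ... */

	sketch_lock(&m->wait_lock);   /* acquire wait_lock with hb->lock still held */
	sketch_unlock(&hb->lock);     /* hand-over: no window with neither lock held */

	m->nr_waiters++;              /* enqueue while holding only wait_lock */
	sketch_unlock(&m->wait_lock);
}
```

Because the second lock is taken before the first is released, an unlocker that takes the same two locks in the same order can never observe a half-enqueued waiter.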
    • futex: Futex_unlock_pi() determinism · bebe5b51
      Peter Zijlstra committed
      The problem with returning -EAGAIN when the waiter state mismatches is that
      it becomes very hard to prove a bounded execution time for the
      operation. And seeing that this is an RT operation, this is somewhat
      important.
      
      While in practice, given the previous patch, it will be very unlikely to
      ever really take more than one or two rounds, proving so becomes rather
      hard.
      
      However, now that modifying wait_list is done while holding both hb->lock
      and wait_lock, the scenario can be avoided entirely by acquiring wait_lock
      while still holding hb->lock, doing a hand-over without leaving a hole.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() · cfafcd11
      Peter Zijlstra committed
      By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list
      modifications are done under both hb->lock and wait_lock.
      
      This closes the obvious interleave pattern between futex_lock_pi() and
      futex_unlock_pi(), but not entirely so. See below:
      
      Before:
      
      futex_lock_pi()			futex_unlock_pi()
        unlock hb->lock
      
      				  lock hb->lock
      				  unlock hb->lock
      
      				  lock rt_mutex->wait_lock
      				  unlock rt_mutex_wait_lock
      				    -EAGAIN
      
        lock rt_mutex->wait_lock
        list_add
        unlock rt_mutex->wait_lock
      
        schedule()
      
        lock rt_mutex->wait_lock
        list_del
        unlock rt_mutex->wait_lock
      
      				  <idem>
      				    -EAGAIN
      
        lock hb->lock
      
      
      After:
      
      futex_lock_pi()			futex_unlock_pi()
      
        lock hb->lock
        lock rt_mutex->wait_lock
        list_add
        unlock rt_mutex->wait_lock
        unlock hb->lock
      
        schedule()
      				  lock hb->lock
      				  unlock hb->lock
        lock hb->lock
        lock rt_mutex->wait_lock
        list_del
        unlock rt_mutex->wait_lock
      
      				  lock rt_mutex->wait_lock
      				  unlock rt_mutex_wait_lock
      				    -EAGAIN
      
        unlock hb->lock
      
      
      It does however solve the earlier starvation/live-lock scenario which got
      introduced with the -EAGAIN. Unlike the before scenario, where the
      -EAGAIN happens while futex_unlock_pi() doesn't hold any locks, in the
      after scenario it happens while futex_unlock_pi() actually holds a lock,
      and then it is serialized on that lock.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() · 38d589f2
      Peter Zijlstra committed
      With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters
      consistent it's necessary to split 'rt_mutex_futex_lock()' into finer
      parts, such that only the actual blocking can be done without hb->lock
      held.
      
      Split rt_mutex_finish_proxy_lock() into two parts, one that does the
      blocking and one that does remove_waiter() when the lock acquire failed.
      
      When the rtmutex was acquired successfully the waiter can be removed in the
      acquisition path safely, since there is no concurrency on the lock owner.
      
      This means that, except for futex_lock_pi(), all wait_list modifications
      are done with both hb->lock and wait_lock held.
      
      [bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex,rt_mutex: Introduce rt_mutex_init_waiter() · 50809358
      Peter Zijlstra committed
      Since there are already two copies of this code, introduce a helper now
      before adding a third one.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.950039479@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Pull rt_mutex_futex_unlock() out from under hb->lock · 16ffa12d
      Peter Zijlstra committed
      There are a number of 'interesting' problems, all caused by holding
      hb->lock while doing the rt_mutex_unlock() equivalent.
      
      Notably:
      
       - a PI inversion on hb->lock; and,
      
       - a SCHED_DEADLINE crash because of pointer instability.
      
      The previous changes:
      
       - changed the locking rules to cover {uval,pi_state} with wait_lock.
      
       - allowed doing rt_mutex_futex_unlock() without dropping wait_lock, which in
         turn allows relying on wait_lock atomicity completely.
      
       - simplified the waiter conundrum.
      
      It's now sufficient to hold rtmutex::wait_lock and a reference on the
      pi_state to protect the state consistency, so hb->lock can be dropped
      before calling rt_mutex_futex_unlock().
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Rework inconsistent rt_mutex/futex_q state · 73d786bd
      Peter Zijlstra committed
      There is a weird state in the futex_unlock_pi() path when it interleaves
      with a concurrent futex_lock_pi() at the point where it drops hb->lock.
      
      In this case, it can happen that the rt_mutex wait_list and the futex_q
      disagree on pending waiters, in particular rt_mutex will find no pending
      waiters where futex_q thinks there are. In this case the rt_mutex unlock
      code cannot assign an owner.
      
      The futex side fixup code has to clean up the inconsistencies, with quite a
      few interesting corner cases.
      
      Simplify all this by changing wake_futex_pi() to return -EAGAIN when this
      situation occurs. This then gives the futex_lock_pi() code the opportunity
      to continue and the retried futex_unlock_pi() will now observe a coherent
      state.
      
      The only problem is that this breaks RT timeliness guarantees. That
      is, consider the following scenario:
      
        T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)
      
          CPU0
      
          T1
            lock_pi()
            queue_me()  <- Waiter is visible
      
          preemption
      
          T2
            unlock_pi()
      	loops with -EAGAIN forever
      
      Which is undesirable for PI primitives. Future patches will rectify
      this.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
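The retry-on-EAGAIN behaviour this commit introduces, and why it has no execution-time bound, can be sketched as a plain loop. The callback type and both function names are hypothetical; the fake unlock operation merely simulates a waiter state that becomes coherent after a fixed number of attempts.

```c
#include <errno.h>

/* Hypothetical signature for one unlock attempt. */
typedef int (*sketch_unlock_fn)(void *ctx);

/*
 * Retry as long as the attempt reports -EAGAIN. Nothing limits the
 * number of rounds: if the other side never makes progress, neither
 * does this loop.
 */
int sketch_unlock_retry(sketch_unlock_fn try_unlock, void *ctx,
			unsigned long *rounds)
{
	int ret;

	*rounds = 0;
	do {
		(*rounds)++;
		ret = try_unlock(ctx);
	} while (ret == -EAGAIN);

	return ret;
}

/* Stand-in attempt: reports -EAGAIN until the counter in ctx reaches zero. */
int sketch_fake_unlock(void *ctx)
{
	int *pending = ctx;

	if (*pending > 0) {
		(*pending)--;
		return -EAGAIN;
	}
	return 0;
}
```

In the pinned-CPU scenario from the message, the counter would never reach zero because the lower-priority task that has to make progress cannot run, so the loop spins forever.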
    • futex: Cleanup refcounting · bf92cf3a
      Peter Zijlstra committed
      Add a put_pi_state() as counterpart for get_pi_state() so the refcounting
      becomes consistent.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.801778516@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Change locking rules · 734009e9
      Peter Zijlstra committed
      Currently futex-pi relies on hb->lock to serialize everything. But hb->lock
      creates another set of problems, especially priority inversions on RT where
      hb->lock becomes a rt_mutex itself.
      
      The rt_mutex::wait_lock is the most obvious protection for keeping the
      futex user space value and the kernel internal pi_state in sync.
      
      Rework and document the locking so rt_mutex::wait_lock is held across all
      operations which modify the user space value and the pi state.
      
      This allows invoking rt_mutex_unlock() (including deboost) without holding
      hb->lock as a next step.
      
      Nothing yet relies on the new locking rules.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.751993333@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      734009e9
    • P
      futex,rt_mutex: Provide futex specific rt_mutex API · 5293c2ef
      Peter Zijlstra committed
      Part of what makes futex_unlock_pi() intricate is that
      rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop
      rt_mutex::wait_lock.
      
      This means it cannot rely on the atomicity of wait_lock, which would be
      preferred in order to not rely on hb->lock so much.
      
      The reason rt_mutex_slowunlock() needs to drop wait_lock is because it can
      race with the rt_mutex fastpath, however futexes have their own fast path.
      
      Since futexes already have a bunch of separate rt_mutex accessors, complete
      that set and implement a rt_mutex variant without fastpath for them.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.702962446@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5293c2ef
    • P
      futex: Use smp_store_release() in mark_wake_futex() · 1b367ece
      Peter Zijlstra committed
      Since the futex_q can disappear the instruction after assigning NULL,
      this really should be a RELEASE barrier. That stops loads from hitting
      dead memory too.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.604296452@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1b367ece
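The reasoning in the commit above (1b367ece) can be approximated in user space with C11 atomics, where `memory_order_release` plays the role of the kernel's smp_store_release(). This is a sketch under assumptions: `demo_q` and its helpers are hypothetical, and the real futex_q has more fields. The point is that every access to the queue entry, loads included, is ordered before the store that signals "you may free this now".

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue entry; a non-NULL lock_ptr means "still queued". */
struct demo_q {
    int payload;
    _Atomic(void *) lock_ptr;
};

static void demo_q_init(struct demo_q *q, int payload, void *lock)
{
    q->payload = payload;
    atomic_init(&q->lock_ptr, lock);
}

/* Waker side: read what is needed, then publish NULL with release ordering. */
static int demo_mark_wake(struct demo_q *q)
{
    int seen = q->payload; /* last access to *q; cannot be reordered after the store */

    atomic_store_explicit(&q->lock_ptr, NULL, memory_order_release);
    /* From here on the waiter may free or reuse *q; do not touch it again. */
    return seen;
}

/* Waiter side: an acquire load pairs with the release store above. */
static bool demo_may_free(struct demo_q *q)
{
    return atomic_load_explicit(&q->lock_ptr, memory_order_acquire) == NULL;
}
```

With a plain store instead of a release store, the compiler or CPU would be free to move the `payload` load after the NULL assignment, which is the "loads hitting dead memory" case the commit message mentions.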
    • P
      futex: Cleanup variable names for futex_top_waiter() · 499f5aca
      Peter Zijlstra committed
      futex_top_waiter() returns the top-waiter on the pi_mutex. Assigning
      this to a variable 'match' totally obscures the code.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.554710645@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      499f5aca
  17. 15 Mar, 2017 2 commits
  18. 02 Mar, 2017 2 commits
  19. 28 Feb, 2017 1 commit
  20. 13 Feb, 2017 1 commit
    • Y
      futex: Move futex_init() to core_initcall · 25f71d1c
      Yang Yang committed
      The UEVENT user mode helper is enabled before the initcalls are executed
      and is available when the root filesystem has been mounted.
      
      The user mode helper is triggered by device init calls and the executable
      might use the futex syscall.
      
      futex_init() is marked __initcall which maps to device_initcall, but there
      is no guarantee that futex_init() is invoked _before_ the first device init
      call which triggers the UEVENT user mode helper.
      
      If the user mode helper uses the futex syscall before futex_init() then the
      syscall crashes with a NULL pointer dereference because the futex subsystem
      has not been initialized yet.
      
      Move futex_init() to core_initcall so futexes are initialized before the
      root filesystem is mounted and the usermode helper becomes available.
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
      Cc: jiang.biao2@zte.com.cn
      Cc: jiang.zhengxiong@zte.com.cn
      Cc: zhong.weidong@zte.com.cn
      Cc: deng.huali@zte.com.cn
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      25f71d1c
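The ordering problem described in the commit above (25f71d1c) can be modelled with a toy initcall runner. Everything here is a simplified assumption for illustration: the `demo_*` names are hypothetical, only two of the kernel's initcall levels are represented (core = 1, device = 6), and ordering within one level is modelled as registration order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Two of the kernel's initcall levels; lower levels run earlier. */
enum demo_level { DEMO_CORE = 1, DEMO_DEVICE = 6 };

static bool demo_futex_ready;
static bool demo_helper_crashed;

static void demo_futex_init(void)
{
    demo_futex_ready = true;
}

/* A device init call that triggers a user mode helper which uses futexes. */
static void demo_device_init(void)
{
    if (!demo_futex_ready)
        demo_helper_crashed = true; /* stands in for the NULL pointer dereference */
}

struct demo_initcall {
    enum demo_level level;
    void (*fn)(void);
};

/* Runs all initcalls level by level; returns true if the helper survived. */
static bool demo_boot(enum demo_level futex_level)
{
    /* The device init is deliberately registered ahead of futex init. */
    struct demo_initcall calls[] = {
        { DEMO_DEVICE, demo_device_init },
        { futex_level, demo_futex_init },
    };

    demo_futex_ready = false;
    demo_helper_crashed = false;

    for (int level = DEMO_CORE; level <= DEMO_DEVICE; level++)
        for (size_t i = 0; i < sizeof(calls) / sizeof(calls[0]); i++)
            if ((int)calls[i].level == level)
                calls[i].fn();

    return !demo_helper_crashed;
}
```

With futex init at the device level the outcome depends on ordering within that level; moving it to the core level makes it independent of any device init call.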
  21. 26 Dec, 2016 1 commit
    • T
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner committed
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
      variant for 32-bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      became completely pointless.
      
      Get rid of the union and just keep ktime_t as a simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
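The effect of the ktime change above (2456e855) on calling code can be shown with a small before/after sketch. The `demo_*` names are hypothetical stand-ins, `int64_t` is used in place of the kernel's s64, and the single-member union is assumed to mirror the layout that remained after the Y2038 cleanup.

```c
#include <stdint.h>

/* Before: a union with one remaining member, so every access needs .tv64. */
union demo_ktime_old {
    int64_t tv64; /* scalar nanoseconds */
};

static union demo_ktime_old demo_old_add(union demo_ktime_old a,
                                         union demo_ktime_old b)
{
    union demo_ktime_old sum = { .tv64 = a.tv64 + b.tv64 };

    return sum;
}

/* After: a plain 64-bit typedef, so ordinary arithmetic works directly. */
typedef int64_t demo_ktime_t;

static demo_ktime_t demo_new_add(demo_ktime_t a, demo_ktime_t b)
{
    return a + b;
}
```

Both variants carry the same 64-bit nanosecond value; the typedef only removes the wrapper that callers had to reach through.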
  22. 21 Nov, 2016 1 commit
  23. 05 Sep, 2016 1 commit
  24. 30 Jul, 2016 1 commit