1. 22 Sep, 2009 1 commit
  2. 16 Aug, 2009 1 commit
    • futex: Detect mismatched requeue targets · 84bc4af5
      Committed by Darren Hart
      There is currently no check to ensure that userspace uses the same
      futex requeue target (uaddr2) in futex_requeue() that the waiter used
      in futex_wait_requeue_pi().  A mismatch here could cause very unexpected
      results as the waiter assumes it either wakes on uaddr1 or uaddr2. We
      could detect this on wakeup in the waiter, but the cleanup is more
      intense after the improper requeue has occurred.
      
      This patch stores the waiter's expected requeue target in a new
      requeue_pi_key pointer in the futex_q which futex_requeue() checks
      prior to attempting to do a proxy lock acquisition or a requeue when
      requeue_pi=1. If they don't match, return -EINVAL from futex_requeue,
      aborting the requeue of any remaining waiters.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090814003650.14634.63916.stgit@Aeon>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
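
The check described in the commit above can be modelled in plain C. This is only a sketch under stated assumptions: `demo_key`, `demo_waiter`, `demo_key_match()` and `demo_check_requeue_target()` are hypothetical stand-ins, not the kernel's `union futex_key`, `struct futex_q` or their helpers. It shows the idea of recording the waiter's expected target and refusing a requeue whose target differs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a futex key. */
struct demo_key {
    uintptr_t word;    /* identity of the backing object or address space */
    uint32_t offset;   /* offset of the futex word within its page */
};

/* Hypothetical stand-in for futex_q: only the field this check needs. */
struct demo_waiter {
    const struct demo_key *requeue_pi_key; /* target the waiter expects, or NULL */
};

/* Two keys match when both exist and every field is equal. */
static bool demo_key_match(const struct demo_key *a, const struct demo_key *b)
{
    return a && b && a->word == b->word && a->offset == b->offset;
}

/* Returns 0 if the waiter may be requeued to key2, -EINVAL on a mismatch. */
static int demo_check_requeue_target(const struct demo_waiter *w,
                                     const struct demo_key *key2)
{
    assert(key2 != NULL);
    if (!demo_key_match(w->requeue_pi_key, key2))
        return -EINVAL;
    return 0;
}
```

A waiter whose `requeue_pi_key` is NULL is rejected as well in this model, which also covers a waiter that never announced a requeue target.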
  3. 11 Aug, 2009 1 commit
    • futex: Fix handling of bad requeue syscall pairing · 392741e0
      Committed by Darren Hart
      If futex_requeue(requeue_pi=1) finds a futex_q that was created by a call
      other than futex_wait_requeue_pi(), the q.rt_waiter may be null.  If so,
      this will result in an oops from the following call graph:
      
      futex_requeue()
        rt_mutex_start_proxy_lock()
          task_blocks_on_rt_mutex()
            waiter->task dereference
              OOPS
      
      We currently WARN_ON() if this is detected, which is clearly inadequate.
      If we detect a mispairing in futex_requeue(), bail out, sending -EINVAL to
      user-space.
      
      V2: Fix parenthesis warnings.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4A7CA8C0.7010809@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 10 Aug, 2009 1 commit
    • futex: Update futex_q lock_ptr on requeue proxy lock · beda2c7e
      Committed by Darren Hart
      futex_requeue() can acquire the lock on behalf of a waiter
      early on or during the requeue loop if it is uncontended or in
      the event of a lock steal or owner died. On wakeup, the waiter
      (in futex_wait_requeue_pi()) cleans up the pi_state owner using
      the lock_ptr to protect against concurrent access to the
      pi_state. The pi_state is hung off futex_q's on the requeue
      target futex hash bucket so the lock_ptr needs to be updated
      accordingly.
      
      The problem manifested by triggering the WARN_ON in
      lookup_pi_state() about the pid != pi_state->owner->pid.  With
      this patch, the pi_state is properly guarded against concurrent
      access via the requeue target hb lock.
      
      The astute reviewer may notice that there is a window of time
      between when futex_requeue() unlocks the hb locks and when
      futex_wait_requeue_pi() will acquire hb2->lock.  During this
      time the pi_state and uval are not in sync with the underlying
      rtmutex owner (but the uval does indicate there are waiters, so
      no atomic changes will occur in userspace).  However, this is
      not a problem. Should a contending thread enter
      lookup_pi_state() and acquire hb2->lock before the ownership is
      fixed up, it will find the pi_state hung off a waiter's
      (possibly the pending owner's) futex_q and block on the
      rtmutex.  Once futex_wait_requeue_pi() fixes up the owner, it
      will also move the pi_state from the old owner's
      task->pi_state_list to its own.
      
      v3: Fix plist lock name for application to mainline (rather
          than -rt) Compile tested against tip/v2.6.31-rc5.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4A7F4EFF.6090903@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 04 Aug, 2009 1 commit
  6. 11 Jul, 2009 1 commit
    • futexes: Fix infinite loop in get_futex_key() on huge page · ce2ae53b
      Committed by Sonny Rao
      get_futex_key() can infinitely loop if it is called on a
      virtual address that is within a huge page but not aligned to
      the beginning of that page.  The call to get_user_pages_fast
      will return the struct page for a sub-page within the huge page
      and the check for page->mapping will always fail.
      
      The fix is to call compound_head on the page before checking
      that it's mapped.
      Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
      Cc: anton@samba.org
      Cc: rajamony@us.ibm.com
      Cc: speight@us.ibm.com
      Cc: mstephen@us.ibm.com
      Cc: grimm@us.ibm.com
      Cc: mikey@ozlabs.au.ibm.com
      LKML-Reference: <20090710231313.GA23572@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
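
The huge-page fix above can be illustrated with a toy model. It is a sketch under loud assumptions: `demo_page`, `demo_compound_head()` and `demo_page_mapping_for_key()` are hypothetical, and the real `struct page` encodes compound pages differently. The point is only that a sub-page carries no mapping of its own in this model, so the lookup has to be redirected to the head page before the mapping is checked.

```c
#include <stddef.h>

/* Hypothetical, simplified page descriptor. */
struct demo_page {
    struct demo_page *head; /* NULL for a head page or an ordinary page */
    void *mapping;          /* in this model, only set on the head page */
};

/* Resolve a sub-page of a huge page to its head; other pages map to themselves. */
static struct demo_page *demo_compound_head(struct demo_page *p)
{
    return p->head ? p->head : p;
}

/* Return the mapping to use for the key, or NULL if the page is unmapped. */
static void *demo_page_mapping_for_key(struct demo_page *p)
{
    return demo_compound_head(p)->mapping;
}
```

Checking `p->mapping` directly on a sub-page would see NULL every time, which is the condition that made the retry loop in the commit description spin forever.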
  7. 25 Jun, 2009 2 commits
    • futex: request only one page from get_user_pages() · aa715284
      Committed by Thomas Gleixner
      Yanmin noticed that fault_in_user_writeable() requests 4 pages instead
      of one.
      
      That's the result of blindly trusting Linus' proposal :) I even looked
      up the prototype to verify the correctness: the argument in question
      is confusingly enough named "len" while in reality it means number of
      pages.
      Pointed-out-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Fix the write access fault problem for real · d0725992
      Committed by Thomas Gleixner
      commit 64d1304a (futex: setup writeable mapping for futex ops which
      modify user space data) did address only half of the problem of write
      access faults.
      
      The patch was made on two wrong assumptions:
      
      1) access_ok(VERIFY_WRITE,...) would actually check write access.
      
         On x86 it does _NOT_. It's a pure address range check.
      
      2) a RW mapped region can not go away under us.
      
         That's wrong as well. Nobody can prevent another thread from calling
         mprotect(PROT_READ) on that region where the futex resides. If that
         call hits between the get_user_pages_fast() verification and the
         actual write access in the atomic region we are toast again.
      
      The solution is to not rely on access_ok and get_user() for any write
      access related fault on private and shared futexes. Instead we need to
      fault it in with verification of write access.
      
      There is no generic non destructive write mechanism which would fault
      the user page in through a #PF, but as we already know that we will
      fault we can as well call get_user_pages() directly and avoid the #PF
      overhead.
      
      If get_user_pages() returns -EFAULT we know that we can not fix it
      anymore and need to bail out to user space.
      
      Remove a bunch of confusing comments on this issue as well.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
  8. 20 May, 2009 4 commits
    • futex: fix restart in wait_requeue_pi · 2070887f
      Committed by Thomas Gleixner
      If the waiter has been requeued to the outer PI futex and is
      interrupted by a signal and the thread handles the signal then
      ERESTART_RESTARTBLOCK is changed to EINTR and the restart block is
      discarded. That way we return an unexpected EINTR to user space
      instead of ending up in futex_lock_pi_restart.
      
      But we do not need to restart the syscall because we know that the
      condition has changed since we have been requeued. If we would simply
      restart the syscall then we would drop out via the comparison of the
      user space value with EWOULDBLOCK.
      
      The user space side needs to handle EWOULDBLOCK anyway as the
      enqueueing on the inner futex can race with a requeue/wake. So we can
      simply return EWOULDBLOCK to user space which also signals that we did
      not take the outer futex and let user space handle it in the same way
      it has to handle the requeue/wake race.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: fix restart for early wakeup in futex_wait_requeue_pi() · 1c840c14
      Committed by Thomas Gleixner
      The futex_wait_requeue_pi op should restart unconditionally like
      futex_lock_pi. The user of that function e.g. pthread_cond_wait can
      not be interrupted so we do not care about the SA_RESTART flag of the
      signal. Clean up the FIXMEs.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: cleanup error exit · c8b15a70
      Committed by Thomas Gleixner
      Reuse the put_key_ref(key2) call in the exit path.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: setup writeable mapping for futex ops which modify user space data · 64d1304a
      Committed by Thomas Gleixner
      The futex code installs a read only mapping via get_user_pages_fast()
      even if the futex op function has to modify user space data. The
      eventual fault was fixed up by futex_handle_fault() which walked the
      VMA with mmap_sem held.
      
      After the cleanup patches which removed the mmap_sem dependency of the
      futex code commit e4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex:
      clean up fault logic) removed the private VMA walk logic from the
      futex code. This change results in a stale RO mapping which is not
      fixed up.
      
      Instead of reintroducing the previous fault logic we set up the
      mapping in get_user_pages_fast() read/write for all operations which
      modify user space data. Also handle private futexes in the same way
      and make the current unconditional access_ok(VERIFY_WRITE) depend on
      the futex op.
      Reported-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      CC: stable@kernel.org
  9. 15 May, 2009 1 commit
    • futex: remove the wait queue · f1a11e05
      Committed by Thomas Gleixner
      The waitqueue which is used in struct futex_q is a leftover from the
      futexfd implementation. There is no need to use a waitqueue at all, as
      the waiting task is the only user of it. The waitqueue just adds
      additional locking and a loop in the wake up path which both can be
      avoided.
      
      We already have a task reference in struct futex_q which is used for
      PI futexes. Use it for normal futexes as well and just wake up the
      task directly.
      
      The logic of signalling the futex wakeup via setting q->lock_ptr to
      NULL is kept with the difference that we set it NULL before doing the
      wakeup. This opens an exit race window vs. a non futex wake up of the
      to be woken up task, which we prevent with get_task_struct /
      put_task_struct on the waiter.
      
      [ Impact: simplification ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  10. 30 Apr, 2009 1 commit
    • futex: remove FUTEX_REQUEUE_PI (non CMP) · ba9c22f2
      Committed by Darren Hart
      The new requeue PI futex op codes were modeled after the existing
      FUTEX_REQUEUE and FUTEX_CMP_REQUEUE calls.  I was unaware at the time
      that FUTEX_REQUEUE was only around for compatibility reasons and
      shouldn't be used in new code.  Ulrich Drepper elaborates on this in his
      Futexes are Tricky paper: http://people.redhat.com/drepper/futex.pdf.
      The deprecated call doesn't catch changes to the futex corresponding to
      the destination futex which can lead to deadlock.
      
      Therefore, I feel it best to remove FUTEX_REQUEUE_PI and leave only
      FUTEX_CMP_REQUEUE_PI as there are not yet any existing users of the API.
      This patch does change the OP code value of FUTEX_CMP_REQUEUE_PI to 12
      from 13.  Since my test case is the only known user of this API, I felt
      this was the right thing to do, rather than leave a hole in the
      enumeration.
      
      I chose to continue using the _CMP_ modifier in the OP code to make it
      explicit to the user that the test is being done.
      
      Builds, boots, and ran several hundred iterations of requeue_pi.c.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      LKML-Reference: <49ED580E.1050502@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
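
The renumbering described above is visible from user space. The sketch below assumes a Linux system with the UAPI header `<linux/futex.h>` installed. `demo_requeue_pi_op()` is a hypothetical helper added for illustration. The compile-time checks pin the two requeue-PI op codes to the values the commit message implies (no hole between the waiter op and the compare-and-requeue op).

```c
#include <linux/futex.h> /* Linux UAPI header; assumed to be available */

/* Fail the build if the op codes differ from the values described above. */
_Static_assert(FUTEX_WAIT_REQUEUE_PI == 11, "waiter-side requeue PI op code");
_Static_assert(FUTEX_CMP_REQUEUE_PI == 12, "compare-and-requeue PI op code");

/* Hypothetical helper: choose the op code, optionally process-private. */
static int demo_requeue_pi_op(int process_private)
{
    return process_private ? FUTEX_CMP_REQUEUE_PI_PRIVATE
                           : FUTEX_CMP_REQUEUE_PI;
}
```

The private variant is the same op code with `FUTEX_PRIVATE_FLAG` set, so a caller selects it by flag rather than by a separate number.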
  11. 11 Apr, 2009 1 commit
  12. 08 Apr, 2009 1 commit
  13. 06 Apr, 2009 8 commits
  14. 03 Apr, 2009 1 commit
  15. 13 Mar, 2009 2 commits
  16. 12 Mar, 2009 6 commits
    • futex: clean up fault logic · e4dc5b7a
      Committed by Darren Hart
      Impact: cleanup
      
      Older versions of the futex code held the mmap_sem which had to
      be dropped in order to call get_user(), so a two-pronged fault
      handling mechanism was employed to handle faults of the atomic
      operations.  The mmap_sem is no longer held, so get_user()
      should be adequate.  This patch greatly simplifies the logic and
      improves legibility.
      
      Build and boot tested on a 4 way Intel x86_64 workstation.
      Passes basic pthread_mutex and PI tests out of
      ltp/testcases/realtime.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <20090312075612.9856.48612.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • futex: unlock before returning -EFAULT · e8f6386c
      Committed by Darren Hart
      Impact: rt-mutex failure case fix
      
      futex_lock_pi can potentially return -EFAULT with the rt_mutex
      held.  This seems like the wrong thing to do as userspace should
      assume -EFAULT means the lock was not taken.  Even if it could
      figure this out, we'd be leaving the pi_state->owner in an
      inconsistent state.  This patch unlocks the rt_mutex prior to
      returning -EFAULT to userspace.
      
      Build and boot tested on a 4 way Intel x86_64 workstation.
      Passes basic pthread_mutex and PI tests out of
      ltp/testcases/realtime.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <20090312075606.9856.88729.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • futex: use current->time_slack_ns for rt tasks too · 16f4993f
      Committed by Darren Hart
      RT tasks should set their timer slack to 0 on their own.  This
      patch removes the 'if (rt_task()) slack = 0;' block in
      futex_wait.
      
      Build and boot tested on a 4 way Intel x86_64 workstation.
      Passes basic pthread_mutex and PI tests out of
      ltp/testcases/realtime.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      LKML-Reference: <20090312075559.9856.28822.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • futex: add double_unlock_hb() · 5eb3dc62
      Committed by Darren Hart
      Impact: cleanup
      
      The futex code uses double_lock_hb() which locks the hb->lock's
      in pointer value order.  There is no parallel unlock routine,
      and the code unlocks them in name order, ignoring pointer value.
      
      This patch adds double_unlock_hb() to refactor the duplicated
      code segments.
      
      Build and boot tested on a 4 way Intel x86_64 workstation.
      Passes basic pthread_mutex and PI tests out of
      ltp/testcases/realtime.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <20090312075552.9856.48021.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
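
The lock/unlock pairing described above can be sketched in user space with POSIX mutexes. The names `demo_bucket`, `demo_double_lock()` and `demo_double_unlock()` are hypothetical, and pthread mutexes stand in for the kernel's hash-bucket spinlocks. The sketch shows the two ideas from the commit message: acquire in pointer-value order so two callers cannot deadlock, and release through one helper that tolerates both arguments naming the same bucket.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a futex hash bucket. */
struct demo_bucket {
    pthread_mutex_t lock;
};

/* Take both locks in ascending address order; lock only once if b1 == b2. */
static void demo_double_lock(struct demo_bucket *b1, struct demo_bucket *b2)
{
    if ((uintptr_t)b1 <= (uintptr_t)b2) {
        pthread_mutex_lock(&b1->lock);
        if (b1 != b2)
            pthread_mutex_lock(&b2->lock);
    } else {
        pthread_mutex_lock(&b2->lock);
        pthread_mutex_lock(&b1->lock);
    }
}

/* Release both locks; release order does not matter for deadlock avoidance. */
static void demo_double_unlock(struct demo_bucket *b1, struct demo_bucket *b2)
{
    pthread_mutex_unlock(&b1->lock);
    if (b1 != b2)
        pthread_mutex_unlock(&b2->lock);
}
```

Keeping the `b1 != b2` test inside the helpers means call sites do not need to repeat it, which is the duplication the commit set out to remove.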
    • futex: additional (get|put)_futex_key() fixes · de87fcc1
      Committed by Darren Hart
      Impact: fix races
      
      futex_requeue and futex_lock_pi still had some bad
      (get|put)_futex_key() usage. This patch adds the missing
      put_futex_keys() and corrects a goto in futex_lock_pi() to avoid
      a double get.
      
      Build and boot tested on a 4 way Intel x86_64 workstation.
      Passes basic pthread_mutex and PI tests out of
      ltp/testcases/realtime.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <20090312075545.9856.75152.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • futex: update futex commentary · b2d0994b
      Committed by Darren Hart
      Impact: cleanup
      
      The futex_hash_bucket can be a bit confusing when first looking
      at the code as it is a shared queue (and futex_q isn't a queue
      at all, but rather an element on the queue).
      
      The mmap_sem is no longer held outside of the
      futex_handle_fault() routine, yet numerous comments refer to it.
      The fshared argument is now an integer.  I left some of these
      comments alone as they are simply removed in future patches.
      
      Some of the commentary referring to futexes by virtual page
      mappings was not very clear, nor completely accurate (as for
      shared futexes both the page and the offset are used to
      determine the key).  For the purposes of the function
      description, just referring to "the futex" seems sufficient.
      
      With hashed futexes we now access the page after the hash-bucket
      is locked, and not only after it is enqueued.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <20090312075537.9856.29954.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 12 Feb, 2009 1 commit
  18. 14 Jan, 2009 2 commits
  19. 03 Jan, 2009 1 commit
  20. 30 Dec, 2008 1 commit
  21. 19 Dec, 2008 1 commit
    • futex: clean up futex_(un)lock_pi fault handling · b5686363
      Committed by Darren Hart
      Impact: cleanup
      
      Some apparently left over cruft code was complicating the fault logic:
      
      Testing if uval != -EFAULT doesn't have any meaning, get_user() sets ret
      to either 0 or -EFAULT, there's no need to compare uval, especially not
      against EFAULT which it will never be.  This patch removes the superfluous
      test and clarifies the comment blocks.
      
      Build and boot tested on an 8way x86_64 system.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
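
The reasoning in the commit above, that only the return code signals a fault and the fetched value should not be tested, can be sketched as follows. `demo_get_user()` is a hypothetical user-space model, not the kernel's `get_user()` macro. A NULL source stands in for a faulting user address.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a fetch from user memory: returns 0 on success
 * and stores the value in *dst, or returns -EFAULT and leaves *dst alone.
 */
static int demo_get_user(uint32_t *dst, const uint32_t *src)
{
    if (src == NULL)
        return -EFAULT; /* models an inaccessible user address */
    *dst = *src;
    return 0;
}
```

Any 32-bit pattern is a legal futex value in this model, including the one equal to `(uint32_t)-EFAULT`, so comparing the fetched value against an error code would add nothing to the check on the return code.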
  22. 18 Dec, 2008 1 commit