1. 18 Sep, 2010 3 commits
  2. 01 Jul, 2010 1 commit
    • M
      futex: futex_find_get_task remove credentials check · 7a0ea09a
      Michal Hocko authored
      futex_find_get_task is currently used (through lookup_pi_state) from two
      contexts, futex_requeue and futex_lock_pi_atomic.  Neither path appears
      to need the credentials check, though.  Different (e)uids shouldn't
      matter at all, because the only thing that is important for a shared
      futex is the accessibility of the shared memory.
      
      The credentials check results in a glibc assert failure or a process hang
      (if glibc is compiled without assert support) for a shared robust pthread
      mutex with priority inheritance when a process tries to lock an already
      held lock owned by a process with a different euid:
      
      pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
      
      The problem is that futex_lock_pi_atomic, which is called when we try to
      lock an already held lock, checks the current holder (the tid is stored
      in the futex value) to get the PI state.  It uses lookup_pi_state, which
      in turn gets the task struct from futex_find_get_task.  ESRCH is returned
      either when the task is not found or when the credentials check fails.
      
      futex_lock_pi_atomic simply returns if it gets ESRCH.  The glibc code,
      however, doesn't expect a robust lock to return ESRCH, because it
      should get either success or owner died.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
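The behavioural change described in 7a0ea09a can be illustrated with a small user-space model in C. This is only a sketch and not the kernel code: `task_model`, `find_task_old` and `find_task_new` are hypothetical names, and the credentials comparison is reduced to a single euid field as a simplifying assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task_struct. */
struct task_model {
    int tid;
    unsigned int euid;
};

/* Old behaviour: -ESRCH both for "no such task" and for an euid mismatch. */
int find_task_old(const struct task_model *tasks, size_t n,
                  int tid, unsigned int caller_euid)
{
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].tid == tid)
            return tasks[i].euid == caller_euid ? 0 : -ESRCH;
    }
    return -ESRCH;
}

/* New behaviour: -ESRCH only when the task really does not exist. */
int find_task_new(const struct task_model *tasks, size_t n, int tid)
{
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].tid == tid)
            return 0;
    }
    return -ESRCH;
}
```

In the old model a lock holder with a different euid is indistinguishable from a missing task, which is the ambiguity that the glibc robust mutex code does not expect.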
  3. 03 Feb, 2010 3 commits
    • T
      futex: Handle futex value corruption gracefully · 59647b6a
      Thomas Gleixner authored
      The WARN_ON in lookup_pi_state which complains about a mismatch
      between pi_state->owner->pid and the pid which we retrieved from the
      user space futex is completely bogus.
      
      The code just emits the warning and then continues despite the fact
      that it detected an inconsistent state of the futex. That is a convenient
      way for user space to spam the syslog.
      
      Replace the WARN_ON with a consistency check. If the values do not match,
      return -EINVAL and let user space deal with the mess it created.
      
      This also fixes the missing task_pid_vnr() when we compare the
      pi_state->owner pid with the futex value.
      Reported-by: Jermome Marchand <jmarchan@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
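The consistency check that replaces the WARN_ON in 59647b6a can be sketched as a user-space model in C. The structure and function names here (`task_model`, `pi_state_model`, `check_pi_owner`) are hypothetical, `vpid` stands in for the result of task_pid_vnr(), and the mask value mirrors FUTEX_TID_MASK. Skipping the comparison when the pid is 0 or the owner is NULL is an assumption about how the owner-died cases are handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_TID_MASK 0x3fffffffu   /* same value as FUTEX_TID_MASK */

/* Hypothetical, simplified stand-ins for kernel structures. */
struct task_model { int vpid; };
struct pi_state_model { const struct task_model *owner; };

/*
 * Returns 0 when the user space futex value is consistent with the
 * kernel's idea of the owner, -EINVAL when user space corrupted it.
 */
int check_pi_owner(const struct pi_state_model *pi_state, unsigned int uval)
{
    int pid = (int)(uval & MODEL_TID_MASK);

    /* Guard added for this model: no PI state at all is also invalid. */
    if (pi_state == NULL)
        return -EINVAL;

    /* Only compare when both sides actually name an owner. */
    if (pid != 0 && pi_state->owner != NULL && pid != pi_state->owner->vpid)
        return -EINVAL;

    return 0;
}
```

Returning an error instead of warning means a misbehaving application only hurts itself and cannot flood the kernel log.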
    • T
      futex: Handle user space corruption gracefully · 51246bfd
      Thomas Gleixner authored
      If the owner of a PI futex dies we fix up the pi_state and set
      pi_state->owner to NULL. When a malicious or just sloppily programmed
      user space application sets the futex value to 0, e.g. by calling
      pthread_mutex_init(), then the futex can be acquired again. A new
      waiter manages to enqueue itself on the pi_state w/o damage, but on
      unlock the kernel dereferences pi_state->owner and oopses.
      
      Prevent this by checking pi_state->owner in the unlock path. If
      pi_state->owner is not current we know that user space manipulated the
      futex value. Ignore the mess and return -EINVAL.
      
      This catches the above case and also the case where a task hijacks the
      futex by setting the tid value and then tries to unlock it.
      Reported-by: Jermome Marchand <jmarchan@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
    • M
      futex_lock_pi() key refcnt fix · 5ecb01cf
      Mikael Pettersson authored
      This fixes a futex key reference count bug in futex_lock_pi(),
      where a key's reference count is incremented twice but decremented
      only once, causing the backing object to not be released.
      
      If the futex is created in a temporary file in an ext3 file system,
      this bug causes the file's inode to become an "undead" orphan,
      which causes an oops from a BUG_ON() in ext3_put_super() when the
      file system is unmounted. glibc's test suite is known to trigger this,
      see <http://bugzilla.kernel.org/show_bug.cgi?id=14256>.
      
      The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's
      38d47c1b "[PATCH] futex: rely on get_user_pages() for shared futexes".
      That commit made get_futex_key() also increment the reference count of
      the futex key, and updated its callers to decrement the key's reference
      count before returning. Unfortunately the normal exit path in
      futex_lock_pi() wasn't corrected: the reference count is incremented by
      get_futex_key() and queue_lock(), but the normal exit path only
      decrements once, via unqueue_me_pi(). The fix is to put_futex_key()
      after unqueue_me_pi(). Since 2.6.31 this is easily done by
      'goto out_put_key' rather than 'goto out'.
      Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@kernel.org>
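The reference count imbalance from 5ecb01cf can be made concrete with a toy model in C. The names `key_model`, `key_get`, `key_put`, `lock_pi_exit_buggy` and `lock_pi_exit_fixed` are hypothetical; the comments map each step to the kernel function the commit message names, but the real functions do far more than adjust a counter.

```c
#include <assert.h>

/* Hypothetical stand-in for the object backing a futex key. */
struct key_model { int refs; };

static void key_get(struct key_model *k) { k->refs++; }

static void key_put(struct key_model *k)
{
    assert(k->refs > 0);   /* dropping below zero would be a bug */
    k->refs--;
}

/* Normal exit path before the fix: two gets, one put. */
int lock_pi_exit_buggy(struct key_model *k)
{
    key_get(k);            /* get_futex_key() */
    key_get(k);            /* queue_lock() */
    key_put(k);            /* unqueue_me_pi() */
    return k->refs;        /* leaked references */
}

/* Normal exit path after the fix: the extra put balances the count. */
int lock_pi_exit_fixed(struct key_model *k)
{
    key_get(k);            /* get_futex_key() */
    key_get(k);            /* queue_lock() */
    key_put(k);            /* unqueue_me_pi() */
    key_put(k);            /* put_futex_key() via 'goto out_put_key' */
    return k->refs;
}
```

A leaked reference of this kind is what keeps the backing inode alive after the file is gone.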
  4. 13 Jan, 2010 1 commit
    • K
      futexes: Remove rw parameter from get_futex_key() · 7485d0d3
      KOSAKI Motohiro authored
      Currently, futexes have two problems:
      
      A) The current futex code doesn't handle private file mappings properly.
      
      get_futex_key() uses PageAnon() to distinguish file and
      anon pages, which can cause the following bad scenario:
      
        1) thread-A calls futex(private-mapping, FUTEX_WAIT) and
           sleeps on the file mapping object.
        2) thread-B writes a variable, which triggers copy-on-write (COW).
        3) thread-B calls futex(private-mapping, FUTEX_WAKE), which
           looks for blocked threads on the anonymous page (but there are none).
      
      B) The current futex code doesn't handle the zero page properly.
      
      Read mode get_user_pages() can return the zero page, but the current
      futex code doesn't handle it at all. The zero page then causes an
      infinite loop internally.
      
      The solution is to always use write mode get_user_pages() for
      page lookup. It prevents the lookup of both the file page of private
      mappings and the zero page.
      
      Performance concerns:
      
      Probably very little, because glibc always initializes variables
      for a futex before calling futex(). This means glibc users never see
      the overhead of this patch.
      
      Compatibility concerns:
      
      This patch has few compatibility issues. After this patch,
      FUTEX_WAIT requires writable access to futex variables (read-only
      mappings result in EFAULT). In practice this is not a problem:
      glibc always initializes variables for futexes explicitly, so nobody
      uses read-only mappings.
      Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: <stable@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Ulrich Drepper <drepper@gmail.com>
      LKML-Reference: <20100105162633.45A2.A69D9226@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 15 Dec, 2009 3 commits
  6. 08 Dec, 2009 1 commit
  7. 29 Oct, 2009 1 commit
    • T
      futex: Fix spurious wakeup for requeue_pi really · 11df6ddd
      Thomas Gleixner authored
      The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
      NULL test) nor does it use the wake_list of futex_wake(), which were
      the reasons for commit 41890f2 (futex: Handle spurious wake up).
      
      See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>
      
      The changes in this fix to the wait_requeue_pi path were considered to
      be a likely unnecessary, but harmless safety net. But it turns out that
      due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
      as EAGAIN we built an endless loop in the code path which correctly
      returns EWOULDBLOCK.
      
      Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
      take the easy solution and return EWOULDBLOCK^WEAGAIN to user space and
      let it deal with the spurious wakeup.
      
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      LKML-Reference: <4AE23C74.1090502@us.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  8. 16 Oct, 2009 1 commit
    • D
      futex: Move drop_futex_key_refs out of spinlock'ed region · 89061d3d
      Darren Hart authored
      When requeuing tasks from one futex to another, the reference held
      by the requeued task to the original futex location needs to be
      dropped eventually.
      
      Dropping the reference may ultimately lead to a call to
      "iput_final" and subsequently call into filesystem-specific code,
      which may be non-atomic.
      
      It is therefore safer to defer this drop operation until after the
      futex_hash_bucket spinlock has been dropped.
      
      Originally-From: Helge Bahmann <hcb@chaoticmind.net>
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: <stable@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <4AD7A298.5040802@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 15 Oct, 2009 1 commit
    • D
      futex: Check for NULL keys in match_futex · 2bc87203
      Darren Hart authored
      If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
      it will find the futex_q->requeue_pi_key to be NULL and OOPS.
      
      Check for NULL in match_futex() instead of doing explicit NULL pointer
      checks on all call sites.  While match_futex(NULL, NULL) returning
      false is a little odd, it's still correct as we expect valid key
      references.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: stable@kernel.org
      LKML-Reference: <4AD60687.10306@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
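The NULL handling described in 2bc87203 can be sketched in C as follows. `futex_key_model` and `match_futex_model` are hypothetical simplifications: the kernel's futex key is a union with several views, while this model keeps only three plain fields to compare.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, flattened stand-in for the kernel's futex key. */
struct futex_key_model {
    unsigned long word;
    const void *ptr;
    int offset;
};

/*
 * Returns 1 only when both keys are present and identical. A NULL key
 * never matches, so callers need no explicit NULL checks of their own.
 */
int match_futex_model(const struct futex_key_model *k1,
                      const struct futex_key_model *k2)
{
    return k1 != NULL && k2 != NULL
        && k1->word == k2->word
        && k1->ptr == k2->ptr
        && k1->offset == k2->offset;
}
```

Because of the short-circuit evaluation, neither pointer is dereferenced unless both are non-NULL, and match_futex_model(NULL, NULL) yields 0, as the commit message notes.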
  10. 14 Oct, 2009 1 commit
    • T
      futex: Handle spurious wake up · d58e6576
      Thomas Gleixner authored
      The futex code does not handle spurious wake up in futex_wait and
      futex_wait_requeue_pi.
      
      The code assumes that any wake up which was not caused by futex_wake /
      requeue or by a timeout was caused by a signal wake up and returns one
      of the syscall restart error codes.
      
      In case of a spurious wake up the signal delivery code which deals
      with the restart error codes is not invoked and we return that error
      code to user space. That causes applications which actually check the
      return codes to fail. Blaise reported that on preempt-rt a python test
      program ran into an exception trap. -rt exposed that due to a built-in
      spurious wake up accelerator :)
      
      Solve this by checking signal_pending(current) in the wake up path and
      handle the spurious wake up case w/o returning to user space.
      Reported-by: Blaise Gassend <blaise@willowgarage.com>
      Debugged-by: Darren Hart <dvhltc@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@kernel.org
      LKML-Reference: <new-submission>
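The wake-up classification that d58e6576 describes can be modelled as a small decision function in C. This is a sketch under assumptions: `wake_action` and `classify_wakeup` are hypothetical names, and the three boolean inputs stand in for "was unqueued by futex_wake / requeue", "the timeout fired" and signal_pending(current).

```c
#include <assert.h>
#include <stdbool.h>

/* What the waiter should do after it wakes up (hypothetical names). */
enum wake_action {
    WAKE_SUCCESS,          /* woken by futex_wake / requeue */
    WAKE_TIMEOUT,          /* the timeout expired */
    WAKE_RESTART_SYSCALL,  /* a signal is pending: use a restart code */
    WAKE_RETRY_WAIT        /* spurious: wait again, stay in the kernel */
};

enum wake_action classify_wakeup(bool unqueued_by_waker,
                                 bool timer_expired,
                                 bool signal_pending)
{
    if (unqueued_by_waker)
        return WAKE_SUCCESS;
    if (timer_expired)
        return WAKE_TIMEOUT;
    if (signal_pending)
        return WAKE_RESTART_SYSCALL;
    /* Before the fix this case fell through to the restart code. */
    return WAKE_RETRY_WAIT;
}
```

The last branch is the new one: without it, a restart error code would leak to user space even though no signal handling ever ran.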
  11. 08 Oct, 2009 1 commit
    • D
      futex: fix requeue_pi key imbalance · da085681
      Darren Hart authored
      If futex_wait_requeue_pi() wakes prior to requeue, we drop the
      reference to the source futex_key twice, once in
      handle_early_requeue_pi_wakeup() and once on our way out.
      
      Remove the drop from the handle_early_requeue_pi_wakeup() and keep
      the get/drops together in futex_wait_requeue_pi().
      Reported-by: Helge Bahmann <hcb@chaoticmind.net>
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Helge Bahmann <hcb@chaoticmind.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: stable-2.6.31 <stable@kernel.org>
      LKML-Reference: <4ACCE21E.5030805@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 06 Oct, 2009 1 commit
  13. 25 Sep, 2009 1 commit
  14. 22 Sep, 2009 5 commits
    • D
      futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me() · 0729e196
      Darren Hart authored
      PI futexes do not use the same plist_node_empty() test for wakeup.
      It was possible for the waiter (in futex_wait_requeue_pi()) to set
      TASK_INTERRUPTIBLE after the waker assigned the rtmutex to the
      waiter. The waiter would then note the plist was not empty and call
      schedule(). The task would not be found by any subsequent futex
      wakeups, resulting in a userspace hang.
      
      By moving the setting of TASK_INTERRUPTIBLE to before the call to
      queue_me(), the race with the waker is eliminated. Since we no
      longer call get_user() from within queue_me(), there is no need to
      delay the setting of TASK_INTERRUPTIBLE until after the call to
      queue_me().
      
      The FUTEX_LOCK_PI operation is not affected as futex_lock_pi()
      relies entirely on the rtmutex code to handle schedule() and
      wakeup.  The requeue PI code is affected because the waiter starts
      as a non-PI waiter and is woken on a PI futex.
      
      Remove the crusty old comment about holding spinlocks() across
      get_user() as we no longer do that. Correct the locking statement
      with a description of why the test is performed.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090922053038.8717.97838.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • D
      futex: Correct futex_q woken state commentary · d8d88fbb
      Darren Hart authored
      Use kernel-doc format to describe struct futex_q.
      
      Correct the wakeup definition to eliminate the statement about
      waking the waiter between the plist_del() and the q->lock_ptr = 0.
      
      Note in the comment that PI futexes have a different definition of
      the woken state.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090922053029.8717.62798.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • D
      futex: Make function kernel-doc commentary consistent · d96ee56c
      Darren Hart authored
      Make the existing function kernel-doc consistent throughout
      futex.c, following Documentation/kernel-doc-nano-howto.txt as
      closely as possible.
      
      When unsure, at least be consistent within futex.c.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090922053022.8717.13339.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • D
      futex: Correct queue_me and unqueue_me commentary · d40d65c8
      Darren Hart authored
      The queue_me/unqueue_me commentary is oddly placed and out of date.
      Clean it up and correct the inaccurate bits.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090922053015.8717.71713.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • D
      futex: Correct futex_wait_requeue_pi() commentary · 56ec1607
      Darren Hart authored
      Correct various typos and formatting inconsistencies in the
      commentary of futex_wait_requeue_pi().
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090922052958.8717.21932.stgit@Aeon>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 16 Aug, 2009 1 commit
    • D
      futex: Detect mismatched requeue targets · 84bc4af5
      Darren Hart authored
      There is currently no check to ensure that userspace uses the same
      futex requeue target (uaddr2) in futex_requeue() that the waiter used
      in futex_wait_requeue_pi().  A mismatch here could cause very unexpected
      results as the waiter assumes it either wakes on uaddr1 or uaddr2. We
      could detect this on wakeup in the waiter, but the cleanup is more
      intense after the improper requeue has occurred.
      
      This patch stores the waiter's expected requeue target in a new
      requeue_pi_key pointer in the futex_q which futex_requeue() checks
      prior to attempting to do a proxy lock acquisition or a requeue when
      requeue_pi=1. If they don't match, return -EINVAL from futex_requeue,
      aborting the requeue of any remaining waiters.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090814003650.14634.63916.stgit@Aeon>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 11 Aug, 2009 1 commit
    • D
      futex: Fix handling of bad requeue syscall pairing · 392741e0
      Darren Hart authored
      If futex_requeue(requeue_pi=1) finds a futex_q that was created by a call
      other than futex_wait_requeue_pi(), the q.rt_waiter may be null.  If so,
      this will result in an oops from the following call graph:
      
      futex_requeue()
        rt_mutex_start_proxy_lock()
          task_blocks_on_rt_mutex()
            waiter->task dereference
              OOPS
      
      We currently WARN_ON() if this is detected; clearly this is inadequate.
      If we detect a mispairing in futex_requeue(), bail out, sending -EINVAL to
      user-space.
      
      V2: Fix parenthesis warnings.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4A7CA8C0.7010809@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 10 Aug, 2009 1 commit
    • D
      futex: Update futex_q lock_ptr on requeue proxy lock · beda2c7e
      Darren Hart authored
      futex_requeue() can acquire the lock on behalf of a waiter
      early on or during the requeue loop if it is uncontended or in
      the event of a lock steal or owner died. On wakeup, the waiter
      (in futex_wait_requeue_pi()) cleans up the pi_state owner using
      the lock_ptr to protect against concurrent access to the
      pi_state. The pi_state is hung off futex_q's on the requeue
      target futex hash bucket so the lock_ptr needs to be updated
      accordingly.
      
      The problem manifested by triggering the WARN_ON in
      lookup_pi_state() about the pid != pi_state->owner->pid.  With
      this patch, the pi_state is properly guarded against concurrent
      access via the requeue target hb lock.
      
      The astute reviewer may notice that there is a window of time
      between when futex_requeue() unlocks the hb locks and when
      futex_wait_requeue_pi() will acquire hb2->lock.  During this
      time the pi_state and uval are not in sync with the underlying
      rtmutex owner (but the uval does indicate there are waiters, so
      no atomic changes will occur in userspace).  However, this is
      not a problem. Should a contending thread enter
      lookup_pi_state() and acquire hb2->lock before the ownership is
      fixed up, it will find the pi_state hung off a waiter's
      (possibly the pending owner's) futex_q and block on the
      rtmutex.  Once futex_wait_requeue_pi() fixes up the owner, it
      will also move the pi_state from the old owner's
      task->pi_state_list to its own.
      
      v3: Fix plist lock name for application to mainline (rather
          than -rt) Compile tested against tip/v2.6.31-rc5.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4A7F4EFF.6090903@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 04 Aug, 2009 1 commit
  19. 11 Jul, 2009 1 commit
    • S
      futexes: Fix infinite loop in get_futex_key() on huge page · ce2ae53b
      Sonny Rao authored
      get_futex_key() can infinitely loop if it is called on a
      virtual address that is within a huge page but not aligned to
      the beginning of that page.  The call to get_user_pages_fast
      will return the struct page for a sub-page within the huge page
      and the check for page->mapping will always fail.
      
      The fix is to call compound_head on the page before checking
      that it's mapped.
      Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
      Cc: anton@samba.org
      Cc: rajamony@us.ibm.com
      Cc: speight@us.ibm.com
      Cc: mstephen@us.ibm.com
      Cc: grimm@us.ibm.com
      Cc: mikey@ozlabs.au.ibm.com
      LKML-Reference: <20090710231313.GA23572@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 25 Jun, 2009 2 commits
    • T
      futex: request only one page from get_user_pages() · aa715284
      Thomas Gleixner authored
      Yanmin noticed that fault_in_user_writeable() requests 4 pages instead
      of one.
      
      That's the result of blindly trusting Linus' proposal :) I even looked
      up the prototype to verify the correctness: the argument in question
      is confusingly enough named "len" while in reality it means number of
      pages.
      Pointed-out-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • T
      futex: Fix the write access fault problem for real · d0725992
      Thomas Gleixner authored
      commit 64d1304a (futex: setup writeable mapping for futex ops which
      modify user space data) did address only half of the problem of write
      access faults.
      
      The patch was made on two wrong assumptions:
      
      1) access_ok(VERIFY_WRITE,...) would actually check write access.
      
         On x86 it does _NOT_. It's a pure address range check.
      
      2) a RW mapped region can not go away under us.
      
         That's wrong as well. Nobody can prevent another thread from calling
         mprotect(PROT_READ) on the region where the futex resides. If that
         call hits between the get_user_pages_fast() verification and the
         actual write access in the atomic region we are toast again.
      
      The solution is to not rely on access_ok and get_user() for any write
      access related fault on private and shared futexes. Instead we need to
      fault it in with verification of write access.
      
      There is no generic non-destructive write mechanism which would fault
      the user page in through a #PF, but as we already know that we will
      fault, we can just as well call get_user_pages() directly and avoid the
      #PF overhead.
      
      If get_user_pages() returns -EFAULT we know that we can not fix it
      anymore and need to bail out to user space.
      
      Remove a bunch of confusing comments on this issue as well.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
  21. 20 May, 2009 4 commits
    • T
      futex: fix restart in wait_requeue_pi · 2070887f
      Thomas Gleixner authored
      If the waiter has been requeued to the outer PI futex and is
      interrupted by a signal and the thread handles the signal then
      ERESTART_RESTARTBLOCK is changed to EINTR and the restart block is
      discarded. That way we return an unexpected EINTR to user space
      instead of ending up in futex_lock_pi_restart.
      
      But we do not need to restart the syscall because we know that the
      condition has changed since we have been requeued. If we would simply
      restart the syscall then we would drop out via the comparison of the
      user space value with EWOULDBLOCK.
      
      The user space side needs to handle EWOULDBLOCK anyway as the
      enqueueing on the inner futex can race with a requeue/wake. So we can
      simply return EWOULDBLOCK to user space which also signals that we did
      not take the outer futex and let user space handle it in the same way
      it has to handle the requeue/wake race.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • T
      futex: fix restart for early wakeup in futex_wait_requeue_pi() · 1c840c14
      Thomas Gleixner authored
      The futex_wait_requeue_pi op should restart unconditionally like
      futex_lock_pi. The user of that function e.g. pthread_cond_wait can
      not be interrupted so we do not care about the SA_RESTART flag of the
      signal. Clean up the FIXMEs.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • T
      futex: cleanup error exit · c8b15a70
      Thomas Gleixner authored
      Reuse the put_key_ref(key2) call in the exit path.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • T
      futex: setup writeable mapping for futex ops which modify user space data · 64d1304a
      Thomas Gleixner authored
      The futex code installs a read only mapping via get_user_pages_fast()
      even if the futex op function has to modify user space data. The
      eventual fault was fixed up by futex_handle_fault() which walked the
      VMA with mmap_sem held.
      
      After the cleanup patches which removed the mmap_sem dependency of the
      futex code commit 4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex:
      clean up fault logic) removed the private VMA walk logic from the
      futex code. This change results in a stale RO mapping which is not
      fixed up.
      
      Instead of reintroducing the previous fault logic we set up the
      mapping in get_user_pages_fast() read/write for all operations which
      modify user space data. Also handle private futexes in the same way
      and make the current unconditional access_ok(VERIFY_WRITE) depend on
      the futex op.
      Reported-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
  22. 15 May, 2009 1 commit
    • T
      futex: remove the wait queue · f1a11e05
      Thomas Gleixner authored
      The waitqueue which is used in struct futex_q is a leftover from the
      futexfd implementation. There is no need to use a waitqueue at all, as
      the waiting task is the only user of it. The waitqueue just adds
      additional locking and a loop in the wake up path which both can be
      avoided.
      
      We have already a task reference in struct futex_q which is used for
      PI futexes. Use it for normal futexes as well and just wake up the
      task directly.
      
      The logic of signalling the futex wakeup via setting q->lock_ptr to
      NULL is kept with the difference that we set it NULL before doing the
      wakeup. This opens an exit race window vs. a non futex wake up of the
      to be woken up task, which we prevent with get_task_struct /
      put_task_struct on the waiter.
      
      [ Impact: simplification ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  23. 30 Apr, 2009 1 commit
    • D
      futex: remove FUTEX_REQUEUE_PI (non CMP) · ba9c22f2
      Darren Hart authored
      The new requeue PI futex op codes were modeled after the existing
      FUTEX_REQUEUE and FUTEX_CMP_REQUEUE calls.  I was unaware at the time
      that FUTEX_REQUEUE was only around for compatibility reasons and
      shouldn't be used in new code.  Ulrich Drepper elaborates on this in his
      Futexes are Tricky paper: http://people.redhat.com/drepper/futex.pdf.
      The deprecated call doesn't catch changes to the futex corresponding to
      the destination futex which can lead to deadlock.
      
      Therefore, I feel it best to remove FUTEX_REQUEUE_PI and leave only
      FUTEX_CMP_REQUEUE_PI as there are not yet any existing users of the API.
      This patch does change the OP code value of FUTEX_CMP_REQUEUE_PI to 12
      from 13.  Since my test case is the only known user of this API, I felt
      this was the right thing to do, rather than leave a hole in the
      enumeration.
      
      I chose to continue using the _CMP_ modifier in the OP code to make it
      explicit to the user that the test is being done.
      
      Builds, boots, and ran several hundred iterations of requeue_pi.c.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      LKML-Reference: <49ED580E.1050502@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
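The op code renumbering in ba9c22f2 can be checked against the user-space header with a short C sketch. It assumes a Linux system whose `<linux/futex.h>` provides FUTEX_WAIT_REQUEUE_PI, FUTEX_CMP_REQUEUE_PI and FUTEX_CMD_MASK; `is_requeue_pi_op` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <linux/futex.h>

/*
 * Returns 1 if op, with any flag bits such as FUTEX_PRIVATE_FLAG
 * masked off, is one of the two requeue-PI commands.
 */
int is_requeue_pi_op(int op)
{
    int cmd = op & FUTEX_CMD_MASK;

    return cmd == FUTEX_WAIT_REQUEUE_PI || cmd == FUTEX_CMP_REQUEUE_PI;
}
```

In the mainline header FUTEX_WAIT_REQUEUE_PI is 11 and FUTEX_CMP_REQUEUE_PI is 12, so the enumeration has no hole where the removed non-CMP variant would have been.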
  24. 11 Apr, 2009 1 commit
  25. 08 Apr, 2009 1 commit
  26. 06 Apr, 2009 1 commit
    • D
      futex: add requeue_pi functionality · 52400ba9
      Darren Hart authored
      PI Futexes and their underlying rt_mutex cannot be left ownerless if
      there are pending waiters as this will break the PI boosting logic, so
      the standard requeue commands aren't sufficient.  The new commands
      properly manage pi futex ownership by ensuring a futex with waiters
      has an owner at all times.  This will allow glibc to properly handle
      pi mutexes with pthread_condvars.
      
      The approach taken here is to create two new futex op codes:
      
      FUTEX_WAIT_REQUEUE_PI:
      Tasks will use this op code to wait on a futex (such as a non-pi waitqueue)
      and wake after they have been requeued to a pi futex.  Prior to returning to
      userspace, they will acquire this pi futex (and the underlying rt_mutex).
      
      futex_wait_requeue_pi() is the result of a high speed collision between
      futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() being
      done by futex_proxy_trylock_atomic() on behalf of the top_waiter).
      
      FUTEX_REQUEUE_PI (and FUTEX_CMP_REQUEUE_PI):
      This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI,
      regardless of how many tasks the caller intends to wake or requeue.
      pthread_cond_broadcast() should call this with nr_wake=1 and
      nr_requeue=INT_MAX.  pthread_cond_signal() should call this with nr_wake=1 and
      nr_requeue=0.  The reason being we need both callers to get the benefit of the
      futex_proxy_trylock_atomic() routine.  futex_requeue() also enqueues the
      top_waiter on the rt_mutex via rt_mutex_start_proxy_lock().
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
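The nr_wake / nr_requeue values that 52400ba9 prescribes for the condvar callers can be written down as a C sketch. `requeue_pi_args`, `cond_signal_args`, `cond_broadcast_args` and `futex_cmp_requeue_pi` are hypothetical names, not glibc internals. The wrapper uses FUTEX_CMP_REQUEUE_PI, the variant that remained after the non-CMP op was later removed, and assumes the raw futex syscall argument order (uaddr, op, val, val2, uaddr2, val3).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical pair of counts passed to the requeue-PI operation. */
struct requeue_pi_args {
    int nr_wake;
    int nr_requeue;
};

/* pthread_cond_signal(): wake one waiter, requeue none. */
struct requeue_pi_args cond_signal_args(void)
{
    return (struct requeue_pi_args){ 1, 0 };
}

/* pthread_cond_broadcast(): wake one waiter, requeue all others. */
struct requeue_pi_args cond_broadcast_args(void)
{
    return (struct requeue_pi_args){ 1, INT_MAX };
}

/* Thin wrapper around the raw syscall; returns -1 with errno on error. */
long futex_cmp_requeue_pi(uint32_t *cond_futex, uint32_t *pi_mutex_futex,
                          struct requeue_pi_args args,
                          uint32_t expected_cond_val)
{
    assert(args.nr_wake == 1);   /* both callers wake exactly one task */
    return syscall(SYS_futex, cond_futex, FUTEX_CMP_REQUEUE_PI,
                   args.nr_wake, (unsigned long)args.nr_requeue,
                   pi_mutex_futex, expected_cond_val);
}
```

Both callers pass nr_wake=1 so that each of them goes through the same proxy trylock path on behalf of the top waiter; only the requeue count differs.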