1. 27 Nov, 2012 1 commit
    •
      futex: avoid wake_futex() for a PI futex_q · aa10990e
      Authored by Darren Hart
      Dave Jones reported a bug with futex_lock_pi() that his trinity test
      exposed.  Sometime between queue_me() and taking the q.lock_ptr, the
      lock_ptr became NULL, resulting in a crash.
      
      While futex_wake() is careful to not call wake_futex() on futex_q's with
      a pi_state or an rt_waiter (which are either waiting for a
      futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
      futex_requeue() do not perform the same test.
      
      Update futex_wake_op() and futex_requeue() to test for q.pi_state and
      q.rt_waiter and abort with -EINVAL if detected.  To ensure any future
      breakage is caught, add a WARN() to wake_futex() if the same condition
      is true.
      
      This fix has seen 3 hours of testing with "trinity -c futex" on an
      x86_64 VM with 4 CPUs.
      
      [akpm@linux-foundation.org: tidy up the WARN()]
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
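The guard described in commit aa10990e can be illustrated outside the kernel. The following is a minimal sketch, not the kernel code: `struct futex_q_sketch` and `wake_one_checked()` are hypothetical names, and only the two fields the commit message mentions (`pi_state`, `rt_waiter`) are modelled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for the kernel's struct futex_q. */
struct futex_q_sketch {
    void *pi_state;  /* set while the waiter is blocked on a PI futex */
    void *rt_waiter; /* set while the waiter awaits a PI requeue */
    int woken;       /* illustration only: records that a wakeup happened */
};

/*
 * Wake one waiter the way a non-PI path would, but refuse PI waiters.
 * Returns 0 on success, or -EINVAL if q belongs to the PI machinery.
 */
static int wake_one_checked(struct futex_q_sketch *q)
{
    if (q->pi_state != NULL || q->rt_waiter != NULL)
        return -EINVAL;
    q->woken = 1;
    return 0;
}
```

A caller would stop processing and propagate -EINVAL as soon as one queued waiter fails the check, which mirrors the "abort with -EINVAL" wording of the commit message.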
  2. 01 Nov, 2012 1 commit
    •
      futex: Handle futex_pi OWNER_DIED take over correctly · 59fa6245
      Authored by Thomas Gleixner
      Siddhesh analyzed a failure in the take over of pi futexes in case the
      owner died and provided a workaround.
      See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076
      
      The detailed problem analysis shows:
      
      Futex F is initialized with PTHREAD_PRIO_INHERIT and
      PTHREAD_MUTEX_ROBUST_NP attributes.
      
      T1 lock_futex_pi(F);
      
      T2 lock_futex_pi(F);
         --> T2 blocks on the futex and creates pi_state which is associated
             to T1.
      
      T1 exits
         --> exit_robust_list() runs
             --> Futex F userspace value TID field is set to 0 and
                 FUTEX_OWNER_DIED bit is set.
      
      T3 lock_futex_pi(F);
         --> Succeeds due to the check for F's userspace TID field == 0
         --> Claims ownership of the futex and sets its own TID into the
             userspace TID field of futex F
         --> returns to user space
      
      T1 --> exit_pi_state_list()
             --> Transfers pi_state to waiter T2 and wakes T2 via
             	   rt_mutex_unlock(&pi_state->mutex)
      
      T2 --> acquires pi_state->mutex and gains real ownership of the
             pi_state
         --> Claims ownership of the futex and sets its own TID into the
             userspace TID field of futex F
         --> returns to user space
      
      T3 --> observes inconsistent state
      
      This problem is independent of UP/SMP, preemptible/non preemptible
      kernels, or process shared vs. private. The only difference is that
      certain configurations are more likely to expose it.
      
      So as Siddhesh correctly analyzed the following check in
      futex_lock_pi_atomic() is the culprit:
      
      	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
      
      We check the userspace value for a TID value of 0 and take over the
      futex unconditionally if that's true.
      
      AFAICT this check is there as it is correct for a different corner
      case of futexes: the WAITERS bit became stale.
      
      Now the proposed change
      
      -	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
      +       if (unlikely(ownerdied ||
      +                       !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
      
      solves the problem, but it's not obvious why and it wrecks the
      "stale WAITERS bit" case.
      
      What happens is that, due to the WAITERS bit being set (T2 is blocked
      on that futex), it forces T3 to go through lookup_pi_state(), which
      in the above case returns an existing pi_state and therefore forces T3
      to legitimately fight with T2 over the ownership of the pi_state (via
      pi_state->mutex). Problem solved!
      
      Though that does not work for the "WAITERS bit is stale" problem
      because if lookup_pi_state() does not find existing pi_state it
      returns -ESRCH (due to TID == 0) which causes futex_lock_pi() to
      return -ESRCH to user space because the OWNER_DIED bit is not set.
      
      Now there is a different solution to that problem. Do not look at the
      user space value at all and enforce a lookup of possibly available
      pi_state. If pi_state can be found, then the new incoming locker T3
      blocks on that pi_state and legitimately races with T2 to acquire the
      rt_mutex and the pi_state and therefore the proper ownership of the
      user space futex.
      
      lookup_pi_state() has the correct order of checks. It first tries to
      find a pi_state associated with the user space futex and only if that
      fails it checks for futex TID value = 0. If no pi_state is available
      nothing can create new state at that point because this happens with
      the hash bucket lock held.
      
      So the above scenario changes to:
      
      T1 lock_futex_pi(F);
      
      T2 lock_futex_pi(F);
         --> T2 blocks on the futex and creates pi_state which is associated
             to T1.
      
      T1 exits
         --> exit_robust_list() runs
             --> Futex F userspace value TID field is set to 0 and
                 FUTEX_OWNER_DIED bit is set.
      
      T3 lock_futex_pi(F);
         --> Finds pi_state and blocks on pi_state->rt_mutex
      
      T1 --> exit_pi_state_list()
             --> Transfers pi_state to waiter T2 and wakes it via
             	   rt_mutex_unlock(&pi_state->mutex)
      
      T2 --> acquires pi_state->mutex and gains ownership of the pi_state
         --> Claims ownership of the futex and sets its own TID into the
             userspace TID field of futex F
         --> returns to user space
      
      This covers all gazillion points on which T3 might come in between
      T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
      also solves the "WAITERS bit stale" problem by forcing the take over.
      
      Another benefit of changing the code this way is that it makes it less
      dependent on untrusted user space values and therefore minimizes the
      possible wreckage which might be inflicted.
      
      As usual after staring for too long at the futex code my brain hurts
      so much that I really want to ditch that whole optimization of
      avoiding the syscall for the non contended case for PI futexes and rip
      out the maze of corner case handling code. Unfortunately we can't as
      user space relies on that existing behaviour, but at least thinking
      about it helps me to preserve my mental sanity. Maybe we should
      nevertheless :)
      Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
      Acked-by: Darren Hart <dvhart@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
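The order of checks that commit 59fa6245 relies on (look for existing pi_state first, consult the user space TID only if none exists) can be sketched as a small decision function. This is an illustration under assumptions, not the kernel's lookup_pi_state(): `decide_lock_pi()` and the `lock_pi_action` enum are hypothetical, while the three bit masks match the futex user space ABI.

```c
#include <stdint.h>

/* Bit layout of the user space futex word. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Hypothetical outcome labels for an incoming locker such as T3. */
enum lock_pi_action {
    BLOCK_ON_PI_STATE, /* existing kernel state wins: queue on its rt_mutex */
    TAKE_OVER,         /* no kernel state and TID == 0: claim the futex */
    ATTACH_TO_OWNER    /* no kernel state, live TID: create state for owner */
};

/*
 * Decide what a new locker does. Kernel state is consulted before the
 * user space value, so a zeroed TID cannot bypass a waiter that already
 * holds pi_state (T2 in the scenario above).
 */
static enum lock_pi_action decide_lock_pi(int pi_state_exists, uint32_t uval)
{
    if (pi_state_exists)
        return BLOCK_ON_PI_STATE;
    if ((uval & FUTEX_TID_MASK) == 0)
        return TAKE_OVER;
    return ATTACH_TO_OWNER;
}
```

With this ordering, `decide_lock_pi(1, FUTEX_OWNER_DIED | FUTEX_WAITERS)` makes T3 block, while a stale WAITERS bit with no kernel state, `decide_lock_pi(0, FUTEX_WAITERS)`, still results in a take over.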
  3. 24 Jul, 2012 3 commits
  4. 29 Mar, 2012 2 commits
  5. 15 Feb, 2012 2 commits
  6. 01 Jan, 2012 1 commit
    •
      futex: Fix uninterruptible loop due to gate_area · e6780f72
      Authored by Hugh Dickins
      It was found (by Sasha) that if you use a futex located in the gate
      area we get stuck in an uninterruptible infinite loop, much like the
      ZERO_PAGE issue.
      
      While looking at this problem, PeterZ realized you'll get into similar
      trouble when hitting any install_special_pages() mapping.  And are there
      still drivers setting up their own special mmaps without page->mapping,
      and without special VM or pte flags to make get_user_pages fail?
      
      In most cases, if page->mapping is NULL, we do not need to retry at all:
      Linus points out that even /proc/sys/vm/drop_caches poses no problem,
      because it ends up using remove_mapping(), which takes care not to
      interfere when the page reference count is raised.
      
      But there is still one case which does need a retry: if memory pressure
      called shmem_writepage in between get_user_pages_fast dropping page
      table lock and our acquiring page lock, then the page gets switched from
      filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
      Fault it back in to get the page->mapping needed for key->shared.inode.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
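The retry rule from commit e6780f72 (a NULL page->mapping only warrants a retry when the page was moved to swapcache) can be sketched as follows. The struct, the `KEY_RETRY` value, and `check_page_mapping()` are hypothetical simplifications; the real code inspects a locked struct page inside get_futex_key().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two page properties that matter here. */
struct page_sketch {
    const void *mapping; /* NULL for gate area / special mappings */
    int in_swapcache;    /* page was switched from filecache to swapcache */
};

#define KEY_RETRY 1 /* illustration only: caller should fault in and retry */

/*
 * Returns 0 if the page can be used for the key, KEY_RETRY if the page
 * lost its mapping to swapcache and must be faulted back in, or -EFAULT
 * when retrying could never succeed.
 */
static int check_page_mapping(const struct page_sketch *page)
{
    if (page->mapping != NULL)
        return 0;
    if (page->in_swapcache)
        return KEY_RETRY;
    return -EFAULT;
}
```

Returning -EFAULT for the non-swapcache case is what breaks the uninterruptible loop: a futex in the gate area fails once instead of retrying forever.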
  7. 31 Oct, 2011 1 commit
  8. 15 Sep, 2011 2 commits
  9. 27 Jul, 2011 1 commit
    •
      futex: Fix regression with read only mappings · 9ea71503
      Authored by Shawn Bohrer
      commit 7485d0d3 (futexes: Remove rw parameter from get_futex_key()) in
      2.6.33 fixed two problems:  First, it prevented a loop when encountering
      a ZERO_PAGE. Second, it fixed RW MAP_PRIVATE futex operations by forcing
      the COW to occur by unconditionally performing a write access
      get_user_pages_fast() to get the page.  The commit also introduced a
      user-mode regression in that it broke futex operations on read-only
      memory maps.  For example, this breaks workloads that have one or more
      reader processes doing a FUTEX_WAIT on a futex within a read only shared
      file mapping, and a writer process that has a writable mapping issuing
      the FUTEX_WAKE.
      
      This fixes the regression for valid futex operations on RO mappings by
      trying a RO get_user_pages_fast() when the RW get_user_pages_fast()
      fails. This change makes it necessary to also check for invalid use
      cases, such as anonymous RO mappings (which can never change) and the
      ZERO_PAGE which the commit referenced above was written to address.
      
      This patch does restore the original behavior with RO MAP_PRIVATE
      mappings, which have inherent user-mode usage problems and don't really
      make sense.  With this patch performing a FUTEX_WAIT within a RO
      MAP_PRIVATE mapping will be successfully woken provided another process
      updates the region of the underlying mapped file.  However, the mmap()
      man page states that for a MAP_PRIVATE mapping:
      
        It is unspecified whether changes made to the file after
        the mmap() call are visible in the mapped region.
      
      So user-mode users attempting to use futex operations on RO MAP_PRIVATE
      mappings are depending on unspecified behavior.  Additionally a
      RO MAP_PRIVATE mapping could fail to wake up in the following case.
      
        Thread-A: call futex(FUTEX_WAIT, memory-region-A).
                  get_futex_key() returns an inode based key.
                  sleep on the key
        Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A)
        Thread-B: write memory-region-A.
                  COW happens. This process's memory-region-A becomes related
                  to a new COWed private (ie PageAnon=1) page.
        Thread-B: call futex(FUTEX_WAKE, memory-region-A).
                  get_futex_key() returns an mm based key.
                  IOW, we fail to wake up Thread-A.
      
      Once again doing something like this is just silly and users who do
      something like this get what they deserve.
      
      While RO MAP_PRIVATE mappings are nonsensical, checking for a private
      mapping requires walking the vmas and was deemed too costly to avoid a
      userspace hang.
      
      This patch is based on Peter Zijlstra's initial patch with modifications to
      only allow RO mappings for futex operations that need VERIFY_READ access.
      Reported-by: David Oliver <david@rgmadvisors.com>
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: peterz@infradead.org
      Cc: eric.dumazet@gmail.com
      Cc: zvonler@rgmadvisors.com
      Cc: hughd@google.com
      Link: http://lkml.kernel.org/r/1309450892-30676-1-git-send-email-sbohrer@rgmadvisors.com
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
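The fallback that commit 9ea71503 describes (try a writable pin first, retry read-only only for operations that merely read the futex word) can be sketched with the page-pinning step abstracted behind a callback. Everything named here is hypothetical: `pin_page_fn` stands in for get_user_pages_fast(), and the two sample callbacks simulate a read-only mapping and an unmapped address.

```c
#include <errno.h>

/* Hypothetical pin callback: returns 1 on success or -EFAULT. */
typedef int (*pin_page_fn)(int write);

/* Simulated read-only mapping: only a read pin succeeds. */
static int pin_read_only_mapping(int write)
{
    return write ? -EFAULT : 1;
}

/* Simulated unmapped address: every pin attempt fails. */
static int pin_unmapped(int write)
{
    (void)write;
    return -EFAULT;
}

/*
 * Try a writable pin first. If it faults and the futex operation only
 * needs read access, retry with a read-only pin and report that through
 * *ro so the caller can reject unsuitable pages later.
 */
static int pin_futex_page(pin_page_fn pin, int needs_write, int *ro)
{
    int err = pin(1);

    *ro = 0;
    if (err == -EFAULT && !needs_write) {
        err = pin(0);
        *ro = 1;
    }
    return err;
}
```

The `ro` flag matters because, as the commit message notes, a read-only pin makes it necessary to also check for invalid cases such as anonymous RO mappings and the ZERO_PAGE.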
  10. 26 Jul, 2011 1 commit
    •
      mm/futex: fix futex writes on archs with SW tracking of dirty & young · 2efaca92
      Authored by Benjamin Herrenschmidt
      I haven't reproduced it myself but the fail scenario is that on such
      machines (notably ARM and some embedded powerpc), if you manage to hit
      that futex path on a writable page whose dirty bit has gone from the PTE,
      you'll livelock inside the kernel from what I can tell.
      
      It will go in a loop of trying the atomic access, failing, trying gup to
      "fix it up", getting success from gup, going back to the atomic access,
      failing again because dirty wasn't fixed etc...
      
      So I think you essentially hang in the kernel.
      
      The scenario is probably rare'ish because affected architectures are
      embedded and tend to not swap much (if at all) so we probably rarely hit
      the case where dirty is missing or young is missing, but I think Shan has
      a piece of SW that can reliably reproduce it using a shared writable
      mapping & fork or something like that.
      
      On archs that use SW tracking of dirty & young, a page without dirty is
      effectively mapped read-only and a page without young is inaccessible in
      the PTE.
      
      Additionally, some architectures might lazily flush the TLB when relaxing
      write protection (by doing only a local flush), and expect a fault to
      invalidate the stale entry if it's still present on another processor.
      
      The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
      "fix it up" by calling get_user_pages() which would then be equivalent to
      taking the fault.
      
      However that isn't the case.  get_user_pages() will not call
      handle_mm_fault() in the case where the PTE seems to have the right
      permissions, regardless of the dirty and young state.  It will eventually
      update those bits ...  in the struct page, but not in the PTE.
      
      Additionally, it will not handle the lazy TLB flushing that can be
      required by some architectures in the fault case.
      
      Basically, gup is the wrong interface for the job.  The patch provides a
      more appropriate one which boils down to just calling handle_mm_fault()
      since what we are trying to do is simulate a real page fault.
      
      The futex code currently attempts to write to user memory within a
      pagefault disabled section, and if that fails, tries to fix it up using
      get_user_pages().
      
      This doesn't work on archs where the dirty and young bits are maintained
      by software, since they will gate access permission in the TLB, and will
      not be updated by gup().
      
      In addition, there's an expectation on some archs that a spurious write
      fault triggers a local TLB flush, and that is missing from the picture as
      well.
      
      I decided that adding those "features" to gup() would be too much for this
      already too complex function, and instead added a new simpler
      fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
      which the futex code can call.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reported-by: Shan Hai <haishan.bai@gmail.com>
      Tested-by: Shan Hai <haishan.bai@gmail.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Darren Hart <darren.hart@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
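The livelock described in commit 2efaca92 can be reproduced in a toy model. This is a simulation under assumptions, not kernel code: `struct pte_sketch` models a PTE whose dirty bit is maintained by software, and the two fixup functions are hypothetical stand-ins for the gup-based fixup and for a fault-based fixup in the spirit of fixup_user_fault().

```c
#include <errno.h>

/* Toy PTE for an architecture that tracks dirty in software. */
struct pte_sketch {
    int writable; /* mapping permits writes */
    int dirty;    /* SW dirty bit; writes fault while it is clear */
};

/* The atomic user access only succeeds once the dirty bit is set. */
static int atomic_write(const struct pte_sketch *pte)
{
    return (pte->writable && pte->dirty) ? 0 : -EFAULT;
}

/* gup-like fixup: permissions look fine, so the PTE is left untouched. */
static void fixup_gup_like(struct pte_sketch *pte)
{
    (void)pte;
}

/* Fault-like fixup: behaves like a real write fault and sets dirty. */
static void fixup_fault_like(struct pte_sketch *pte)
{
    if (pte->writable)
        pte->dirty = 1;
}

/*
 * Retry loop in the style of the futex code. Returns the number of
 * fixups needed before the write succeeded, or -1 if max_tries attempts
 * all failed (the kernel loop has no such bound, hence the hang).
 */
static int write_with_retry(struct pte_sketch *pte,
                            void (*fixup)(struct pte_sketch *),
                            int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (atomic_write(pte) == 0)
            return i;
        fixup(pte);
    }
    return -1;
}
```

Starting from a writable but clean PTE, the gup-like fixup never makes progress, while the fault-like fixup succeeds after a single pass.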
  11. 08 Jul, 2011 1 commit
  12. 15 Apr, 2011 1 commit
  13. 25 Mar, 2011 1 commit
  14. 24 Mar, 2011 1 commit
  15. 15 Mar, 2011 1 commit
    •
      futex: Deobfuscate handle_futex_death() · 6e0aa9f8
      Authored by Thomas Gleixner
      handle_futex_death() uses futex_atomic_cmpxchg_inatomic() without
      disabling page faults. That's ok, but totally non obvious.
      
      We don't hold locks so we actually can and want to fault here, because
      the get_user() before futex_atomic_cmpxchg_inatomic() does not
      guarantee a R/W mapping.
      
      We could just add a big fat comment to explain this, but actually
      changing the code so that the functionality is entirely clear is
      better.
      
      Use the helper function which disables page faults around the
      futex_atomic_cmpxchg_inatomic() and handle a fault with a call to
      fault_in_user_writeable() as all other places in the futex code do as
      well.
      Pointed-out-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Darren Hart <darren@dvhart.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      LKML-Reference: <alpine.LFD.2.00.1103141126590.2787@localhost6.localdomain6>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 12 Mar, 2011 2 commits
  17. 11 Mar, 2011 3 commits
    •
      futex: Sanitize cmpxchg_futex_value_locked API · 37a9d912
      Authored by Michel Lespinasse
      The cmpxchg_futex_value_locked API was funny in that it returned either
      the original, user-exposed futex value OR an error code such as -EFAULT.
      This was confusing at best, and could be a source of livelocks in places
      that retry the cmpxchg_futex_value_locked after trying to fix the issue
      by running fault_in_user_writeable().
          
      This change makes the cmpxchg_futex_value_locked API more similar to the
      get_futex_value_locked one, returning an error code and updating the
      original value through a reference argument.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>  [tile]
      Acked-by: Tony Luck <tony.luck@intel.com>  [ia64]
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Michal Simek <monstr@monstr.eu>  [microblaze]
      Acked-by: David Howells <dhowells@redhat.com> [frv]
      Cc: Darren Hart <darren@dvhart.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110311024851.GC26122@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
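The API shape that commit 37a9d912 moves to (status in the return value, observed value through a reference argument) can be shown in user space with C11 atomics. This is a sketch of the calling convention only: `cmpxchg_value()` is a hypothetical name, and a NULL pointer is used as an assumed stand-in for a faulting user address.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare-and-exchange with separated results: the return value is only
 * ever 0 or a negative error code, and the value actually found in the
 * word is stored in *curval. The caller never has to guess whether a
 * returned number is a futex value or an error.
 */
static int cmpxchg_value(uint32_t *curval, _Atomic uint32_t *uaddr,
                         uint32_t oldval, uint32_t newval)
{
    uint32_t expected = oldval;

    if (uaddr == NULL)
        return -EFAULT; /* assumption: models an inaccessible address */

    /* On failure, expected is overwritten with the current value. */
    atomic_compare_exchange_strong(uaddr, &expected, newval);
    *curval = expected;
    return 0;
}
```

A caller checks the return value first and only then compares `*curval` with `oldval` to see whether the exchange took place, which removes the ambiguity the commit message calls "confusing at best".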
    •
      futex: Avoid redundant evaluation of task_pid_vnr() · c0c9ed15
      Authored by Thomas Gleixner
      The result is not going to change under us, so no need to reevaluate
      this over and over. Seems to be a leftover from the mechanical mass
      conversion of task->pid to task_pid_vnr(tsk).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    •
      futex: Update futex_wait_setup comments about locking · 8fe8f545
      Authored by Michel Lespinasse
      Reviving a cleanup I had done about a year ago as part of a larger
      futex_set_wait proposal. Over the years, the locking of the hashed
      futex queue got improved, so that some of the "rare but normal" race
      conditions described in comments can't actually happen anymore.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20110307020750.GA31188@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  18. 28 Jan, 2011 1 commit
    •
      rtmutex: Simplify PI algorithm and make highest prio task get lock · 8161239a
      Authored by Lai Jiangshan
      In the current rtmutex, the pending owner may be boosted by the tasks
      in the rtmutex's waitlist when the pending owner is deboosted
      or a task in the waitlist is boosted. This boosting is unwarranted,
      because the pending owner does not yet actually hold the rtmutex.
      
      Example:
      
      time1:
      A (high prio) owns the rtmutex.
      B (mid prio) and C (low prio) are in the waitlist.
      
      time2:
      A releases the lock and B becomes the pending owner.
      A (or another high prio task) continues to run. B's prio is lower
      than A's, so B just stays queued on the runqueue.
      
      time3:
      A or the other high prio task sleeps, but some time has passed.
      The prios of B and C changed in the period (time2 ~ time3)
      due to boosting or deboosting. Now C has a higher priority
      than B. ***Is it reasonable that C has to boost B and help B to
      get the rtmutex?
      
      NO!! I think this boosting is unrelated and unneeded before B really
      owns the rtmutex. We should give C a chance to beat B and
      win the rtmutex.
      
      This is the motivation of this patch. This patch *ensures* that
      only the top waiter or a higher priority task can take the lock.
      
      How?
      1) We don't dequeue the top waiter on unlock. If the top waiter
         changes, the old top waiter will fail and go to sleep again.
      2) When acquiring the lock, a task gets it when the lock is not taken and
         there is no waiter, OR it has higher priority than the waiters, OR it
         is the top waiter.
      3) Any time the top waiter changes, the new top waiter is woken up.
      
      The algorithm is much simpler than before: no pending owner, no
      boosting for a pending owner.
      
      Other advantages of this patch:
      1) The number of rtmutex states is halved, so the code is easier to read.
      2) The code becomes shorter.
      3) The top waiter is not dequeued until it really takes the lock:
         waiters retain FIFO order when the lock is stolen.
      
      Neither advantage nor disadvantage:
      1) Even though we may wake up multiple waiters (any time the top waiter
         changes), we hardly cause a "thundering herd";
         the number of woken tasks is likely 1 or very small.
      2) Two APIs are changed.
         rt_mutex_owner() will not return the pending owner; it returns NULL when
                          the top waiter is going to take the lock.
         rt_mutex_next_owner() always returns the top waiter.
                          It will not return NULL if we have waiters,
                          because the top waiter is not dequeued.
      
         I have fixed the code that uses these APIs.
      
      Needs updating after this patch is accepted:
      1) Documentation/*
      2) the testcase scripts/rt-tester/t4-l2-pi-deboost.tst
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4D3012D5.4060709@cn.fujitsu.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
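Rule 2 of the "How?" list in commit 8161239a can be written down as a single predicate. The sketch below is an assumption-laden illustration, not the rtmutex code: `struct rt_waiter_sketch` and `can_take_lock()` are hypothetical, and priorities follow the kernel convention that a lower number means a higher priority.

```c
#include <stddef.h>

/* Hypothetical record for the waiter at the head of the wait list. */
struct rt_waiter_sketch {
    int task_id;
    int prio; /* lower value = higher priority */
};

/*
 * A task may take the lock only if nobody holds it and one of these
 * holds: there are no waiters, the task is the top waiter itself, or
 * the task has a higher priority than the top waiter.
 */
static int can_take_lock(int lock_is_held,
                         const struct rt_waiter_sketch *top_waiter,
                         int task_id, int prio)
{
    if (lock_is_held)
        return 0;
    if (top_waiter == NULL)
        return 1;
    if (top_waiter->task_id == task_id)
        return 1;
    return prio < top_waiter->prio;
}
```

In the time3 scenario of the commit message, C becomes the top waiter once its priority exceeds B's, so B's attempt fails this test and B goes back to sleep, which is exactly the "give C a chance to beat B" behaviour.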
  19. 14 Jan, 2011 1 commit
    •
      thp: update futex compound knowledge · a5b338f2
      Authored by Andrea Arcangeli
      Futex code is smarter than most other gup_fast O_DIRECT code and knows
      about the compound internals.  However now doing a put_page(head_page)
      will not release the pin on the tail page taken by gup-fast, leading to
      all sorts of refcounting bugchecks.  Getting a stable head_page is a little
      tricky.
      
      page_head = page is there because if this is not a tail page it's also the
      page_head.  Only in case this is a tail page, compound_head is called,
      otherwise it's guaranteed unnecessary.  And if it's a tail page
      compound_head has to run atomically inside irq disabled section
      __get_user_pages_fast before returning.  Otherwise ->first_page won't be a
      stable pointer.
      
      Disabling irq before __get_user_pages_fast and releasing irq after running
      compound_head is needed because if __get_user_pages_fast returns == 1, it
      means the huge pmd is established and cannot go away from under us.
      pmdp_splitting_flush_notify in __split_huge_page_splitting will have to
      wait for local_irq_enable before the IPI delivery can return.  This means
      __split_huge_page_refcount can't be running from under us, and in turn
      when we run compound_head(page) we're not reading a dangling pointer from
      tailpage->first_page.  Then after we get to a stable head page, we are
      always safe to call compound_lock and after taking the compound lock on
      the head page we can finally re-check if the page returned by gup-fast is
      still a tail page, in which case we're set and we didn't need to split
      the hugepage in order to take a futex on it.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 11 Jan, 2011 1 commit
  21. 10 Nov, 2010 4 commits
    •
      futex: Add futex_q static initializer · 5bdb05f9
      Authored by Darren Hart
      The futex_q struct has grown considerably over the last couple years. I
      believe it now merits a static initializer to avoid uninitialized data
      errors (having spent more time than I care to admit debugging an uninitialized
      q.bitset in an experimental new op code).
      
      With the key initializer built in, several of the FUTEX_KEY_INIT calls can
      be removed.
      
      V2: use a static variable instead of an init macro.
          use a C99 initializer and don't rely on variable ordering in the struct.
      V3: make futex_q_init const
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1289252428-18383-1-git-send-email-dvhart@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
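The V2/V3 notes of commit 5bdb05f9 (a const static variable with a C99 designated initializer instead of an init macro) describe a general C pattern, sketched below. The struct layout and the names `futex_q_demo`, `futex_q_demo_init`, and `new_futex_q()` are hypothetical; only `FUTEX_BITSET_MATCH_ANY` and its value come from the futex ABI.

```c
#include <stddef.h>
#include <stdint.h>

#define FUTEX_BITSET_MATCH_ANY 0xffffffffu

/* Hypothetical, shortened version of a futex_q-like structure. */
struct futex_q_demo {
    void *task;
    void *lock_ptr;
    void *pi_state;
    void *rt_waiter;
    uint32_t bitset;
};

/*
 * Const template object. The designated initializer names only the
 * fields with non-zero defaults; C guarantees every other member is
 * zero-initialized, and field order in the struct does not matter.
 */
static const struct futex_q_demo futex_q_demo_init = {
    .bitset = FUTEX_BITSET_MATCH_ANY,
};

/* Start every new queue entry from the template, then fill in fields. */
static struct futex_q_demo new_futex_q(void *task)
{
    struct futex_q_demo q = futex_q_demo_init;

    q.task = task;
    return q;
}
```

Copying from the template means a newly added field can never be left holding stack garbage, which is the class of bug (an uninitialized q.bitset) the commit message mentions.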
    •
      futex: Replace fshared and clockrt with combined flags · b41277dc
      Authored by Darren Hart
      In the early days we passed the mmap sem around. That became the
      "int fshared" with the fast gup improvements. Then we added
      "int clockrt" in places. This patch unifies these options as "flags".
      
      [ tglx: Split out the stale fshared cleanup ]
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1289250609-16304-1-git-send-email-dvhart@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
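Folding the separate "int fshared" and "int clockrt" arguments into one flags word, as commit b41277dc describes, could look roughly like this. The `FUTEX_PRIVATE_FLAG` and `FUTEX_CLOCK_REALTIME` values are the user space op bits; the `FLAGS_*` values and `futex_flags_from_op()` are assumptions made for this sketch.

```c
/* Option bits carried in the futex op argument (user space ABI). */
#define FUTEX_PRIVATE_FLAG   128
#define FUTEX_CLOCK_REALTIME 256

/* Internal flag bits; the exact values are an assumption of this sketch. */
#define FLAGS_SHARED  0x01u
#define FLAGS_CLOCKRT 0x02u

/*
 * Translate the op word into one internal flags value that can be
 * passed down the call chain in place of two separate int arguments.
 */
static unsigned int futex_flags_from_op(int op)
{
    unsigned int flags = 0;

    if (!(op & FUTEX_PRIVATE_FLAG))
        flags |= FLAGS_SHARED;
    if (op & FUTEX_CLOCK_REALTIME)
        flags |= FLAGS_CLOCKRT;
    return flags;
}
```

Callees then test individual bits, for example `flags & FLAGS_SHARED`, and a future option needs a new bit rather than a new parameter on every function in the path.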
    •
      futex: Cleanup stale fshared flag interfaces · ae791a2d
      Authored by Thomas Gleixner
      The fast GUP changes stopped using the fshared flag in
      put_futex_keys(), but we kept the interface the same.
      
      Cleanup all stale users.
      
      This patch is split out from Darren Hart's combo patch which also
      combines various flags. This way the changes are clearly separated.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Darren Hart <dvhart@linux.intel.com>
      LKML-Reference: <1289250609-16304-1-git-send-email-dvhart@linux.intel.com>
    •
      futex: Address compiler warnings in exit_robust_list · 4c115e95
      Authored by Darren Hart
      Since commit 1dcc41bb (futex: Change 3rd arg of fetch_robust_entry()
      to unsigned int*) some gcc versions decided to emit the following
      warning:
      
      kernel/futex.c: In function ‘exit_robust_list’:
      kernel/futex.c:2492: warning: ‘next_pi’ may be used uninitialized in this function
      
      The commit did not introduce the warning as gcc should have warned
      before that commit as well. It's just gcc being silly.
      
      The code path really can't result in next_pi being uninitialized (or
      should not), but let's keep the build clean. Annotate next_pi as an
      uninitialized_var.
      
      [ tglx: Addressed the same issue in futex_compat.c and massaged the
        	changelog ]
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Tested-by: Matt Fleming <matt@console-pimps.org>
      Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1288897200-13008-1-git-send-email-dvhart@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  22. 26 Oct, 2010 1 commit
  23. 19 Oct, 2010 1 commit
    •
      futex: Fix errors in nested key ref-counting · 7ada876a
      Authored by Darren Hart
      futex_wait() is leaking key references due to futex_wait_setup()
      acquiring an additional reference via the queue_lock() routine. The
      nested key ref-counting has been masking bugs and complicating code
      analysis. queue_lock() is only called with a previously ref-counted
      key, so remove the additional ref-counting from the queue_(un)lock()
      functions.
      
      Also futex_wait_requeue_pi() drops one key reference too many in
      unqueue_me_pi(). Remove the key reference handling from
      unqueue_me_pi(). This was paired with a queue_lock() in
      futex_lock_pi(), so the count remains unchanged.
      
      Document remaining nested key ref-counting sites.
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Reported-and-tested-by: Matthieu Fertré <matthieu.fertre@kerlabs.com>
      Reported-by: Louis Rilling <louis.rilling@kerlabs.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <4CBB17A8.70401@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
  24. 14 Oct, 2010 1 commit
  25. 18 Sep, 2010 3 commits
  26. 01 Jul, 2010 1 commit
    •
      futex: futex_find_get_task remove credentials check · 7a0ea09a
      Authored by Michal Hocko
      futex_find_get_task is currently used (through lookup_pi_state) from two
      contexts, futex_requeue and futex_lock_pi_atomic.  Neither of the paths
      looks like it needs the credentials check, though.  Different (e)uids
      shouldn't matter at all because the only thing that is important for
      a shared futex is the accessibility of the shared memory.
      
      The credentials check results in a glibc assert failure or a process hang
      (if glibc is compiled without assert support) for a shared robust pthread
      mutex with priority inheritance if a process tries to lock an already held
      lock owned by a process with a different euid:
      
      pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
      
      The problem is that futex_lock_pi_atomic, which is called when we try to
      lock an already held lock, checks the current holder (tid is stored in the
      futex value) to get the PI state.  It uses lookup_pi_state, which in turn
      gets the task struct from futex_find_get_task.  ESRCH is returned either
      when the task is not found or if the credentials check fails.
      
      futex_lock_pi_atomic simply returns if it gets ESRCH.  The glibc code,
      however, doesn't expect that a robust lock returns with ESRCH because it
      should get either success or owner died.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 03 Feb, 2010 1 commit