1. 20 June 2017 (6 commits)
    • sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> · 5dd43ce2
      Committed by Ingo Molnar
      The wait_bit*() types and APIs are mixed into wait.h, but they
      are a pretty orthogonal extension of wait-queues.
      
      Furthermore, only about 50 kernel files use these APIs, while
      over 1000 use the regular wait-queue functionality.
      
      So clean up the main wait.h by moving the wait-bit functionality
      out of it, into a separate .h and .c file:
      
        include/linux/wait_bit.h  for types and APIs
        kernel/sched/wait_bit.c   for the implementation
      
      Update all header dependencies.
      
      This reduces the size of wait.h rather significantly, by about 30%.
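
      For illustration, a minimal sketch of what a wait-bit user looks like
      after the split (the header name is from this commit; the call site
      and flag word are hypothetical examples):

      	#include <linux/wait_bit.h>	/* previously pulled in via <linux/wait.h> */

      	static unsigned long my_flags;	/* hypothetical flag word */
      	#define MY_BIT	0

      	static void wait_for_my_bit(void)
      	{
      		/* Sleeps until another context clears MY_BIT and calls
      		 * wake_up_bit(&my_flags, MY_BIT). */
      		wait_on_bit(&my_flags, MY_BIT, TASK_UNINTERRUPTIBLE);
      	}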
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize wait_bit_queue naming · 76c85ddc
      Committed by Ingo Molnar
      So wait-bit-queue head variables are often named:
      
      	struct wait_bit_queue *q
      
      ... which is a bit ambiguous and super confusing, because
      they clearly suggest wait-queue head semantics and behavior
      (they rhyme with the old wait_queue_t *q naming), while they
      are extended wait-queue _entries_, not heads!
      
      They are misnomers in two ways:
      
       - the 'wait_bit_queue' leaves open the question of whether
         it's an entry or a head
      
       - the 'q' parameter and local variable naming falsely implies
         that it's a 'queue' - while it's an entry.
      
      This resulted in sometimes confusing cases such as:
      
      	finish_wait(wq, &q->wait);
      
      where the 'q' is not a wait-queue head, but a wait-bit-queue entry.
      
      So improve this all by standardizing wait-bit-queue nomenclature
      similar to wait-queue head naming:
      
      	struct wait_bit_queue   => struct wait_bit_queue_entry
      	q			=> wbq_entry
      
      Which makes it all much clearer:
      
      	struct wait_bit_queue_entry *wbq_entry
      
      ... and turns the former confusing piece of code into:
      
      	finish_wait(wq_head, &wbq_entry->wq_entry);
      
      which IMHO makes it immediately clear what we are doing,
      without having to analyze the context of the code: we are
      adding a wait-queue entry to a regular wait-queue head,
      which entry is embedded in a wait-bit-queue entry.
      
      I'm not a big fan of acronyms, but repeating wait_bit_queue_entry
      in field and local variable names is too long; hopefully it's
      clear enough that 'wq_' prefixes stand for wait-queues, while
      'wbq_' prefixes stand for wait-bit-queues.
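
      A sketch of the resulting type, as the naming above implies (field
      names assumed from the rename tables in this and the next commit):

      	struct wait_bit_queue_entry {
      		struct wait_bit_key	key;
      		struct wait_queue_entry	wq_entry;
      	};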
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize 'struct wait_bit_queue' wait-queue entry field name · 21417136
      Committed by Ingo Molnar
      Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly
      name it as a wait-queue entry.
      
      Propagate it to a couple of usage sites where the wait-bit-queue internals
      are exposed.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize internal naming of wait-queue heads · 9d9d676f
      Committed by Ingo Molnar
      The wait-queue head parameters and variables are named in a
      couple of ways; we currently have the following variants:
      
      	wait_queue_head_t *q
      	wait_queue_head_t *wq
      	wait_queue_head_t *head
      
      In particular the 'wq' naming is ambiguous as to whether it refers to
      a wait-queue head or an entry - as entries were often named 'wait'.
      
      ( Not to mention the confusion of any readers coming over from
        workqueue-land. )
      
      Standardize all this around a single, unambiguous parameter and
      variable name:
      
      	struct wait_queue_head *wq_head
      
      which is easy to grep for and also rhymes nicely with the wait-queue
      entry naming:
      
      	struct wait_queue_entry *wq_entry
      
      Also rename:
      
      	struct __wait_queue_head => struct wait_queue_head
      
      ... and use this struct type to migrate from typedef usage to 'struct'
      usage, which is more in line with existing kernel practices.
      
      Don't touch any external users and preserve the main wait_queue_head_t
      typedef.
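
      A sketch of the resulting declaration (the list field is assumed to
      still be named 'task_list' at this point in the series):

      	struct wait_queue_head {
      		spinlock_t		lock;
      		struct list_head	task_list;
      	};
      	typedef struct wait_queue_head wait_queue_head_t;	/* preserved */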
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize internal naming of wait-queue entries · 50816c48
      Committed by Ingo Molnar
      So the various wait-queue entry variables in include/linux/wait.h
      and kernel/sched/wait.c are named in a colorfully inconsistent
      way:
      
      	wait_queue_entry_t *wait
      	wait_queue_entry_t *__wait	(even in plain C code!)
      	wait_queue_entry_t *q		(!)
      	wait_queue_entry_t *new		(making anyone who knows C++ cringe)
      	wait_queue_entry_t *old
      
      I think part of the reason for the inconsistency is the constant
      apparent confusion about what a wait queue 'head' versus 'entry' is.
      
      ( Some of the documentation talks about a 'wait descriptor', which is
        the wait-queue entry itself - further adding to the confusion. )
      
      The most common name is 'wait', but that in itself is somewhat
      ambiguous as well, as it does not really make it clear whether
      it's a wait-queue entry or head.
      
      To improve all this, name the wait-queue entry structure parameters
      and variables consistently and push through this naming into all
      the wait.h and wait.c code:
      
      	struct wait_queue_entry *wq_entry
      
      The 'wq_' prefix makes it easy to grep for, and we also use the
      opportunity to move away from the typedef to a plain 'struct' naming:
      in the kernel we typically reserve typedefs for cases where a
      C structure is really small and somewhat opaque - such as pte_t.
      
      wait-queue entries are neither small nor opaque, so use the more
      standard 'struct xxx_entry' list management code nomenclature instead.
      
      ( We don't touch external users, and we preserve the typedef as well
        for actual wait-queue users, to reduce unnecessary churn. )
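
      A sketch of the renamed entry type under the naming described above
      (field names are assumptions based on the long-standing wait.h layout):

      	struct wait_queue_entry {
      		unsigned int		flags;
      		void			*private;
      		wait_queue_func_t	func;
      		struct list_head	task_list;
      	};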
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Committed by Ingo Molnar
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
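
      In other words (a sketch of the rename only, not the full definition):

      	/* before: */ typedef struct __wait_queue      wait_queue_t;
      	/* after:  */ typedef struct wait_queue_entry  wait_queue_entry_t;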
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 09 March 2017 (1 commit)
    • sched/headers: fix up header file dependency on <linux/sched/signal.h> · bd0f9b35
      Committed by Linus Torvalds
      The scheduler header file split and cleanups ended up exposing a few
      nasty header file dependencies, and in particular it showed how we in
      <linux/wait.h> ended up depending on "signal_pending()", which now comes
      from <linux/sched/signal.h>.
      
      That's a very subtle and annoying dependency, which already caused a
      semantic merge conflict (see commit e58bc927 "Pull overlayfs updates
      from Miklos Szeredi", which added that fixup in the merge commit).
      
      It turns out that we can avoid this dependency _and_ improve code
      generation by moving the guts of the fairly nasty helper #define
      __wait_event_interruptible_locked() to out-of-line code.  The code that
      includes the signal_pending() check is all in the slow-path where we
      actually go to sleep waiting for the event anyway, so using a helper
      function is the right thing to do.
      
      Using a helper function is also what we already did for the non-locked
      versions, see the "__wait_event*()" macros and the "prepare_to_wait*()"
      set of helper functions.
      
      We might want to try to unify all these macro games; we have a _lot_ of
      subtly different wait-event loops.  But this is the minimal patch to fix
      the annoying header dependency.
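
      A sketch of the shape such an out-of-line helper takes, reconstructed
      from the description above (the name do_wait_intr and the exact body
      are assumptions, not verified against the tree):

      	int do_wait_intr(wait_queue_head_t *wq, wait_queue_t *wait)
      	{
      		if (likely(list_empty(&wait->task_list)))
      			__add_wait_queue_tail(wq, wait);

      		set_current_state(TASK_INTERRUPTIBLE);
      		if (signal_pending(current))
      			return -ERESTARTSYS;

      		/* Caller holds wq->lock; drop it across the sleep. */
      		spin_unlock(&wq->lock);
      		schedule();
      		spin_lock(&wq->lock);
      		return 0;
      	}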
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 02 March 2017 (2 commits)
  4. 28 October 2016 (1 commit)
    • mm: remove per-zone hashtable of bitlock waitqueues · 9dcb8b68
      Committed by Linus Torvalds
      The per-zone waitqueues exist because of a scalability issue with the
      page waitqueues on some NUMA machines, but it turns out that they hurt
      normal loads, and now with the vmalloced stacks they also end up
      breaking gfs2 that uses a bit_wait on a stack object:
      
           wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
      
      where 'gh' can be a reference to the local variable 'mount_gh' on the
      stack of fill_super().
      
      The reason the per-zone hash table breaks for this case is that there is
      no "zone" for virtual allocations, and trying to look up the physical
      page to get at it will fail (with a BUG_ON()).
      
      It turns out that I actually complained to the mm people about the
      per-zone hash table for another reason just a month ago: the zone lookup
      also hurts the regular use of "unlock_page()" a lot, because the zone
      lookup ends up forcing several unnecessary cache misses and generates
      horrible code.
      
      As part of that earlier discussion, we had a much better solution for
      the NUMA scalability issue - by just making the page lock have a
      separate contention bit, the waitqueue doesn't even have to be looked at
      for the normal case.
      
      Peter Zijlstra already has a patch for that, but let's see if anybody
      even notices.  In the meantime, let's fix the actual gfs2 breakage by
      simplifying the bitlock waitqueues and removing the per-zone issue.
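
      A sketch of the simplified global bitlock waitqueue lookup this change
      moves to (table size and hash details are assumptions based on the
      description):

      	#define WAIT_TABLE_BITS 8
      	#define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)
      	static wait_queue_head_t bit_wait_table[WAIT_TABLE_SIZE] __cacheline_aligned;

      	wait_queue_head_t *bit_waitqueue(void *word, int bit)
      	{
      		const int shift = BITS_PER_LONG == 32 ? 5 : 6;
      		unsigned long val = (unsigned long)word << shift | bit;

      		/* No zone lookup: hash straight into a static table,
      		 * so stack (vmalloc'ed) addresses work too. */
      		return bit_wait_table + hash_long(val, WAIT_TABLE_BITS);
      	}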
      Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 30 September 2016 (4 commits)
  6. 14 December 2015 (1 commit)
  7. 04 December 2015 (1 commit)
  8. 23 September 2015 (1 commit)
  9. 05 September 2015 (1 commit)
  10. 19 May 2015 (1 commit)
  11. 08 May 2015 (1 commit)
  12. 04 November 2014 (1 commit)
    • sched/wait: Fix a kthread race with wait_woken() · cb6538e7
      Committed by Peter Zijlstra
      There is a race between kthread_stop() and the new wait_woken() that
      can result in a lack of progress.
      
      CPU 0                                    | CPU 1
                                               |
      rfcomm_run()                             | kthread_stop()
        ...                                    |
        if (!test_bit(KTHREAD_SHOULD_STOP))    |
                                               |   set_bit(KTHREAD_SHOULD_STOP)
                                               |   wake_up_process()
          wait_woken()                         |   wait_for_completion()
            set_current_state(INTERRUPTIBLE)   |
            if (!WQ_FLAG_WOKEN)                |
              schedule_timeout()               |
                                               |
      
      After which both tasks will wait.. forever.
      
      Fix this by having wait_woken() check for kthread_should_stop() but
      only for kthreads (obviously).
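
      A sketch of what such a check could look like inside wait_woken()
      (helper name and exact placement are assumptions based on the
      description):

      	static inline bool is_kthread_should_stop(void)
      	{
      		return (current->flags & PF_KTHREAD) && kthread_should_stop();
      	}

      	/* ... in wait_woken(): don't sleep if the kthread must stop */
      	if (!(wait->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
      		timeout = schedule_timeout(timeout);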
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 28 October 2014 (1 commit)
  14. 25 September 2014 (1 commit)
    • SCHED: add some "wait..on_bit...timeout()" interfaces. · cbbce822
      Committed by NeilBrown
      In commit c1221321
         sched: Allow wait_on_bit_action() functions to support a timeout
      
      I suggested that a "wait_on_bit_timeout()" interface would not meet my
      need.  This isn't true - I was just over-engineering.
      
      Including a 'private' field in wait_bit_key instead of a focused
      "timeout" field was just premature generalization.  If some other
      use is ever found, it can be generalized or added later.
      
      So this patch renames "private" to "timeout", with the meaning "stop
      waiting when jiffies reaches or passes timeout", and adds two of the
      many possible wait..bit..timeout() interfaces:
      
      wait_on_page_bit_killable_timeout(), which is the one I want to use,
      and out_of_line_wait_on_bit_timeout() which is a reasonably general
      example.  Others can be added as needed.
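
      A sketch of how the general variant plausibly builds on the renamed
      'timeout' field (reconstructed from the description; not verified
      against the tree):

      	int out_of_line_wait_on_bit_timeout(void *word, int bit,
      					    wait_bit_action_f *action,
      					    unsigned mode, unsigned long timeout)
      	{
      		wait_queue_head_t *wq = bit_waitqueue(word, bit);
      		DEFINE_WAIT_BIT(wait, word, bit);

      		/* "stop waiting when jiffies reaches or passes timeout" */
      		wait.key.timeout = jiffies + timeout;
      		return __wait_on_bit(wq, &wait, action, mode);
      	}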
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  15. 16 July 2014 (2 commits)
    • sched: Allow wait_on_bit_action() functions to support a timeout · c1221321
      Committed by NeilBrown
      It is currently not possible for various wait_on_bit functions
      to implement a timeout.
      
      While the "action" function that is called to do the waiting
      could certainly use schedule_timeout(), there is no way to carry
      forward the remaining timeout after a false wake-up.
      As false wake-ups are clearly possible, at least due to possible
      hash collisions in bit_waitqueue(), this is a real problem.
      
      The 'action' function is currently passed a pointer to the word
      containing the bit being waited on.  No current action functions
      use this pointer.  So changing it to something else will be a
      little noisy but will have no immediate effect.
      
      This patch changes the 'action' function to take a pointer to
      the "struct wait_bit_key", which contains a pointer to the word
      containing the bit so nothing is really lost.
      
      It also adds a 'private' field to "struct wait_bit_key", which
      is initialized to zero.
      
      An action function can now implement a timeout with something
      like
      
      static int timed_out_waiter(struct wait_bit_key *key)
      {
      	unsigned long waited;

      	/* First call: record the start time, avoiding the 0 sentinel. */
      	if (key->private == 0) {
      		key->private = jiffies;
      		if (key->private == 0)
      			key->private -= 1;
      	}
      	waited = jiffies - key->private;
      	if (waited > 10 * HZ)
      		return -EAGAIN;
      	/* Sleep for the time remaining until the 10 second deadline. */
      	schedule_timeout(10 * HZ - waited);
      	return 0;
      }
      
      If any other need for context in a waiter were found it would be
      easy to use ->private for some other purpose, or even extend
      "struct wait_bit_key".
      
      My particular need is to support timeouts in nfs_release_page()
      to avoid deadlocks with loopback mounted NFS.
      
      While wait_on_bit_timeout() would be a cleaner interface, it
      will not meet my need.  I need the timeout to be sensitive to
      the state of the connection with the server, which could change.
      So I need to use an 'action' interface.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      Committed by NeilBrown
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15), two new action
      functions have appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer-aware
      schedule call as NFS.
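
      An illustrative before/after for a typical caller ('word', 'bit' and
      my_action are placeholder names):

      	/* old: action function spelled out at every call site */
      	err = wait_on_bit_action(word, bit, my_action, TASK_UNINTERRUPTIBLE);

      	/* new: implicit standard actions */
      	err = wait_on_bit(word, bit, TASK_UNINTERRUPTIBLE);	/* schedule()    */
      	err = wait_on_bit_io(word, bit, TASK_UNINTERRUPTIBLE);	/* io_schedule() */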
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 18 April 2014 (1 commit)
  17. 06 November 2013 (2 commits)
  18. 16 October 2013 (1 commit)
    • sched/wait: Introduce prepare_to_wait_event() · c2d81644
      Committed by Oleg Nesterov
      Add the new helper, prepare_to_wait_event(), which should only be used
      by ___wait_event().

      prepare_to_wait_event() returns -ERESTARTSYS if signal_pending_state()
      is true, otherwise it does prepare_to_wait/exclusive.  This allows us to
      uninline the signal-pending checks in the wait_event*() macros.
      
      Also, it can initialize wait->private/func. We do not care if they were
      already initialized, the values are the same. This also shaves a couple
      of insns from the inlined code.
      
      This obviously makes the prepare_*() path a little bit slower, but we are
      likely going to sleep anyway, so I think it makes sense to shrink .text:
      
                     text    data      bss      dec     hex  filename
                  ===================================================
         before:  5126092 2959248 10117120 18202460 115bf5c   vmlinux
          after:  5124618 2955152 10117120 18196890 115a99a   vmlinux
      
      on my build.
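
      A sketch of the helper as described above (reconstructed from the
      description; details are assumptions, not verified against the tree):

      	long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
      	{
      		unsigned long flags;

      		if (signal_pending_state(state, current))
      			return -ERESTARTSYS;

      		/* Harmless if already initialized: the values are the same. */
      		wait->private = current;
      		wait->func = autoremove_wake_function;

      		spin_lock_irqsave(&q->lock, flags);
      		if (list_empty(&wait->task_list)) {
      			if (wait->flags & WQ_FLAG_EXCLUSIVE)
      				__add_wait_queue_tail(q, wait);
      			else
      				__add_wait_queue(q, wait);
      		}
      		set_current_state(state);
      		spin_unlock_irqrestore(&q->lock, flags);

      		return 0;
      	}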
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20131007161824.GA29757@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  19. 20 August 2013 (1 commit)
  20. 24 July 2013 (1 commit)
  21. 15 May 2013 (1 commit)
    • Add wait_on_atomic_t() and wake_up_atomic_t() · cb65537e
      Committed by David Howells
      Add wait_on_atomic_t() and wake_up_atomic_t() to indicate became-zero events on
      atomic_t types.  This uses the bit-wake waitqueue table.  The key is set to a
      value outside of the number of bits in a long so that wait_on_bit() won't be
      woken up accidentally.
      
      What I'm using this for is: in a following patch I add a counter to struct
      fscache_cookie to count the number of outstanding operations that need access
      to netfs data.  The way this works is:
      
       (1) When a cookie is allocated, the counter is initialised to 1.
      
       (2) When an operation wants to access netfs data, it calls atomic_inc_not_zero()
           to increment the counter before it does so.  If it was 0, then the counter
           isn't incremented, the operation isn't permitted to access the netfs data
           (which might by this point no longer exist) and the operation aborts in
           some appropriate manner.
      
       (3) When an operation finishes with the netfs data, it decrements the counter
           and if it reaches 0, calls wake_up_atomic_t() on it - the assumption being
           that it was the last blocker.
      
       (4) When a cookie is released, the counter is decremented and the releaser
           uses wait_on_atomic_t() to wait for the counter to become 0 - which should
           indicate no one is using the netfs data any longer.  The netfs data can
           then be destroyed.
      
      There are some alternatives that I have thought of and that have been suggested
      by Tejun Heo:
      
       (A) Using wait_on_bit() to wait on a bit in the counter.  This doesn't work
           because if that bit happens to be 0 then the wait won't happen - even if
           the counter is non-zero.
      
       (B) Using wait_on_bit() to wait on a flag elsewhere which is cleared when the
           counter reaches 0.  Such a flag would be redundant and would add
           complexity.
      
       (C) Adding a waitqueue to fscache_cookie - this would expand that struct by
           several words for an event that happens just once in each cookie's
           lifetime.  Further, cookies are generally per-file so there are likely to
           be a lot of them.
      
       (D) Similar to (C), but add a pointer to a waitqueue in the cookie instead of
           a waitqueue.  This would add a single word per cookie and so would be less
           of an expansion - but still an expansion.
      
       (E) Adding a static waitqueue to the fscache module.  Generally this would be
           fine, but under certain circumstances many cookies will all get added at
           the same time (eg. NFS umount, cache withdrawal) thereby presenting
           scaling issues.  Note that the wait may be significant as disk I/O may be
           in progress.
      
      So, I think reusing the wait_on_bit() waitqueue set is reasonable.  I don't
      make much use of the waitqueue I need on a per-cookie basis, but sometimes I
      have a huge flood of the cookies to deal with.
      
      I also don't want to add a whole new set of global waitqueue tables
      specifically for the dec-to-0 event if I can reuse the bit tables.
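
      An illustrative caller, following steps (2)-(4) above (the action
      helper and the n_active field name are hypothetical):

      	static int my_wait_action(atomic_t *p)
      	{
      		schedule();	/* simple sleep until woken */
      		return 0;
      	}

      	/* last user of the netfs data: */
      	if (atomic_dec_and_test(&cookie->n_active))
      		wake_up_atomic_t(&cookie->n_active);

      	/* releaser waits for the counter to become 0: */
      	wait_on_atomic_t(&cookie->n_active, my_wait_action, TASK_UNINTERRUPTIBLE);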
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
  22. 06 December 2012 (1 commit)
  23. 21 December 2011 (1 commit)
  24. 31 October 2011 (1 commit)
  25. 31 March 2011 (1 commit)
  26. 27 October 2010 (1 commit)
  27. 10 August 2009 (1 commit)
  28. 14 April 2009 (1 commit)
    • wait: don't use __wake_up_common() · 78ddb08f
      Committed by Johannes Weiner
      '777c6c5f wait: prevent exclusive waiter starvation' made
      __wake_up_common() global to be used from abort_exclusive_wait().
      
      It was needed to do a wake-up with the waitqueue lock held while
      passing down a key to the wake-up function.
      
      Since '4ede816a epoll keyed wakeups: add __wake_up_locked_key() and
      __wake_up_sync_key()' there is an appropriate wrapper for this case:
      __wake_up_locked_key().
      
      Use it here and make __wake_up_common() private to the scheduler
      again.
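
      The wrapper in question is essentially (a sketch, assuming the
      __wake_up_common() signature of that era):

      	void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key)
      	{
      		/* Caller already holds q->lock. */
      		__wake_up_common(q, mode, 1, 0, key);
      	}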
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1239720785-19661-1-git-send-email-hannes@cmpxchg.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 06 February 2009 (1 commit)
    • wait: prevent exclusive waiter starvation · 777c6c5f
      Committed by Johannes Weiner
      With exclusive waiters, every process woken up through the wait queue must
      ensure that the next waiter down the line is woken when it has finished.
      
      Interruptible waiters don't do that when aborting due to a signal.  And if
      an aborting waiter is concurrently woken up through the waitqueue, no one
      will ever wake up the next waiter.
      
      This has been observed with __wait_on_bit_lock() used by
      lock_page_killable(): the first contender on the queue was aborting when
      the actual lock holder woke it up concurrently.  The aborted contender
      didn't acquire the lock and therefore never did an unlock followed by
      waking up the next waiter.
      
      Add abort_exclusive_wait() which removes the process' wait descriptor from
      the waitqueue, iff still queued, or wakes up the next waiter otherwise.
      It does so under the waitqueue lock.  Racing with a wake up means the
      aborting process is either already woken (removed from the queue) and will
      wake up the next waiter, or it will remove itself from the queue and the
      concurrent wake up will apply to the next waiter after it.
      
      Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
      __wait_on_bit_lock() when they were interrupted by other means than a wake
      up through the queue.
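
      A sketch of abort_exclusive_wait() as described above (reconstructed
      from the description; details are assumptions, not verified against
      the tree):

      	void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
      				  unsigned int mode, void *key)
      	{
      		unsigned long flags;

      		__set_current_state(TASK_RUNNING);
      		spin_lock_irqsave(&q->lock, flags);
      		if (!list_empty(&wait->task_list))
      			/* Still queued: just remove ourselves. */
      			list_del_init(&wait->task_list);
      		else if (waitqueue_active(q))
      			/* Already woken: pass the wake-up on. */
      			__wake_up_locked_key(q, mode, key);
      		spin_unlock_irqrestore(&q->lock, flags);
      	}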
      
      [akpm@linux-foundation.org: coding-style fixes]
      Reported-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Mentored-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chuck Lever <cel@citi.umich.edu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		["after some testing"]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>