1. 15 Sep, 2017: 2 commits
    • sched/wait: Introduce wakeup bookmark in wake_up_page_bit · 11a19c7b
      Tim Chen committed
      Now that we have added breaks in the wait-queue scan and allow
      bookmarking the scan position, put this logic into the
      wake_up_page_bit() function.
      
      We can have very long page wait lists on large systems, where multiple
      pages share the same wait list. Break the wakeup walk here to give
      other CPUs a chance to access the list, and to avoid keeping interrupts
      disabled while traversing the list for too long. This reduces interrupt
      and rescheduling latency, as well as excessive page wait-queue lock
      hold times.
      
      [ v2: Remove bookmark_wake_function ]
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched/wait: Break up long wake list walk · 2554db91
      Tim Chen committed
      We encountered workloads that have very long wakeup lists on large
      systems. A waker takes a long time to traverse the entire wake list and
      execute all the wake functions.
      
      We saw page wait lists up to 3700+ entries long in tests on large
      4- and 8-socket systems. It took 0.8 sec to traverse such a list
      during wakeup. Any other CPU that contends for the list spinlock will
      spin for a long time. This is a result of NUMA-balancing migration of
      hot pages that are shared by many threads.
      
      Multiple waking CPUs queue up behind the lock, and the last one
      queued has to wait until all CPUs have done all their wakeups.
      
      The page wait list is traversed with interrupts disabled, which caused
      various problems. This was the original cause of the NMI watchdog
      timer firing in https://patchwork.kernel.org/patch/9800303/, where
      only extending the NMI watchdog timer helped.
      
      This patch bookmarks the waker's scan position in the wake list and
      breaks the wakeup walk, so that others can access the list before the
      waker resumes its walk down the rest of the wait list. This lowers
      interrupt and rescheduling latency.
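      A minimal userspace sketch of the bookmark idea, assuming a simplified
      circular list in place of the kernel's list_head and wait-queue types:
      the walk wakes a bounded batch of waiters, then parks a placeholder
      entry in front of the next unvisited one and returns, so the caller can
      drop the lock and resume later. WALK_BREAK_CNT, struct node and
      wake_walk() are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch size, kept tiny for the demo (not the kernel's value). */
#define WALK_BREAK_CNT 2

/* Simplified circular doubly linked list with a sentinel head. */
struct node {
    struct node *prev, *next;
    bool is_bookmark;   /* true while parked in the list as a placeholder */
    int woken;          /* how many times this waiter was "woken" */
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

/* Insert n just before pos (at the tail when pos is the head). */
static void list_add_before(struct node *n, struct node *pos)
{
    n->prev = pos->prev;
    n->next = pos;
    pos->prev->next = n;
    pos->prev = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/*
 * Wake at most WALK_BREAK_CNT waiters, resuming after the bookmark if one
 * is parked. If entries remain, park the bookmark in front of the next one
 * and return true; the caller would drop the lock and call again.
 */
static bool wake_walk(struct node *head, struct node *bookmark)
{
    struct node *curr, *next;
    int cnt = 0;

    assert(head != NULL && bookmark != NULL);

    if (bookmark->is_bookmark) {        /* resume where we stopped */
        curr = bookmark->next;
        list_del(bookmark);
        bookmark->is_bookmark = false;
    } else {
        curr = head->next;
    }

    for (; curr != head; curr = next) {
        next = curr->next;
        if (curr->is_bookmark)
            continue;                   /* another waker's placeholder */
        curr->woken++;                  /* stands in for curr->func(...) */
        if (++cnt >= WALK_BREAK_CNT && next != head) {
            bookmark->is_bookmark = true;
            list_add_before(bookmark, next);
            return true;                /* more waiters left */
        }
    }
    return false;
}
```

      In the kernel, the caller would release and re-take the wait-queue
      spinlock between calls for as long as the walk reports remaining work;
      the sketch leaves locking out.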
      
      This patch also provides a performance boost when combined with the
      next patch, which breaks up the page wakeup list walk. We saw a 22%
      improvement in the will-it-scale file pread2 test on a Xeon Phi system
      running 256 threads.
      
      [ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
        simplify access to flags. ]
      Reported-by: Kan Liang <kan.liang@intel.com>
      Tested-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 19 Aug, 2017: 1 commit
    • wait: add wait_event_killable_timeout() · 8ada9279
      Luis R. Rodriguez committed
      These are the few pending fixes I have queued up for v4.13-final.  One
      is a generic regression fix for recursive loops on kmod and the other
      is a trivial printout correction.
      
      During v4.13 development we assumed that recursive kmod loops were
      no longer possible.  Clearly that is not true.  The regression fix makes
      use of a new killable wait.  We use a killable wait to be paranoid about
      how signals might be sent to modprobe, and only accept a proper SIGKILL.
      The signal will only be available for userspace to issue *iff* a thread
      has already entered a wait state, and that happens only if we've already
      throttled after 50 kmod threads have been hit.
      
      Note that although it may seem excessive to trigger a failure after 5
      seconds if all kmod threads remain busy, prior to the series of changes
      that went into v4.13 we would actually *always* fatally fail any request
      which came in if the limit was already reached.  The new waiting
      implemented in v4.13 actually gives us *more* breathing room: the wait
      for 5 seconds is a wait for *any* kmod thread to finish.  We give up and
      fail *iff* no kmod thread has finished and they're *all* running
      straight for 5 consecutive seconds.  If 50 kmod threads are running
      consecutively for 5 seconds, something else must be really bad.
      
      Recursive loops with kmod are bad, but they're also hard to implement
      properly as a selftest without fooling current userspace tools like
      kmod [1].  For instance, kmod will complain when you run depmod if it
      finds a recursive loop with symbol dependencies between modules; as
      such, this type of recursive loop cannot go upstream, as the
      modules_install target will fail after running depmod.
      
      These tests already exist in upstream userspace kmod, though (refer to
      the testsuite/module-playground/mod-loop-*.c files).  The same is not
      true if request_module() is used, or worse, if aliases are used.
      
      Likewise, the issue of 64-bit kernels booting 32-bit userspace without
      a built-in binfmt handler is currently neither detected nor proactively
      avoided by userspace kmod tools, or by kconfig for all architectures.
      Although we could complain in the kernel when some of these individual
      recursive issues creep up, proactively avoiding these situations in
      userspace at build time is what we should keep striving for.
      
      Lastly, since recursive loops can happen with kmod, they may also be
      possible with other kernel usermode helpers.  This should be
      investigated, and if in the long term we can come up with a more
      sensible generic solution, even better!
      
      [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final
      [1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
      
      This patch (of 3):
      
      This wait is similar to wait_event_interruptible_timeout() but only
      accepts the SIGKILL interrupt signal.  Other signals are ignored.
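      To illustrate the intended semantics, here is a tick-based userspace
      simulation, assuming the return convention of the existing *_timeout()
      wait macros: 0 on timeout with the condition still false, otherwise the
      remaining time (at least 1), or -ERESTARTSYS when interrupted. struct
      sim, killable_timeout_sketch() and SKETCH_ERESTARTSYS are hypothetical;
      the real macro sleeps on a wait queue rather than counting ticks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-internal ERESTARTSYS value. */
#define SKETCH_ERESTARTSYS 512

/* Simulated world: the condition becomes true at tick cond_at; a signal
 * arrives at tick sig_at (-1 = never); sig_is_kill says if it is SIGKILL. */
struct sim {
    long cond_at;
    long sig_at;
    bool sig_is_kill;
};

static long killable_timeout_sketch(const struct sim *s, long timeout)
{
    long now = 0;

    for (;;) {
        /* The condition is checked before any signal, as in the macros. */
        if (now >= s->cond_at)
            return timeout ? timeout : 1;   /* remaining ticks, at least 1 */
        if (timeout == 0)
            return 0;                       /* timed out, condition false */
        /* Only a fatal signal (SIGKILL) interrupts; others are ignored. */
        if (s->sig_at >= 0 && now >= s->sig_at && s->sig_is_kill)
            return -SKETCH_ERESTARTSYS;
        now++;                              /* "sleep" for one tick */
        timeout--;
    }
}
```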
      
      Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Matt Redfearn <matt.redfearn@imgetc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 25 Jul, 2017: 1 commit
    • sched/wait: Clean up some documentation warnings · 6c423f57
      Jonathan Corbet committed
      A couple of kerneldoc comments in <linux/wait.h> had incorrect names for
      macro parameters, with this unsightly result:
      
        ./include/linux/wait.h:555: warning: No description found for parameter 'wq'
        ./include/linux/wait.h:555: warning: Excess function parameter 'wq_head' description in 'wait_event_interruptible_hrtimeout'
        ./include/linux/wait.h:759: warning: No description found for parameter 'wq_head'
        ./include/linux/wait.h:759: warning: Excess function parameter 'wq' description in 'wait_event_killable'
      
      Correct the comments and kill the warnings.
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170724135800.769c4042@lwn.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 20 Jun, 2017: 9 commits
    • sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming · 2055da97
      Ingo Molnar committed
      So I've noticed a number of instances where it was not obvious from the
      code whether ->task_list was for a wait-queue head or a wait-queue entry.
      
      Furthermore, there are a number of wait-queue users where the lists are
      not for 'tasks' but other entities (poll tables, etc.), in which case
      the 'task_list' name is actively confusing.
      
      To clear this all up, name the wait-queue head and entry list structure
      fields unambiguously:
      
      	struct wait_queue_head::task_list	=> ::head
      	struct wait_queue_entry::task_list	=> ::entry
      
      For example, this code:
      
      	rqw->wait.task_list.next != &wait->task_list
      
      ... it was pretty unclear (to me) what this is doing, while now it's written this way:
      
      	rqw->wait.head.next != &wait->entry
      
      ... which makes it pretty clear that we are iterating a list until we see the head.
      
      Other examples are:
      
      	list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
      	list_for_each_entry(wq, &fence->wait.task_list, task_list) {
      
      ... where it's unclear (to me) what we are iterating, and during review it's
      hard to tell whether it's trying to walk a wait-queue entry (which would be
      a bug), while now it's written as:
      
      	list_for_each_entry_safe(pos, next, &x->head, entry) {
      	list_for_each_entry(wq, &fence->wait.head, entry) {
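      A simplified userspace sketch of the renamed layout, assuming stand-in
      list helpers rather than <linux/list.h>: the head's list is ->head,
      each entry's link is ->entry, and the loop stops when it comes back
      around to &wq_head->head. Only the two renamed fields are modelled; the
      id member and sum_ids() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *l) { l->next = l->prev = l; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Renamed fields only; lock, flags, func, private etc. are omitted. */
struct wait_queue_head  { struct list_head head; };
struct wait_queue_entry { int id; struct list_head entry; };

/* Walk every entry queued on wq_head until we are back at the head. */
static int sum_ids(struct wait_queue_head *wq_head)
{
    int sum = 0;

    for (struct list_head *pos = wq_head->head.next;
         pos != &wq_head->head; pos = pos->next) {
        struct wait_queue_entry *wq_entry =
            container_of(pos, struct wait_queue_entry, entry);
        sum += wq_entry->id;
    }
    return sum;
}
```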
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> · 5dd43ce2
      Ingo Molnar committed
      The wait_bit*() types and APIs are mixed into wait.h, but they
      are a pretty orthogonal extension of wait-queues.
      
      Furthermore, only about 50 kernel files use these APIs, while
      over 1000 use the regular wait-queue functionality.
      
      So clean up the main wait.h by moving the wait-bit functionality
      out of it, into a separate .h and .c file:
      
        include/linux/wait_bit.h  for types and APIs
        kernel/sched/wait_bit.c   for the implementation
      
      Update all header dependencies.
      
      This reduces the size of wait.h rather significantly, by about 30%.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> · 4b1c480b
      Ingo Molnar committed
      So there are over 300 CPP macro line-continuation backslashes in
      include/linux/wait.h (!!), which are aligned vertically to make
      the macro maze a bit more navigable.
      
      The recent renames and reorganization broke some of them, and
      instead of re-aligning them in every patch (which would add
      a lot of stylistic noise to the patches and make them less
      readable), I just ignored them - and fixed them up in a single
      go in this patch.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Improve the bit-wait API parameter names in the API function prototypes · 939798a0
      Ingo Molnar committed
      Contrary to kernel tradition, most of the bit-wait function prototypes
      in <linux/wait.h> don't fully define the parameter names; they only
      list the types:
      
      	int out_of_line_wait_on_bit_timeout(void *, int, wait_bit_action_f *, unsigned, unsigned long);
      
      ... which is pretty passive-aggressive in terms of informing the reader
      about what these functions are doing.
      
      Fill in the parameter names, such as:
      
      	int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
      
      Also turn spurious (and inconsistently utilized) cases of 'unsigned' into 'unsigned int'.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize wait_bit_queue naming · 76c85ddc
      Ingo Molnar committed
      So wait-bit-queue head variables are often named:
      
      	struct wait_bit_queue *q
      
      ... which is a bit ambiguous and super confusing, because
      they clearly suggest wait-queue head semantics and behavior
      (they rhyme with the old wait_queue_t *q naming), while they
      are extended wait-queue _entries_, not heads!
      
      They are misnomers in two ways:
      
       - the 'wait_bit_queue' leaves open the question of whether
         it's an entry or a head
      
       - the 'q' parameter and local variable naming falsely implies
         that it's a 'queue' - while it's an entry.
      
      This resulted in sometimes confusing cases such as:
      
      	finish_wait(wq, &q->wait);
      
      where the 'q' is not a wait-queue head, but a wait-bit-queue entry.
      
      So improve this all by standardizing wait-bit-queue nomenclature
      similar to wait-queue head naming:
      
      	struct wait_bit_queue   => struct wait_bit_queue_entry
      	q			=> wbq_entry
      
      Which makes it all much clearer:
      
      	struct wait_bit_queue_entry *wbq_entry
      
      ... and turns the former confusing piece of code into:
      
      	finish_wait(wq_head, &wbq_entry->wq_entry);
      
      which IMHO makes it readily clear what we are doing,
      without having to analyze the context of the code: we are
      finishing the wait of a wait-queue entry on a regular
      wait-queue head, and that entry is embedded in a
      wait-bit-queue entry.
      
      I'm not a big fan of acronyms, but repeating wait_bit_queue_entry
      in field and local variable names is too long, so hopefully it's
      clear enough that 'wq_' prefixes stand for wait-queues, while
      'wbq_' prefixes stand for wait-bit-queues.
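      The embedding described above can be sketched in userspace, assuming
      heavily simplified structures (the real wait_bit_key and
      wait_queue_entry carry more fields): a wake function only receives the
      embedded wq_entry and recovers the enclosing wbq_entry with
      container_of(). wake_bit_matches() is a hypothetical helper, loosely
      modelled on how a bit-wait wake function filters by key.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Heavily simplified stand-ins for the kernel structures. */
struct wait_queue_entry { unsigned int flags; };
struct wait_bit_key     { void *flags; int bit_nr; };

struct wait_bit_queue_entry {
    struct wait_bit_key     key;
    struct wait_queue_entry wq_entry;   /* embedded wait-queue entry */
};

/* Given only the embedded wq_entry, find the enclosing wbq_entry and
 * report whether it waits on the same word and bit as key. */
static int wake_bit_matches(struct wait_queue_entry *wq_entry,
                            const struct wait_bit_key *key)
{
    struct wait_bit_queue_entry *wbq_entry =
        container_of(wq_entry, struct wait_bit_queue_entry, wq_entry);

    return wbq_entry->key.flags == key->flags &&
           wbq_entry->key.bit_nr == key->bit_nr;
}
```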
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize 'struct wait_bit_queue' wait-queue entry field name · 21417136
      Ingo Molnar committed
      Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly
      name it as a wait-queue entry.
      
      Propagate it to a couple of usage sites where the wait-bit-queue internals
      are exposed.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize internal naming of wait-queue heads · 9d9d676f
      Ingo Molnar committed
      The wait-queue head parameters and variables are named in
      several ways; we currently have the following variants:
      
      	wait_queue_head_t *q
      	wait_queue_head_t *wq
      	wait_queue_head_t *head
      
      In particular, the 'wq' naming is ambiguous as to whether it's
      a wait-queue head or entry name, as entries were often named 'wait'.
      
      ( Not to mention the confusion of any readers coming over from
        workqueue-land. )
      
      Standardize all this around a single, unambiguous parameter and
      variable name:
      
      	struct wait_queue_head *wq_head
      
      which is easy to grep for and also rhymes nicely with the wait-queue
      entry naming:
      
      	struct wait_queue_entry *wq_entry
      
      Also rename:
      
      	struct __wait_queue_head => struct wait_queue_head
      
      ... and use this struct type to migrate from typedefs usage to 'struct'
      usage, which is more in line with existing kernel practices.
      
      Don't touch any external users and preserve the main wait_queue_head_t
      typedef.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize internal naming of wait-queue entries · 50816c48
      Ingo Molnar committed
      So the various wait-queue entry variables in include/linux/wait.h
      and kernel/sched/wait.c are named in a colorfully inconsistent
      way:
      
      	wait_queue_entry_t *wait
      	wait_queue_entry_t *__wait	(even in plain C code!)
      	wait_queue_entry_t *q		(!)
      	wait_queue_entry_t *new		(making anyone who knows C++ cringe)
      	wait_queue_entry_t *old
      
      I think part of the reason for the inconsistency is the constant
      apparent confusion about what a wait queue 'head' versus 'entry' is.
      
      ( Some of the documentation talks about a 'wait descriptor', which is
        the wait-queue entry itself - further adding to the confusion. )
      
      The most common name is 'wait', but that in itself is somewhat
      ambiguous as well, as it does not really make it clear whether
      it's a wait-queue entry or head.
      
      To improve all this, name the wait-queue entry structure parameters
      and variables consistently and push this naming through all
      the wait.h and wait.c code:
      
      	struct wait_queue_entry *wq_entry
      
      The 'wq_' prefix makes it easy to grep for, and we also use the
      opportunity to move away from the typedef to a plain 'struct' naming:
      in the kernel we typically reserve typedefs for cases where a
      C structure is really small and somewhat opaque - such as pte_t.
      
      wait-queue entries are neither small nor opaque, so use the more
      standard 'struct xxx_entry' list management code nomenclature instead.
      
      ( We don't touch external users, and we preserve the typedef as well
        for actual wait-queue users, to reduce unnecessary churn. )
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar committed
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 09 Mar, 2017: 1 commit
    • sched/headers: fix up header file dependency on <linux/sched/signal.h> · bd0f9b35
      Linus Torvalds committed
      The scheduler header file split and cleanups ended up exposing a few
      nasty header file dependencies, and in particular it showed how we in
      <linux/wait.h> ended up depending on "signal_pending()", which now comes
      from <linux/sched/signal.h>.
      
      That's a very subtle and annoying dependency, which already caused a
      semantic merge conflict (see commit e58bc927 "Pull overlayfs updates
      from Miklos Szeredi", which added that fixup in the merge commit).
      
      It turns out that we can avoid this dependency _and_ improve code
      generation by moving the guts of the fairly nasty helper #define
      __wait_event_interruptible_locked() to out-of-line code.  The code that
      includes the signal_pending() check is all in the slow-path where we
      actually go to sleep waiting for the event anyway, so using a helper
      function is the right thing to do.
      
      Using a helper function is also what we already did for the non-locked
      versions, see the "__wait_event*()" macros and the "prepare_to_wait*()"
      set of helper functions.
      
      We might want to try to unify all these macro games, we have a _lot_ of
      subtly different wait-event loops.  But this is the minimal patch to fix
      the annoying header dependency.
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 02 Mar, 2017: 1 commit
  7. 26 Dec, 2016: 1 commit
    • ktime: Get rid of the union · 2456e855
      Thomas Gleixner committed
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64-bit machines and in an endianness-optimized
      timespec variant for 32-bit machines. The Y2038 cleanup removed the
      timespec variant and switched everything to scalar nanoseconds. The
      union remained, but became completely pointless.
      
      Get rid of the union and just keep ktime_t as a simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
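      A before/after sketch of the type change, using userspace stand-ins
      (int64_t for s64) and the hypothetical names ktime_old_t/ktime_new_t so
      that both versions can coexist in one file: with the single-member
      union every access goes through .tv64, while the scalar typedef allows
      plain arithmetic and comparisons.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;    /* userspace stand-in for the kernel's s64 */

/* Before: a single-member union, so every access needs .tv64. */
union ktime_old { s64 tv64; };
typedef union ktime_old ktime_old_t;

static ktime_old_t ktime_old_add(ktime_old_t a, ktime_old_t b)
{
    ktime_old_t res = { .tv64 = a.tv64 + b.tv64 };
    return res;
}

/* After: a plain scalar typedef; arithmetic works directly. */
typedef s64 ktime_new_t;

static ktime_new_t ktime_new_add(ktime_new_t a, ktime_new_t b)
{
    return a + b;
}
```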
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
  8. 30 Sep, 2016: 4 commits
  9. 19 Jul, 2016: 1 commit
  10. 24 Feb, 2016: 1 commit
  11. 14 Dec, 2015: 1 commit
  12. 01 Dec, 2015: 1 commit
  13. 23 Nov, 2015: 1 commit
  14. 23 Sep, 2015: 1 commit
  15. 05 Sep, 2015: 1 commit
  16. 17 Jun, 2015: 1 commit
    • wait: introduce wait_event_exclusive_cmd · 9f3520c3
      Yuanhan Liu committed
      It's just a variant of wait_event_cmd(), with exclusive flag being set.
      
      For cases like RAID5, which puts many processes to sleep until 1/4 of
      the resources are free, a wake_up() wakes up all the processes, but
      only one of them is able to get the resource, as it is protected by a
      spinlock. That ends up introducing heavy lock contention, and hurts
      performance badly.
      
      Introduce wait_event_exclusive_cmd() to relieve the lock contention
      naturally, by letting wake_up() wake up just one process.
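      A minimal sketch of the exclusive-wakeup rule, assuming a plain array
      instead of a wait queue: waiters are visited in queue order and the
      walk stops once nr_exclusive exclusive waiters have been woken, so any
      non-exclusive waiters ahead of them are still woken. struct waiter and
      wake_up_sketch() are hypothetical; passing 0 mimics a wake-all because
      the counter never reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
    bool exclusive;     /* queued with the exclusive flag */
    bool woken;
};

/* Returns how many waiters were woken. */
static size_t wake_up_sketch(struct waiter *q, size_t n, int nr_exclusive)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        q[i].woken = true;
        woken++;
        /* Stop after the nr_exclusive-th exclusive waiter. */
        if (q[i].exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```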
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      v2: it's assumed that wait*() and __wait*() have the same arguments - peterz
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  17. 08 May, 2015: 1 commit
  18. 05 Feb, 2015: 1 commit
    • block: Simplify bsg complete all · 2c561246
      Peter Zijlstra committed
      It took me a few tries to figure out what this code did; let's rewrite
      it into a more regular form.
      
      The thing that makes this one 'special' is the BSG_F_BLOCK flag: if
      that is not set, we're not supposed (or allowed) to block and should
      spin-wait for completion.
      
      The (new) io_wait_event() will never see a false condition in case of
      the spinning and we will therefore not block.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  19. 04 Feb, 2015: 1 commit
  20. 03 Feb, 2015: 1 commit
  21. 04 Nov, 2014: 1 commit
  22. 28 Oct, 2014: 2 commits
  23. 25 Sep, 2014: 1 commit
    • SCHED: add some "wait..on_bit...timeout()" interfaces. · cbbce822
      NeilBrown committed
      In commit c1221321
         sched: Allow wait_on_bit_action() functions to support a timeout
      
      I suggested that a "wait_on_bit_timeout()" interface would not meet my
      need.  This isn't true - I was just over-engineering.
      
      Including a 'private' field in wait_bit_key instead of a focused
      "timeout" field was just premature generalization.  If some other
      use is ever found, it can be generalized or added later.
      
      So this patch renames "private" to "timeout", meaning "stop
      waiting when jiffies reaches or passes timeout",
      and adds two of the many possible wait..bit..timeout() interfaces:
      
      wait_on_page_bit_killable_timeout(), which is the one I want to use,
      and out_of_line_wait_on_bit_timeout() which is a reasonably general
      example.  Others can be added as needed.
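      The "reaches or passes" check has to survive jiffies wraparound. A
      minimal userspace sketch, assuming the same signed-difference trick the
      kernel's time_after_eq() uses; time_after_eq_sketch() and
      remaining_ticks() are hypothetical names, and the (long) conversion
      relies on two's-complement behaviour, as on the usual compilers.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is at or after b": compare the signed difference. */
static bool time_after_eq_sketch(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Ticks left before the deadline ("timeout" in struct wait_bit_key),
 * or -1 once it has been reached or passed (the caller would then
 * return -EAGAIN instead of sleeping). */
static long remaining_ticks(unsigned long now, unsigned long deadline)
{
    if (time_after_eq_sketch(now, deadline))
        return -1;
    return (long)(deadline - now);
}
```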
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  24. 05 Sep, 2014: 1 commit
  25. 16 Jul, 2014: 2 commits
    • sched: Allow wait_on_bit_action() functions to support a timeout · c1221321
      NeilBrown committed
      It is currently not possible for various wait_on_bit functions
      to implement a timeout.
      
      While the "action" function that is called to do the waiting
      could certainly use schedule_timeout(), there is no way to carry
      forward the remaining timeout after a false wake-up.
      As false wakeups are clearly possible, at least due to hash
      collisions in bit_waitqueue(), this is a real problem.
      
      The 'action' function is currently passed a pointer to the word
      containing the bit being waited on.  No current action functions
      use this pointer.  So changing it to something else will be a
      little noisy but will have no immediate effect.
      
      This patch changes the 'action' function to take a pointer to
      the "struct wait_bit_key", which contains a pointer to the word
      containing the bit so nothing is really lost.
      
      It also adds a 'private' field to "struct wait_bit_key", which
      is initialized to zero.
      
      An action function can now implement a timeout with something
      like
      
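      static int timed_out_waiter(struct wait_bit_key *key)
      {
      	unsigned long waited;
      	/* First call: record the start time. 0 means "not started",
      	 * so nudge a genuine 0 away from that sentinel. */
      	if (key->private == 0) {
      		key->private = jiffies;
      		if (key->private == 0)
      			key->private -= 1;
      	}
      	waited = jiffies - key->private;
      	/* Give up once the 10-second budget is used up. */
      	if (waited >= 10 * HZ)
      		return -EAGAIN;
      	/* Sleep only for the time that remains, not the time elapsed. */
      	schedule_timeout(10 * HZ - waited);
      	return 0;
      }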
      
      If any other need for context in a waiter were found it would be
      easy to use ->private for some other purpose, or even extend
      "struct wait_bit_key".
      
      My particular need is to support timeouts in nfs_release_page()
      to avoid deadlocks with loopback mounted NFS.
      
      While wait_on_bit_timeout() would be a cleaner interface, it
      will not meet my need.  I need the timeout to be sensitive to
      the state of the connection with the server, which could change.
      So I need to use an 'action' interface.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown committed
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
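      The point that the signal decision now depends on 'mode' rather than on
      which action function was passed can be sketched in userspace, assuming
      simplified task states and a hypothetical bit_wait_sketch() that
      returns non-zero (not a specific error code) when a pending signal
      should abort the wait. signal_aborts_wait() loosely follows the logic
      of the kernel's signal_pending_state().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the task states. */
enum sketch_mode { SK_UNINTERRUPTIBLE, SK_INTERRUPTIBLE, SK_KILLABLE };

/* Whether a pending signal aborts the wait depends only on the mode. */
static bool signal_aborts_wait(enum sketch_mode mode, bool sig_pending,
                               bool sig_fatal)
{
    if (!sig_pending || mode == SK_UNINTERRUPTIBLE)
        return false;
    return mode == SK_INTERRUPTIBLE || sig_fatal;
}

/* One shared action instead of many per-subsystem copies: sleep (elided
 * here), then report non-zero if the wait should be abandoned. */
static int bit_wait_sketch(enum sketch_mode mode, bool sig_pending,
                           bool sig_fatal)
{
    /* schedule() would go here in the kernel */
    return signal_aborts_wait(mode, sig_pending, sig_fatal) ? 1 : 0;
}
```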
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous, as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible".
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps' (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack will.  So
      the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15), two new action
      functions have appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer-aware
      schedule call as NFS.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  26. 19 Apr, 2014: 1 commit