1. 30 Jun 2017: 5 commits
  2. 29 Jun 2017: 3 commits
  3. 24 Jun 2017: 4 commits
  4. 23 Jun 2017: 3 commits
  5. 22 Jun 2017: 3 commits
  6. 21 Jun 2017: 6 commits
  7. 20 Jun 2017: 16 commits
    • sched/core: Drop the unused try_get_task_struct() helper function · f11cc076
      Authored by Davidlohr Bueso
      This function was introduced by:
      
        150593bf ("sched/api: Introduce task_rcu_dereference() and try_get_task_struct()")
      
      ... to allow easier usage of task_rcu_dereference(), however no users
      were ever added. Drop the helper.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/20170615023730.22827-1-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: WARN() and refuse to set buddy when !se->on_rq · c5ae366e
      Authored by Daniel Axtens
      If we set a next or last buddy for an se that is not on_rq, we will
      end up taking a NULL pointer dereference in wakeup_preempt_entity
      via pick_next_task_fair.
      
      Detect when we would be about to do that, throw a warning and
      then refuse to actually set it.
      
      This has been suggested at least twice:
      
        https://marc.info/?l=linux-kernel&m=146651668921468&w=2
        https://lkml.org/lkml/2016/6/16/663
      
      I recently had to debug a problem with these (we hadn't backported
      Konstantin's patches in this area) and this would have saved a lot
      of time/pain.
      
      Just do it.
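
      A minimal sketch of the resulting guard in set_next_buddy()/set_last_buddy()
      (it assumes the SCHED_WARN_ON() fix below, which lets the macro be used in a
      conditional; this shows the intent of the patch, not its exact diff):

      	static void set_next_buddy(struct sched_entity *se)
      	{
      		if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
      			return;

      		for_each_sched_entity(se) {
      			/* A buddy that is not queued would later be dereferenced
      			 * via pick_next_task_fair() -> wakeup_preempt_entity(). */
      			if (SCHED_WARN_ON(!se->on_rq))
      				return;
      			cfs_rq_of(se)->next = se;
      		}
      	}
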
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170510201139.16236-1-dja@axtens.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well · 6d3aed3d
      Authored by Ingo Molnar
      This definition of SCHED_WARN_ON():
      
       #define SCHED_WARN_ON(x)        ((void)(x))
      
      is not fully compatible with the 'real' WARN_ON_ONCE() primitive, as it
      has no return value, so it cannot be used in conditionals.
      
      Fix it.
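
      One way to keep the no-op semantics while also yielding a value, as
      WARN_ON_ONCE() does (a sketch matching the intent of the fix, using a
      GNU C statement expression):

      	#define SCHED_WARN_ON(x)	({ (void)(x), false; })

      The condition is still evaluated for side effects, but the macro now
      always yields false, so 'if (SCHED_WARN_ON(cond))' compiles and behaves
      sensibly without CONFIG_SCHED_DEBUG.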
      
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming · 2055da97
      Authored by Ingo Molnar
      So I've noticed a number of instances where it was not obvious from the
      code whether ->task_list was for a wait-queue head or a wait-queue entry.
      
      Furthermore, there are a number of wait-queue users where the lists are
      not for 'tasks' but other entities (poll tables, etc.), in which case
      the 'task_list' name is actively confusing.
      
      To clear this all up, name the wait-queue head and entry list structure
      fields unambiguously:
      
      	struct wait_queue_head::task_list	=> ::head
      	struct wait_queue_entry::task_list	=> ::entry
      
      For example, this code:
      
      	rqw->wait.task_list.next != &wait->task_list
      
      ... was pretty unclear (to me) as to what it's doing, while now it's written this way:
      
      	rqw->wait.head.next != &wait->entry
      
      ... which makes it pretty clear that we are iterating a list until we see the head.
      
      Other examples are:
      
      	list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
      	list_for_each_entry(wq, &fence->wait.task_list, task_list) {
      
      ... where it's unclear (to me) what we are iterating, and during review it's
      hard to tell whether it's trying to walk a wait-queue entry (which would be
      a bug), while now it's written as:
      
      	list_for_each_entry_safe(pos, next, &x->head, entry) {
      	list_for_each_entry(wq, &fence->wait.head, entry) {
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c · 5822a454
      Authored by Ingo Molnar
      The key hashed waitqueue data structures and their initialization
      lived in the main scheduler file for no good reason; move them
      to sched/wait_bit.c instead.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> · 5dd43ce2
      Authored by Ingo Molnar
      The wait_bit*() types and APIs are mixed into wait.h, but they
      are a pretty orthogonal extension of wait-queues.
      
      Furthermore, only about 50 kernel files use these APIs, while
      over 1000 use the regular wait-queue functionality.
      
      So clean up the main wait.h by moving the wait-bit functionality
      out of it, into a separate .h and .c file:
      
        include/linux/wait_bit.h  for types and APIs
        kernel/sched/wait_bit.c   for the implementation
      
      Update all header dependencies.
      
      This reduces the size of wait.h rather significantly, by about 30%.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize wait_bit_queue naming · 76c85ddc
      Authored by Ingo Molnar
      So wait-bit-queue variables are often named:
      
      	struct wait_bit_queue *q
      
      ... which is a bit ambiguous and super confusing, because
      they clearly suggest wait-queue head semantics and behavior
      (they rhyme with the old wait_queue_t *q naming), while they
      are extended wait-queue _entries_, not heads!
      
      They are misnomers in two ways:
      
       - the 'wait_bit_queue' leaves open the question of whether
         it's an entry or a head
      
       - the 'q' parameter and local variable naming falsely implies
         that it's a 'queue' - while it's an entry.
      
      This resulted in sometimes confusing cases such as:
      
      	finish_wait(wq, &q->wait);
      
      where the 'q' is not a wait-queue head, but a wait-bit-queue entry.
      
      So improve this all by standardizing wait-bit-queue nomenclature
      similar to wait-queue head naming:
      
      	struct wait_bit_queue   => struct wait_bit_queue_entry
      	q			=> wbq_entry
      
      Which makes it all much clearer:
      
      	struct wait_bit_queue_entry *wbq_entry
      
      ... and turns the former confusing piece of code into:
      
      	finish_wait(wq_head, &wbq_entry->wq_entry);
      
      which IMHO makes it immediately clear what we are doing,
      without having to analyze the context of the code: we are
      adding a wait-queue entry to a regular wait-queue head,
      which entry is embedded in a wait-bit-queue entry.
      
      I'm not a big fan of acronyms, but repeating wait_bit_queue_entry
      in field and local variable names is too long, so hopefully it's
      clear enough that 'wq_' prefixes stand for wait-queues, while
      'wbq_' prefixes stand for wait-bit-queues.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize 'struct wait_bit_queue' wait-queue entry field name · 21417136
      Authored by Ingo Molnar
      Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly
      name it as a wait-queue entry.
      
      Propagate it to a couple of usage sites where the wait-bit-queue internals
      are exposed.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize internal naming of wait-queue heads · 9d9d676f
      Authored by Ingo Molnar
      The wait-queue head parameters and variables are named in a
      couple of ways; we currently have the following variants:
      
      	wait_queue_head_t *q
      	wait_queue_head_t *wq
      	wait_queue_head_t *head
      
      In particular, the 'wq' naming is ambiguous as to whether it refers to
      a wait-queue head or an entry - as entries were often named 'wait'.
      
      ( Not to mention the confusion of any readers coming over from
        workqueue-land. )
      
      Standardize all this around a single, unambiguous parameter and
      variable name:
      
      	struct wait_queue_head *wq_head
      
      which is easy to grep for and also rhymes nicely with the wait-queue
      entry naming:
      
      	struct wait_queue_entry *wq_entry
      
      Also rename:
      
      	struct __wait_queue_head => struct wait_queue_head
      
      ... and use this struct type to migrate from typedefs usage to 'struct'
      usage, which is more in line with existing kernel practices.
      
      Don't touch any external users and preserve the main wait_queue_head_t
      typedef.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Standardize internal naming of wait-queue entries · 50816c48
      Authored by Ingo Molnar
      So the various wait-queue entry variables in include/linux/wait.h
      and kernel/sched/wait.c are named in a colorfully inconsistent
      way:
      
      	wait_queue_entry_t *wait
      	wait_queue_entry_t *__wait	(even in plain C code!)
      	wait_queue_entry_t *q		(!)
      	wait_queue_entry_t *new		(making anyone who knows C++ cringe)
      	wait_queue_entry_t *old
      
      I think part of the reason for the inconsistency is the constant
      apparent confusion about what a wait queue 'head' versus 'entry' is.
      
      ( Some of the documentation talks about a 'wait descriptor', which is
        the wait-queue entry itself - further adding to the confusion. )
      
      The most common name is 'wait', but that in itself is somewhat
      ambiguous as well, as it does not really make it clear whether
      it's a wait-queue entry or head.
      
      To improve all this, name the wait-queue entry structure parameters
      and variables consistently and push this naming through all
      the wait.h and wait.c code:
      
      	struct wait_queue_entry *wq_entry
      
      The 'wq_' prefix makes it easy to grep for, and we also use the
      opportunity to move away from the typedef to a plain 'struct' naming:
      in the kernel we typically reserve typedefs for cases where a
      C structure is really small and somewhat opaque - such as pte_t.
      
      Wait-queue entries are neither small nor opaque, so use the more
      standard 'struct xxx_entry' list management code nomenclature instead.
      
      ( We don't touch external users, and we preserve the typedef as well
        for actual wait-queue users, to reduce unnecessary churn. )
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Authored by Ingo Molnar
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rtmutex: Don't initialize lockdep when not required · cde50a67
      Authored by Levin, Alexander (Sasha Levin)
      pi_mutex isn't supposed to be tracked by lockdep, but just
      passing NULLs for name and key will cause lockdep to spew a
      warning and die, which is not what we want it to do.
      
      Skip lockdep initialization if the caller passed NULLs for
      name and key, suggesting such initialization isn't desired.
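
      A sketch of the shape of the resulting fix in __rt_mutex_init() (hedged:
      simplified, not the exact upstream diff; debug_rt_mutex_init() is the
      path that ends up in lockdep initialization):

      	void __rt_mutex_init(struct rt_mutex *lock, const char *name,
      			     struct lock_class_key *key)
      	{
      		lock->owner = NULL;
      		raw_spin_lock_init(&lock->wait_lock);
      		lock->waiters = RB_ROOT;

      		/* NULL name/key means the caller opted out of lockdep tracking. */
      		if (name && key)
      			debug_rt_mutex_init(lock, name, key);
      	}
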
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f5694788 ("rt_mutex: Add lockdep annotations")
      Link: http://lkml.kernel.org/r/20170618140548.4763-1-alexander.levin@verizon.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • livepatch: Fix stacking of patches with respect to RCU · 842c0884
      Authored by Petr Mladek
      rcu_read_(un)lock(), list_*_rcu(), and synchronize_rcu() are used for safe
      access to and manipulation of the list of patches that modify the same
      function. In particular, this protects the variable func_stack, which is
      accessible from the ftrace handler via struct ftrace_ops and klp_ops.
      
      It also synchronizes some state of the patch on top of the stack,
      e.g. func->transition in klp_ftrace_handler().
      
      At the same time, this mechanism also guards the manipulation of
      task->patch_state, which is modified according to the state of the
      transition and the state of the process.
      
      Now, all this works well as long as RCU works well. Sadly, livepatching
      can hit corner cases where this is not true. For example, RCU is not
      watching when rcu_read_lock() is taken in idle threads, because they
      might sleep and prevent the grace period from being reached for too long.
      
      There are ways to make RCU watch even in idle threads, see
      rcu_irq_enter(). But there is a small window inside the RCU
      infrastructure where even this does not work.
      
      This small problematic window can be detected either before calling
      rcu_irq_enter(), by rcu_irq_enter_disabled(), or later by rcu_is_watching().
      Sadly, there is no safe way to handle it.  Once we detect that RCU was not
      watching, we might see an inconsistent state of the function stack and the
      related variables in klp_ftrace_handler(). Then we could make a wrong
      decision, use an incompatible implementation of the function, and break
      the consistency of the system. We could warn, but we could not avoid the damage.
      
      Fortunately, ftrace has similar problems, and they seem to be solved well
      there. It uses a heavyweight implementation of some RCU operations, as the
      sketch after the list below illustrates. In particular, it replaces:
      
        + rcu_read_lock() with preempt_disable_notrace()
        + rcu_read_unlock() with preempt_enable_notrace()
        + synchronize_rcu() with schedule_on_each_cpu(sync_work)
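
      A condensed sketch of how this looks in klp_ftrace_handler() (hedged:
      simplified from kernel/livepatch/patch.c, with the transition handling
      elided):

      	static void notrace klp_ftrace_handler(unsigned long ip,
      					       unsigned long parent_ip,
      					       struct ftrace_ops *fops,
      					       struct pt_regs *regs)
      	{
      		struct klp_ops *ops = container_of(fops, struct klp_ops, fops);
      		struct klp_func *func;

      		/* Heavyweight "read lock": safe even where RCU is not watching. */
      		preempt_disable_notrace();

      		func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
      					      stack_node);
      		if (func)
      			klp_arch_set_pc(regs, (unsigned long)func->new_func);

      		preempt_enable_notrace();
      	}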
      
      My understanding is that this is an RCU implementation from the stone age.
      It meets the core RCU requirements but is rather inefficient. In particular,
      it does not allow batching or speeding up the synchronize calls.
      
      On the other hand, it is very simple. It allows safe tracing and/or
      livepatching of even the RCU core infrastructure.  And efficiency is not a
      big issue, because using ftrace or livepatches on production systems is a
      rare operation.  Safety is much more important than a negligible extra load.
      
      Note that the alternative implementation follows the RCU principles.
      Therefore, we could and actually must use the list_*_rcu() variants when
      manipulating the func_stack.  These functions allow accessing the pointers
      in the right order and with the right barriers. But they do not use any
      other information that would be set only by rcu_read_lock().
      
      Also note that there are actually two problems solved in ftrace:
      
      First, it cares about the consistency of RCU read sections.  This is solved
      in the way described above and used in this patch.
      
      Second, ftrace needs to make sure that nobody is inside the dynamic trampoline
      when it is being freed. For this, it also calls synchronize_rcu_tasks() in
      a preemptible kernel in ftrace_shutdown().
      
      Livepatch has a similar problem, but it is solved by ftrace for free.
      klp_ftrace_handler() is a good guy and never sleeps. In addition, it is
      registered with FTRACE_OPS_FL_DYNAMIC. As a result,
      unregister_ftrace_function() calls:

      	* schedule_on_each_cpu(ftrace_sync) - always
      	* synchronize_rcu_tasks() - in a preemptible kernel

      The effect is that nobody is inside either the dynamic trampoline or the
      ftrace handler after unregister_ftrace_function() returns.
      
      [jkosina@suse.cz: reformat changelog, fix comment]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · 3d88d56c
      Authored by John Stultz
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in their vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the problem for in-kernel users.
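
      A sketch of the idea, simplified from the accumulation loop in
      kernel/time/timekeeping.c (hedged: field names follow the patch, but this
      is not the exact diff). The sub-nanosecond remainder stays in the shifted
      tkr_raw.xtime_nsec accumulator instead of being truncated each cycle:

      	/* accumulate one interval, in shifted nanoseconds */
      	tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
      	snsec_per_sec = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
      	while (tk->tkr_raw.xtime_nsec >= snsec_per_sec) {
      		tk->tkr_raw.xtime_nsec -= snsec_per_sec;
      		tk->raw_time.tv_sec++;
      	}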
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: Fix clock->read(clock) race around clocksource changes · ceea5e37
      Authored by John Stultz
      In tests that exercise switching of clocksources, a NULL
      pointer dereference can be observed on ARM64 platforms in the
      clocksource read() function:
      
      u64 clocksource_mmio_readl_down(struct clocksource *c)
      {
      	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
      }
      
      This is called from the core timekeeping code via:
      
      	cycle_now = tkr->read(tkr->clock);
      
      tkr->read is the cached tkr->clock->read() function pointer.
      When the clocksource is changed, tkr->clock and tkr->read
      are updated sequentially. The code above likewise performs a sequential
      load of tkr->read and then tkr->clock.
      
      If the store to tkr->clock hits between the loads of tkr->read
      and tkr->clock, then the old read() function is called with the
      new clock pointer. As a consequence the read() function
      dereferences a different data structure and the resulting 'reg'
      pointer can point anywhere including NULL.
      
      This problem was introduced when the timekeeping code was
      switched over to use struct tk_read_base. Before that, it was
      theoretically possible as well when the compiler decided to
      reload clock in the code sequence:
      
           now = tk->clock->read(tk->clock);
      
      Add a helper function which avoids the issue by reading
      tk_read_base->clock once into a local variable clk and then issuing
      the read via clk->read(clk). This guarantees that the read()
      function always gets the proper clocksource pointer handed in.
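
      A sketch of such a helper (this mirrors the tk_clock_read() helper the
      patch introduces; simplified, without the surrounding context):

      	static inline u64 tk_clock_read(const struct tk_read_base *tkr)
      	{
      		struct clocksource *clock = READ_ONCE(tkr->clock);

      		/* The read() function always sees its own clocksource. */
      		return clock->read(clock);
      	}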
      
      Since there is now no use for the tkr.read pointer, this patch
      also removes it, and to address stopping the fast timekeeper
      during suspend/resume, it introduces a dummy clocksource to use
      rather than just a dummy read function.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • m68k: Remove ptrace_signal_deliver · 204a2be3
      Authored by Andreas Schwab
      This fixes debugger syscall restart interactions.  A debugger that
      modifies the tracee's program counter is expected to set the orig_d0
      pseudo register to -1, to disable a possible syscall restart.
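
      For illustration only, a debugger might cancel the restart like this
      (a hypothetical sketch: it assumes PT_ORIG_D0 is the m68k ptrace register
      index for orig_d0 and that the user-area offset is the index times 4):

      	#include <sys/ptrace.h>

      	/* After rewriting the tracee's PC, disable syscall restart. */
      	ptrace(PTRACE_POKEUSER, pid, PT_ORIG_D0 * 4, -1L);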
      
      This removes the last user of the ptrace_signal_deliver hook in the ptrace
      signal handling, so remove that as well.
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>