1. 18 Feb 2014 (3 commits)
  2. 25 Jan 2014 (1 commit)
    • introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill rcu_my_thread_group_empty() · a8d4b834
      Oleg Nesterov authored
      rcu_dereference_check_fdtable() looks very wrong:
      
      1. rcu_my_thread_group_empty() was added by 844b9a87 "vfs: fix
         RCU-lockdep false positive due to /proc" but it doesn't really
         fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
         hit the same race with get_files_struct().
      
         And, on the other hand, rcu_my_thread_group_empty() can suppress
         the correct warning if the caller is the CLONE_FILES (without
         CLONE_THREAD) task.
      
      2. The files->count == 1 check is not really right either. Even if
         this files_struct is not shared, it is not safe to access it
         locklessly unless the caller is the owner.
      
         On the other hand, this check is sub-optimal. files->count == 0
         always means it is safe to use it locklessly even if
         files != current->files, but put_files_struct() has to take
         rcu_read_lock(). See the next patch.
      
      This patch removes the buggy checks and turns fcheck_files() into
      __fcheck_files(), which uses rcu_dereference_raw(); the "unshared"
      callers, fget_light() and fget_raw_light(), can use it to avoid
      the warning from RCU-lockdep.
      
      fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
      plus __fcheck_files(), as sketched below.
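      
      A minimal sketch of the resulting split, assuming the usual fdtable
      layout (simplified from fs/file.c and include/linux/fdtable.h, not
      the verbatim kernel code):
      
          /* Lockless variant: no lockdep check, caller must know it is safe. */
          static inline struct file *__fcheck_files(struct files_struct *files,
                                                    unsigned int fd)
          {
                  struct fdtable *fdt = rcu_dereference_raw(files->fdt);
      
                  if (fd < fdt->max_fds)
                          return rcu_dereference_raw(fdt->fd[fd]);
                  return NULL;
          }
      
          /* Checked variant: warn unless RCU or files->file_lock is held. */
          static inline struct file *fcheck_files(struct files_struct *files,
                                                  unsigned int fd)
          {
                  rcu_lockdep_assert(rcu_read_lock_held() ||
                                     lockdep_is_held(&files->file_lock),
                                     "suspicious fcheck_files() usage");
                  return __fcheck_files(files, fd);
          }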
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 16 Dec 2013 (1 commit)
  4. 13 Dec 2013 (4 commits)
    • rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h · bd73a7f5
      Teodora Baluta authored
      Function prototypes don't need the "extern" keyword, since external
      linkage is the default for function declarations; its explicit use
      is redundant. This commit therefore removes these redundant
      keywords, as illustrated below.
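      
      For illustration only (a hypothetical declaration, not one taken
      from kernel/rcu/rcu.h), the two forms below are equivalent:
      
          extern void rcu_do_something(void);  /* "extern" is implicit here */
          void rcu_do_something(void);         /* equivalent, and preferred */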
      Signed-off-by: Teodora Baluta <teobaluta@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow · d1008950
      Chen Gang authored
      If the rcutorture SRCU output exceeds 4096 bytes, for example, if you
      have more than about 75 CPUs, it will overflow the current statically
      allocated buffer.  This commit therefore replaces this static buffer
      with a dynamically allocated buffer whose size is based on the number
      of CPUs, as sketched below.
      
      Benefits:
      
       - Avoids both buffer overflow and output truncation.
       - Handles an arbitrarily large number of CPUs.
       - Straightforward implementation.
      
      Shortcomings:
      
       - Some memory is wasted:
      
         Each CPU now consumes 50-60 bytes while this patch provides 200
         bytes per CPU, so for 1K CPUs roughly 100KB of memory will be
         wasted.  However, the memory is freed immediately after printing,
         so this wastage should not be a problem in practice.
      
      Testing (Fedora 16, 2 CPUs, 2GB RAM, x86_64):
      
       - as a module, with and without "torture_type=srcu".
       - built in but not run at boot, with and without "torture_type=srcu".
       - built in and run at boot, with and without "torture_type=srcu".
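      
      A minimal sketch of the allocation strategy, assuming the 200 bytes
      per CPU mentioned above (names are illustrative, not the exact
      rcutorture code):
      
          static char *srcu_stats_buf;
      
          static int srcu_stats_alloc(void)
          {
                  /* Scale with the CPU count instead of a fixed 4096 bytes. */
                  srcu_stats_buf = kmalloc(nr_cpu_ids * 200, GFP_KERNEL);
                  return srcu_stats_buf ? 0 : -ENOMEM;
          }
      
          static void srcu_stats_free(void)
          {
                  kfree(srcu_stats_buf);  /* freed right after stats are printed */
                  srcu_stats_buf = NULL;
          }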
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't activate RCU core on NO_HZ_FULL CPUs · a096932f
      Paul E. McKenney authored
      Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see
      if the RCU core needs anything from this CPU.  If so, RCU raises
      RCU_SOFTIRQ to carry out any needed processing.
      
      This approach has worked well historically, but it is undesirable on
      NO_HZ_FULL CPUs.  Such CPUs are expected to spend almost all of their time
      in userspace, so that scheduling-clock interrupts can be disabled while
      there is only one runnable task on the CPU in question.  Unfortunately,
      raising any softirq has the potential to wake up ksoftirqd, which would
      provide the second runnable task on that CPU, preventing disabling of
      scheduling-clock interrupts.
      
      What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone,
      relying on the grace-period kthreads' quiescent-state forcing to
      do any needed RCU work on behalf of those CPUs.
      
      This commit therefore refrains from raising RCU_SOFTIRQ on any
      NO_HZ_FULL CPUs during any grace periods that have been in effect
      for less than one second.  The one-second limit handles the case
      where an inappropriate workload is running on a NO_HZ_FULL CPU
      that features lots of scheduling-clock interrupts, but no idle
      or userspace time.
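      
      A sketch of the resulting check, modeled on the description above
      (the helper name and the exact condition are assumptions):
      
          /* Return true if RCU should leave this NO_HZ_FULL CPU alone for now. */
          static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
          {
                  if (tick_nohz_full_cpu(smp_processor_id()) &&
                      (!rcu_gp_in_progress(rsp) ||
                       ULONG_CMP_LT(jiffies, ACCESS_ONCE(rsp->gp_start) + HZ)))
                          return true;   /* grace period young: skip RCU_SOFTIRQ */
                  return false;          /* over a second old: process normally */
          }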
      Reported-by: Mike Galbraith <bitbucket@online.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Mike Galbraith <bitbucket@online.de>
      Toasted-by: Frederic Weisbecker <fweisbec@gmail.com>
    • rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq · 79a62f95
      Lai Jiangshan authored
      After commit 10f39bb1 ("rcu: protect __rcu_read_unlock() against
      scheduler-using irq handlers"), it is no longer possible to enter
      the main body of rcu_read_unlock_special() from an NMI, interrupt,
      or softirq handler.  In theory, this implies that the check for
      "in_irq() || in_serving_softirq()" must always fail, so that the
      check could be removed entirely.
      
      In practice, this commit wraps the condition in a WARN_ON_ONCE().
      If the warning never triggers, the condition can then be removed
      entirely.
      
      [ paulmck: And one way of triggering the WARN_ON() is if a scheduling
        clock interrupt occurs in an RCU read-side critical section, setting
        RCU_READ_UNLOCK_NEED_QS, which is handled by rcu_read_unlock_special().
        Updated this commit to return if only that bit was set. ]
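      
      The shape of the resulting check, as a sketch based on the
      description above (not the verbatim diff; this fragment lives inside
      rcu_read_unlock_special()):
      
          if (in_irq() || in_serving_softirq()) {
                  /* Allegedly unreachable unless only the NEED_QS bit is set. */
                  WARN_ON_ONCE(special & ~RCU_READ_UNLOCK_NEED_QS);
                  return;  /* NEED_QS bookkeeping was handled earlier */
          }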
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  5. 10 Dec 2013 (4 commits)
    • rcu: Provide better diagnostics for blocking in RCU callback functions · 24ef659a
      Paul E. McKenney authored
      Currently, blocking in an RCU callback function results in
      "scheduling while atomic", which could be triggered for any number
      of reasons.  To aid debugging, this patch introduces an
      rcu_callback_map that ties the inappropriate voluntary context
      switch back to the fact that the function is being invoked from
      within a callback.
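      
      A sketch of the mechanism, assuming lockdep's usual static-map
      initializer (simplified; the real call sites are in the RCU core):
      
          /* A virtual "lock" held while an RCU callback function runs. */
          static struct lockdep_map rcu_callback_map =
                  STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_map);
      
          /* Around each callback invocation, roughly: */
          rcu_lock_acquire(&rcu_callback_map);
          head->func(head);                       /* invoke the RCU callback */
          rcu_lock_release(&rcu_callback_map);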
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Improve SRCU's grace-period comments · bc72d962
      Paul E. McKenney authored
      This commit documents the memory-barrier guarantees provided by
      synchronize_srcu() and call_srcu().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values · 04f34650
      Paul E. McKenney authored
      Each element of the rcu_state structure's ->levelspread[] array
      is intended to contain the per-level fanout, where the zero-th
      element corresponds to the root of the rcu_node tree, and the last
      element corresponds to the leaves.  In the CONFIG_RCU_FANOUT_EXACT
      case, this means that the last element should be filled in
      from CONFIG_RCU_FANOUT_LEAF (or from the rcu_fanout_leaf boot
      parameter, if provided) and that the remaining elements should
      be filled in from CONFIG_RCU_FANOUT.  Unfortunately, the current
      code in rcu_init_levelspread() takes the opposite approach, placing
      CONFIG_RCU_FANOUT_LEAF in the zero-th element and CONFIG_RCU_FANOUT in
      the remaining elements.
      
      For typical power-of-two values, this generates odd but functional
      rcu_node trees.  However, other values, for example CONFIG_RCU_FANOUT=3
      and CONFIG_RCU_FANOUT_LEAF=2, generate trees that can leave some CPUs
      out of the grace-period computation, resulting in too-short grace periods
      and therefore a broken RCU implementation.
      
      This commit therefore fixes rcu_init_levelspread() to set the last
      ->levelspread[] array element from CONFIG_RCU_FANOUT_LEAF and the
      remaining elements from CONFIG_RCU_FANOUT, thus generating the
      intended rcu_node trees.
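      
      The corrected CONFIG_RCU_FANOUT_EXACT loop then looks roughly like
      this (a sketch simplified from rcu_init_levelspread()):
      
          static void rcu_init_levelspread(struct rcu_state *rsp)
          {
                  int i;
      
                  /* Leaves get the leaf fanout; interior levels the default. */
                  rsp->levelspread[rcu_num_lvls - 1] = rcu_fanout_leaf;
                  for (i = rcu_num_lvls - 2; i >= 0; i--)
                          rsp->levelspread[i] = CONFIG_RCU_FANOUT;
          }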
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix coccinelle warnings · f6f7ee9a
      Fengguang Wu authored
      This commit fixes the following coccinelle warning:
      
      kernel/rcu/tree.c:712:9-10: WARNING: return of 0/1 in function
      'rcu_lockdep_current_cpu_online' with return type bool
      
      Return statements in functions returning bool should use true/false
      instead of 1/0.
      
      Generated by: coccinelle/misc/boolreturn.cocci
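      
      In other words, the fix is of this form (a hypothetical minimal
      example, not the actual function body):
      
          static bool cpu_is_online_example(void)
          {
                  return true;    /* was "return 1;", which coccinelle flags */
          }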
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  6. 04 Dec 2013 (6 commits)
    • rcu: Let the world know when RCU adjusts its geometry · 39479098
      Paul E. McKenney authored
      Some RCU bugs have been specific to the layout of the rcu_node tree,
      but RCU will silently adjust the tree at boot time if appropriate.
      This obscures valuable debugging information, so print a message when
      this happens.
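      
      The boot-time message is a single pr_info() along these lines (the
      exact format string is an assumption):
      
          pr_info("RCU: Adjusting geometry for rcu_fanout_leaf=%d, nr_cpu_ids=%d\n",
                  rcu_fanout_leaf, nr_cpu_ids);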
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix srcu_barrier() docbook header · 4461212a
      Paul E. McKenney authored
      The srcu_barrier() docbook header left out the "sp" argument, so this
      commit adds that argument's docbook text.
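      
      After the fix, the header documents both the function and its
      argument, roughly as follows (the wording is an approximation):
      
          /**
           * srcu_barrier - Wait until all in-flight call_srcu() callbacks complete.
           * @sp: srcu_struct on which to wait for in-flight callbacks.
           */
          void srcu_barrier(struct srcu_struct *sp);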
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Allow task-level idle entry/exit nesting · 3a592405
      Paul E. McKenney authored
      The current task-level idle entry/exit code forces an entry/exit on
      each call, regardless of the nesting level.  This commit therefore
      properly accounts for nesting.
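      
      The general pattern is a nesting counter so that only the outermost
      entry and exit do real work (a generic sketch; the names are
      hypothetical, not the rcu_dynticks internals):
      
          void task_idle_enter(struct idle_state *s)
          {
                  if (s->nesting++ == 0)
                          do_idle_entry(s);   /* only on the outermost entry */
          }
      
          void task_idle_exit(struct idle_state *s)
          {
                  if (--s->nesting == 0)
                          do_idle_exit(s);    /* only on the outermost exit */
          }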
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
    • rcu: Break call_rcu() deadlock involving scheduler and perf · 96d3fd0d
      Paul E. McKenney authored
      Dave Jones got the following lockdep splat:
      
      >  ======================================================
      >  [ INFO: possible circular locking dependency detected ]
      >  3.12.0-rc3+ #92 Not tainted
      >  -------------------------------------------------------
      >  trinity-child2/15191 is trying to acquire lock:
      >   (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >
      > but task is already holding lock:
      >   (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
      >
      > which lock already depends on the new lock.
      >
      >
      > the existing dependency chain (in reverse order) is:
      >
      > -> #3 (&ctx->lock){-.-...}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
      >         [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
      >         [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
      >         [<ffffffff81732052>] __schedule+0x1d2/0xa20
      >         [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
      >         [<ffffffff817352b6>] retint_kernel+0x26/0x30
      >         [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
      >         [<ffffffff813f0504>] pty_write+0x54/0x60
      >         [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
      >         [<ffffffff813e5838>] tty_write+0x158/0x2d0
      >         [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
      >         [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
      >         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      >
      > -> #2 (&rq->lock){-.-.-.}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
      >         [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
      >         [<ffffffff81054336>] do_fork+0x126/0x460
      >         [<ffffffff81054696>] kernel_thread+0x26/0x30
      >         [<ffffffff8171ff93>] rest_init+0x23/0x140
      >         [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
      >         [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
      >         [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
      >
      > -> #1 (&p->pi_lock){-.-.-.}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >         [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
      >         [<ffffffff81097d62>] default_wake_function+0x12/0x20
      >         [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
      >         [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
      >         [<ffffffff8108ff59>] __wake_up+0x39/0x50
      >         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >         [<ffffffff81111450>] __call_rcu+0x140/0x820
      >         [<ffffffff81111b8d>] call_rcu+0x1d/0x20
      >         [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
      >         [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
      >         [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
      >         [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
      >         [<ffffffff817200be>] kernel_init+0xe/0x190
      >         [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
      >
      > -> #0 (&rdp->nocb_wq){......}:
      >         [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >         [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >         [<ffffffff81111450>] __call_rcu+0x140/0x820
      >         [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
      >         [<ffffffff81149abf>] put_ctx+0x4f/0x70
      >         [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
      >         [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
      >         [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
      >         [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
      >         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      >
      > other info that might help us debug this:
      >
      > Chain exists of:
      >   &rdp->nocb_wq --> &rq->lock --> &ctx->lock
      >
      >   Possible unsafe locking scenario:
      >
      >         CPU0                    CPU1
      >         ----                    ----
      >    lock(&ctx->lock);
      >                                 lock(&rq->lock);
      >                                 lock(&ctx->lock);
      >    lock(&rdp->nocb_wq);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by trinity-child2/15191:
      >  #0:  (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
      >
      > stack backtrace:
      > CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
      >  ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
      >  ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
      >  ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
      > Call Trace:
      >  [<ffffffff8172a363>] dump_stack+0x4e/0x82
      >  [<ffffffff81726741>] print_circular_bug+0x200/0x20f
      >  [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
      >  [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
      >  [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
      >  [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
      >  [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
      >  [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >  [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >  [<ffffffff81111450>] __call_rcu+0x140/0x820
      >  [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
      >  [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
      >  [<ffffffff81149abf>] put_ctx+0x4f/0x70
      >  [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
      >  [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
      >  [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
      >  [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
      >  [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
      >  [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
      >  [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      
      The underlying problem is that perf is invoking call_rcu() with the
      scheduler locks held, but in NOCB mode, call_rcu() will with high
      probability invoke the scheduler -- which just might want to use its
      locks.  The reason that call_rcu() needs to invoke the scheduler is
      to wake up the corresponding rcuo callback-offload kthread, which
      does the job of starting up a grace period and invoking the callbacks
      afterwards.
      
      One solution (championed on a related problem by Lai Jiangshan) is to
      simply defer the wakeup to some point where scheduler locks are no longer
      held.  Since we don't want to unnecessarily incur the cost of such
      deferral, the task before us is threefold:
      
      1.	Determine when it is likely that a relevant scheduler lock is held.
      
      2.	Defer the wakeup in such cases.
      
      3.	Ensure that all deferred wakeups eventually happen, preferably
      	sooner rather than later.
      
      We use irqs_disabled_flags() as a proxy for relevant scheduler locks
      being held.  This works because the relevant locks are always acquired
      with interrupts disabled.  We may defer more often than needed, but that
      is at least safe.
      
      The wakeup deferral is tracked via a new field in the per-CPU and
      per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
      
      This flag is checked by the RCU core processing.  The __rcu_pending()
      function now checks this flag, which causes rcu_check_callbacks()
      to initiate RCU core processing at each scheduling-clock interrupt
      where this flag is set.  Of course this is not sufficient because
      scheduling-clock interrupts are often turned off (the things we used to
      be able to count on!).  So the flags are also checked on entry to any
      state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
      state and NO_HZ_FULL user-mode-execution state.
      
      This approach should allow call_rcu() to be invoked regardless of what
      locks you might be holding, the key word being "should".
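      
      A sketch of the deferral, using the ->nocb_defer_wakeup field named
      above (the function bodies are assumptions based on this
      description):
      
          static void wake_nocb_kthread(struct rcu_data *rdp, unsigned long flags)
          {
                  if (irqs_disabled_flags(flags)) {
                          /* Scheduler locks may be held: defer, don't deadlock. */
                          ACCESS_ONCE(rdp->nocb_defer_wakeup) = true;
                          return;
                  }
                  wake_up(&rdp->nocb_wq);  /* safe to wake the rcuo kthread now */
          }
      
          /* Invoked from RCU core processing and on entry to RCU-idle states. */
          static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
          {
                  if (!ACCESS_ONCE(rdp->nocb_defer_wakeup))
                          return;
                  ACCESS_ONCE(rdp->nocb_defer_wakeup) = false;
                  wake_up(&rdp->nocb_wq);
          }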
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • rcu: Fix and comment ordering around wait_event() · 78e4bc34
      Paul E. McKenney authored
      It is all too easy to forget that wait_event() does not necessarily
      imply a full memory barrier.  The case where it does not is where the
      condition transitions to true just as wait_event() starts execution.
      This is actually a feature: The standard use of wait_event() involves
      locking, in which case the locks provide the needed ordering (you hold a
      lock across the wake_up() and acquire that same lock after wait_event()
      returns).
      
      Given that I did forget that wait_event() does not necessarily imply a
      full memory barrier in one case, this commit fixes that case.  This commit
      also adds comments calling out the placement of existing memory barriers
      relied on by wait_event() calls.
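      
      For example, the locked pattern gets its ordering from the lock,
      whereas a lockless user must supply a barrier explicitly (an
      illustrative sketch, not the specific call site this commit fixed):
      
          /* Locked: the lock orders the condition update against the waiter. */
          spin_lock(&mylock);
          done = true;
          spin_unlock(&mylock);
          wake_up(&my_wq);
      
          /* Lockless: wait_event() is not a full barrier if "done" becomes
           * true just as wait_event() begins, so order later accesses by hand. */
          wait_event(my_wq, done);
          smp_mb();  /* pairs with an smp_mb() on the update side */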
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Kick CPU halfway to RCU CPU stall warning · 6193c76a
      Paul E. McKenney authored
      When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on
      itself.  This can help move the grace period forward in some situations,
      but it would be even better to do this -before- the RCU CPU stall warning.
      This commit therefore causes resched_cpu() to be called every five jiffies
      once the system is halfway to an RCU CPU stall warning.
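      
      The check is roughly of this shape (a sketch from the description;
      the field names are assumptions):
      
          /* Once half of the RCU CPU stall timeout has elapsed: */
          if (ULONG_CMP_GE(jiffies, rsp->jiffies_resched)) {
                  resched_cpu(rdp->cpu);          /* nudge the holdout CPU */
                  rsp->jiffies_resched += 5;      /* and again in five jiffies */
          }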
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  7. 19 Nov 2013 (1 commit)
  8. 06 Nov 2013 (1 commit)
  9. 16 Oct 2013 (1 commit)