1. 12 Feb, 2016 1 commit
    • A
      kernel/locking/lockdep.c: convert hash tables to hlists · 4a389810
      Andrew Morton committed
      Mike said:
      
      : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e.
      : kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error
      : message.
      :
      : The problem is that ubsan callbacks use spinlocks and might be called
      : before lockdep is initialized.  Particularly this line in the
      : reserve_ebda_region function causes problem:
      :
      : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
      :
      : If i put lockdep_init() before reserve_ebda_region call in
      : x86_64_start_reservations kernel loads well.
      
      Fix this ordering issue permanently: change lockdep so that it uses
      hlists for the hash tables.  Unlike a list_head, an hlist_head is in its
      initialized state when it is all-zeroes, so lockdep is ready for
      operation immediately upon boot - lockdep_init() need not have run.
      
      The patch will also save some memory.
      
      lockdep_init() and lockdep_initialized can be done away with now - a 4.6
      patch has been prepared to do this.
      Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4a389810
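The property this commit relies on can be illustrated in userspace. The following is a minimal sketch, assuming simplified copies of the kernel's `list_head` and `hlist_head` types; the helper names `zeroed_list_looks_empty()` and `zeroed_hlist_is_usable()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel list types (assumption:
 * trimmed-down helpers, not the real include/linux/list.h). */
struct list_head  { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A list_head is empty only when it points at itself. */
static int list_empty(const struct list_head *h) { return h->next == h; }

/* An hlist_head is empty when its first pointer is NULL. */
static int hlist_empty(const struct hlist_head *h) { return h->first == NULL; }

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Static storage is zero-filled, much like a hash table in .bss at boot. */
static struct list_head  zeroed_list;
static struct hlist_head zeroed_hlist;

/* A zeroed list_head has next == NULL, so it does not look empty. */
static int zeroed_list_looks_empty(void)
{
	return list_empty(&zeroed_list);
}

/* Returns 1 if a zero-filled hlist bucket works without any init call. */
static int zeroed_hlist_is_usable(void)
{
	struct hlist_node node;
	int ok;

	assert(hlist_empty(&zeroed_hlist));
	hlist_add_head(&node, &zeroed_hlist);
	ok = zeroed_hlist.first == &node && node.next == NULL;
	zeroed_hlist.first = NULL; /* restore so the check is repeatable */
	return ok;
}
```

The sketch also shows where the memory saving comes from: an `hlist_head` holds one pointer per bucket, while a `list_head` holds two.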
  2. 23 Nov, 2015 1 commit
    • P
      treewide: Remove old email address · 90eec103
      Peter Zijlstra committed
      There were still a number of references to my old Red Hat email
      address in the kernel source. Remove these while keeping the
      Red Hat copyright notices intact.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      90eec103
  3. 19 Jun, 2015 2 commits
    • G
      locking/lockdep: Remove hard coded array size dependency · 68722101
      George Beshers committed
      An apparent oversight left a hardcoded '4' in place when
      LOCKSTAT_POINTS was introduced.
      
      The contention_point[] and contending_point[] arrays in the
      structs lock_class and lock_class_stats need to be the same
      size for the loops in lock_stats() to be correct.
      
      This patch allows LOCKSTAT_POINTS to be changed without
      affecting the correctness of the code.
      Signed-off-by: George Beshers <gbeshers@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      68722101
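The point about keeping both arrays and the loops tied to one constant can be sketched as follows. This is an illustration only, assuming trimmed-down stand-ins (`class_sketch`, `stats_sketch`, `copy_points()`) rather than the real `lock_class`, `lock_class_stats`, and `lock_stats()`.

```c
#include <assert.h>
#include <stddef.h>

#define LOCKSTAT_POINTS 4 /* the value the hardcoded '4' stood for; may be changed */

/* Hypothetical, trimmed-down stand-ins for lock_class and lock_class_stats. */
struct class_sketch {
	unsigned long contention_point[LOCKSTAT_POINTS];
	unsigned long contending_point[LOCKSTAT_POINTS];
};

struct stats_sketch {
	unsigned long contention_point[LOCKSTAT_POINTS];
	unsigned long contending_point[LOCKSTAT_POINTS];
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Fail the build if the two structs ever disagree on the array size. */
static_assert(sizeof(((struct class_sketch *)0)->contention_point) ==
	      sizeof(((struct stats_sketch *)0)->contention_point),
	      "contention_point[] sizes must match");

/* Copy every recorded point; returns the number of slots copied per array. */
static size_t copy_points(const struct class_sketch *c, struct stats_sketch *s)
{
	size_t i;

	/* Loop bounds come from the arrays themselves, never from a literal. */
	for (i = 0; i < ARRAY_SIZE(c->contention_point); i++)
		s->contention_point[i] = c->contention_point[i];
	for (i = 0; i < ARRAY_SIZE(c->contending_point); i++)
		s->contending_point[i] = c->contending_point[i];
	return i;
}
```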
    • P
      lockdep: Implement lock pinning · a24fc60d
      Peter Zijlstra committed
      Add a lockdep annotation that WARNs if you 'accidentally' unlock a
      lock.
      
      This is especially helpful for code with callbacks, where the upper
      layer assumes a lock remains taken but a lower layer thinks it may be
      able to drop and reacquire the lock.
      
      By unwittingly breaking up the lock, races can be introduced.
      
      Lock pinning is a lockdep annotation that helps with this, when you
      lockdep_pin_lock() a held lock, any unlock without a
      lockdep_unpin_lock() will produce a WARN. Think of this as a relative
      of lockdep_assert_held(), except you don't only assert it's held now,
      but ensure it stays held until you release your assertion.
      
      RFC: a possible alternative API would be something like:
      
        int cookie = lockdep_pin_lock(&foo);
        ...
        lockdep_unpin_lock(&foo, cookie);
      
      Where we pick a random number for the pin_count; this makes it
      impossible to sneak a lock break in without also passing the right
      cookie along.
      
      I've not done this because it ends up generating code for !LOCKDEP,
      esp. if you need to pass the cookie around for some reason.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.906731065@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a24fc60d
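The pin/unpin rule described in this commit can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: `held_lock_sketch` and the `sketch_*()` helpers are hypothetical, and a `warnings` counter stands in for the kernel's WARN output.

```c
#include <assert.h>

/* Hypothetical userspace model of one held-lock record. */
struct held_lock_sketch {
	int held;               /* 1 while the lock is taken */
	unsigned int pin_count; /* number of outstanding pins */
	unsigned int warnings;  /* stands in for WARN() output */
};

static void sketch_lock(struct held_lock_sketch *l)
{
	assert(!l->held); /* this model has no recursive locking */
	l->held = 1;
}

static void sketch_pin(struct held_lock_sketch *l)
{
	if (!l->held) {
		l->warnings++; /* pinning a lock that is not held */
		return;
	}
	l->pin_count++;
}

static void sketch_unpin(struct held_lock_sketch *l)
{
	if (!l->held || !l->pin_count) {
		l->warnings++; /* unbalanced unpin */
		return;
	}
	l->pin_count--;
}

static void sketch_unlock(struct held_lock_sketch *l)
{
	if (l->pin_count)
		l->warnings++; /* unlock while pinned: the case the annotation catches */
	l->held = 0;
	l->pin_count = 0;
}
```

A lower layer that drops and retakes a pinned lock bumps `warnings`; the same sequence with a matching unpin first stays silent.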
  4. 04 Mar, 2015 1 commit
    • P
      rcu: Improve diagnostics for blocked critical sections in irq · d24209bb
      Paul E. McKenney committed
      If an RCU read-side critical section occurs within an interrupt handler
      or a softirq handler, it cannot have been preempted.  Therefore, there is
      a check in rcu_read_unlock_special() checking for this error.  However,
      when this check triggers, it lacks diagnostic information.  This commit
      therefore moves rcu_read_unlock()'s lockdep annotation to follow the
      call to __rcu_read_unlock() and changes rcu_read_unlock_special()'s
      WARN_ON_ONCE() to a lockdep_rcu_suspicious() in order to locate where
      the offending RCU read-side critical section began.  In addition, the
      value of the ->rcu_read_unlock_special field is printed.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d24209bb
  5. 03 Oct, 2014 1 commit
    • P
      locking/lockdep: Revert qrwlock recusive stuff · 8acd91e8
      Peter Zijlstra committed
      Commit f0bab73c ("locking/lockdep: Restrict the use of recursive
      read_lock() with qrwlock") changed lockdep to try and conform to the
      qrwlock semantics which differ from the traditional rwlock semantics.
      
      In particular qrwlock is fair outside of interrupt context, but in
      interrupt context readers will ignore all fairness.
      
      The problem modeling this is that read and write side have different
      lock state (interrupts) semantics but we only have a single
      representation of these. Therefore lockdep will get confused, thinking
      the lock can cause interrupt lock inversions.
      
      So revert it for now; the old rwlock semantics were already imperfectly
      modeled and the qrwlock extra won't fit either.
      
      If we want to properly fix this, I think we need to resurrect the work
      that Gautham did a few years ago that split the read and write state of
      locks:
      
         http://lwn.net/Articles/332801/
      
      FWIW the locking selftest that would've failed (and was reported by
      Borislav earlier) is something like:
      
        RL(X1);	/* IRQ-ON */
        LOCK(A);
        UNLOCK(A);
        RU(X1);
      
        IRQ_ENTER();
        RL(X1);	/* IN-IRQ */
        RU(X1);
        IRQ_EXIT();
      
      At which point it would report that because A is an IRQ-unsafe lock we
      can suffer the following inversion:
      
      	CPU0		CPU1
      
      	lock(A)
      			lock(X1)
      			lock(A)
      	<IRQ>
      	 lock(X1)
      
      And this is 'wrong' because X1 can recurse (assuming the above locks are
      in fact read-locks) but lockdep doesn't know about this.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: ego@linux.vnet.ibm.com
      Cc: bp@alien8.de
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8acd91e8
  6. 24 Sep, 2014 1 commit
  7. 19 Sep, 2014 1 commit
    • P
      rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42
      Paul E. McKenney committed
      Currently, the expedited grace-period primitives do get_online_cpus().
      This greatly simplifies their implementation, but means that calls
      to them holding locks that are acquired by CPU-hotplug notifiers (to
      say nothing of calls to these primitives from CPU-hotplug notifiers)
      can deadlock.  But this is starting to become inconvenient, as can be
      seen here: https://lkml.org/lkml/2014/8/5/754.  The problem in this
      case is that some developers need to acquire a mutex from a CPU-hotplug
      notifier, but also need to hold it across a synchronize_rcu_expedited().
      As noted above, this currently results in deadlock.
      
      This commit avoids the deadlock and retains the simplicity by creating
      a try_get_online_cpus(), which returns false if the get_online_cpus()
      reference count could not immediately be incremented.  If a call to
      try_get_online_cpus() returns true, the expedited primitives operate as
      before.  If a call returns false, the expedited primitives fall back to
      normal grace-period operations.  This falling back of course results in
      increased grace-period latency, but only during times when CPU hotplug
      operations are actually in flight.  The effect should therefore be
      negligible during normal operation.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Tested-by: Lan Tianyu <tianyu.lan@intel.com>
      dd56af42
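The try-then-fall-back control flow described above might look roughly like this in a userspace model. Everything with a `_sketch` suffix is hypothetical; the real primitives involve the CPU-hotplug reference count and actual grace-period machinery, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state: is a hotplug operation running, and how many
 * readers currently hold the "online CPUs" reference. */
static bool hotplug_in_flight_sketch;
static int  cpu_refcount_sketch;

/* Take the reference only if that can be done immediately. */
static bool try_get_online_cpus_sketch(void)
{
	if (hotplug_in_flight_sketch)
		return false;
	cpu_refcount_sketch++;
	return true;
}

static void put_online_cpus_sketch(void)
{
	assert(cpu_refcount_sketch > 0);
	cpu_refcount_sketch--;
}

enum gp_kind_sketch { GP_EXPEDITED, GP_NORMAL };

/* Returns which kind of grace period the caller ended up waiting for. */
static enum gp_kind_sketch synchronize_expedited_sketch(void)
{
	if (!try_get_online_cpus_sketch())
		return GP_NORMAL; /* slower, but never blocks on hotplug */

	/* ... the expedited grace-period work would run here ... */
	put_online_cpus_sketch();
	return GP_EXPEDITED;
}
```

The fallback path costs latency only while `hotplug_in_flight_sketch` is set, which matches the commit's claim that normal operation is unaffected.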
  8. 13 Aug, 2014 2 commits
    • W
      locking/lockdep: Restrict the use of recursive read_lock() with qrwlock · f0bab73c
      Waiman Long committed
      Unlike the original unfair rwlock implementation, queued rwlock
      will grant lock according to the chronological sequence of the lock
      requests except when the lock requester is in the interrupt context.
      Consequently, recursive read_lock calls will now hang the process if
      there is a write_lock call somewhere in between the read_lock calls.
      
      This patch updates the lockdep implementation to look for recursive
      read_lock calls. A new read state (3) is used to mark those read_lock
      calls that cannot be recursively called except in the interrupt
      context. The new read state does exhaust the 2 bits available in
      held_lock:read bit field. The addition of any new read state in the
      future may require a redesign of how all those bits are squeezed
      together in the held_lock structure.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1407345722-61615-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f0bab73c
    • D
      locking/Documentation: Move locking related docs into Documentation/locking/ · 214e0aed
      Davidlohr Bueso committed
      Specifically:
        Documentation/locking/lockdep-design.txt
        Documentation/locking/lockstat.txt
        Documentation/locking/mutex-design.txt
        Documentation/locking/rt-mutex-design.txt
        Documentation/locking/rt-mutex.txt
        Documentation/locking/spinlocks.txt
        Documentation/locking/ww-mutex-design.txt
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: jason.low2@hp.com
      Cc: aswin@hp.com
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lubomir Rintel <lkundrak@v3.sk>
      Cc: Masanari Iida <standby24x7@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: fengguang.wu@intel.com
      Link: http://lkml.kernel.org/r/1406752916-3341-6-git-send-email-davidlohr@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      214e0aed
  9. 14 Feb, 2014 1 commit
  10. 10 Feb, 2014 2 commits
    • O
      lockdep: Change lockdep_set_novalidate_class() to use _and_name · 47be1c1a
      Oleg Nesterov committed
      Cosmetic. This doesn't really matter because a) device->mutex is
      the only user of __lockdep_no_validate__ and b) this class should
      never be reported as the source of a problem, but if something goes
      wrong "&dev->mutex" looks better than "&__lockdep_no_validate__"
      as the name of the lock.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140120182016.GA26512@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      47be1c1a
    • O
      lockdep: Make held_lock->check and "int check" argument bool · fb9edbe9
      Oleg Nesterov committed
      The "int check" argument of lock_acquire() and held_lock->check are
      misleading. This is actually a boolean: 2 means "true", everything
      else is "false".
      
      And there is no need to pass 1 or 0 to lock_acquire() depending on
      CONFIG_PROVE_LOCKING, __lock_acquire() checks prove_locking at the
      start and clears "check" if !CONFIG_PROVE_LOCKING.
      
      Note: probably we can simply kill this member/arg. The only explicit
      user of check => 0 is rcu_lock_acquire(), perhaps we can change it to
      use lock_acquire(trylock =>, read => 2). __lockdep_no_validate means
      check => 0 implicitly, but we can change validate_chain() to check
      hlock->instance->key instead. Not to mention it would be nice to get
      rid of lockdep_set_novalidate_class().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140120182006.GA26495@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fb9edbe9
  11. 06 Nov, 2013 1 commit
  12. 12 Jul, 2013 1 commit
  13. 22 Feb, 2013 1 commit
  14. 19 Feb, 2013 1 commit
    • P
      lockdep: Silence warning if CONFIG_LOCKDEP isn't set · 5cd3f5af
      Paul Bolle committed
      Since commit c9a49628 ("nfsd:
      make client_lock per net") compiling nfs4state.o without
      CONFIG_LOCKDEP set, triggers this GCC warning:
      
          fs/nfsd/nfs4state.c: In function ‘free_client’:
          fs/nfsd/nfs4state.c:1051:19: warning: unused variable ‘nn’ [-Wunused-variable]
      
      The cause of that warning is that lockdep_assert_held() compiles
      away if CONFIG_LOCKDEP is not set. Silence this warning by using
      the argument to lockdep_assert_held() as a nop if CONFIG_LOCKDEP
      is not set.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Link: http://lkml.kernel.org/r/1359060797.1325.33.camel@x61.thuisdomein
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      --
       include/linux/lockdep.h |    2 +-
       1 file changed, 1 insertion(+), 1 deletion(-)
      5cd3f5af
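The "use the argument as a nop" idea can be shown with a small userspace sketch. The macro `sketch_assert_held()` and the `SKETCH_LOCKDEP` toggle are hypothetical stand-ins for `lockdep_assert_held()` and `CONFIG_LOCKDEP`; the sketch shows the general technique, not the exact one-line change in include/linux/lockdep.h.

```c
#include <assert.h>

/* Uncomment to model CONFIG_LOCKDEP being set (assumption: a plain macro
 * replaces the real Kconfig symbol in this userspace sketch). */
/* #define SKETCH_LOCKDEP 1 */

struct lock_sketch { int held; };

#ifdef SKETCH_LOCKDEP
#define sketch_assert_held(l)	assert((l)->held)
#else
/* Still reference the argument, so a variable that exists only to be
 * passed here does not trigger -Wunused-variable. */
#define sketch_assert_held(l)	do { (void)(l); } while (0)
#endif

/* 'nn' mirrors the variable from the warning: only the assertion uses it. */
static int free_client_sketch(struct lock_sketch *table)
{
	struct lock_sketch *nn = &table[0];

	sketch_assert_held(nn);
	return 0;
}
```

Had the disabled variant expanded to an empty `do { } while (0)`, `nn` would be unused and the compiler would warn, which is the situation the commit describes.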
  15. 12 Jan, 2013 1 commit
  16. 15 May, 2012 1 commit
    • P
      lockdep: fix oops in processing workqueue · 4d82a1de
      Peter Zijlstra committed
      Under memory load, on x86_64, with lockdep enabled, the workqueue's
      process_one_work() has been seen to oops in __lock_acquire(), barfing
      on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].
      
      Because it's permissible to free a work_struct from its callout function,
      the map used is an onstack copy of the map given in the work_struct: and
      that copy is made without any locking.
      
      Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
      "rep movsq" for that structure copy: which might race with a workqueue
      user's wait_on_work() doing lock_map_acquire() on the source of the
      copy, putting a pointer into the class_cache[], but only in time for
      the top half of that pointer to be copied to the destination map.
      
      Boom when process_one_work() subsequently does lock_map_acquire()
      on its onstack copy of the lockdep_map.
      
      Fix this, and a similar instance in call_timer_fn(), with a
      lockdep_copy_map() function which additionally NULLs the class_cache[].
      
      Note: this oops was actually seen on 3.4-next, where flush_work() newly
      does the racing lock_map_acquire(); but Tejun points out that 3.4 and
      earlier are already vulnerable to the same through wait_on_work().
      
      * Patch originally from Peter.  Hugh modified it a bit and wrote the
        description.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Hugh Dickins <hughd@google.com>
      LKML-Reference: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      4d82a1de
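The fix described here, copying the map and then clearing its class cache, can be sketched as below. `map_sketch`, `copy_map_sketch()`, and the cache size are assumptions made for illustration; the real `lockdep_copy_map()` operates on `struct lockdep_map`.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CACHING_CLASSES_SKETCH 2 /* assumption: illustrative size only */

/* Hypothetical stand-in for struct lockdep_map. */
struct map_sketch {
	const char *name;
	void *class_cache[NR_CACHING_CLASSES_SKETCH];
};

/*
 * Copy the map without locking, then NULL the cache. A cache entry that
 * was only half written during the copy is thrown away instead of being
 * dereferenced later; the class is simply looked up again on first use.
 */
static void copy_map_sketch(struct map_sketch *to, const struct map_sketch *from)
{
	size_t i;

	assert(to && from);
	*to = *from;
	for (i = 0; i < NR_CACHING_CLASSES_SKETCH; i++)
		to->class_cache[i] = NULL;
}
```

The cache is only an optimization, so discarding it in the copy trades one slower lookup for never trusting a torn pointer.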
  17. 14 Nov, 2011 1 commit
  18. 29 Sep, 2011 1 commit
    • P
      rcu: Restore checks for blocking in RCU read-side critical sections · b3fbab05
      Paul E. McKenney committed
      Long ago, using TREE_RCU with PREEMPT would result in "scheduling
      while atomic" diagnostics if you blocked in an RCU read-side critical
      section.  However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
      this diagnostic.  This commit therefore adds a replacement diagnostic
      based on PROVE_RCU.
      
      Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
      used for things that have nothing to do with rcu_dereference(), rename
      lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
      argument that is a string indicating what is suspicious.  This third
      argument is passed in from a new third argument to rcu_lockdep_assert().
      Update all calls to rcu_lockdep_assert() to add an informative third
      argument.
      
      Also, add a pair of rcu_lockdep_assert() calls from within
      rcu_note_context_switch(), one complaining if a context switch occurs
      in an RCU-bh read-side critical section and another complaining if a
      context switch occurs in an RCU-sched read-side critical section.
      These are present only if the PROVE_RCU kernel parameter is enabled.
      
      Finally, fix some checkpatch whitespace complaints in lockdep.c.
      
      Again, you must enable PROVE_RCU to see these new diagnostics.  But you
      are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b3fbab05
  19. 25 May, 2011 1 commit
    • P
      lockdep, mutex: provide mutex_lock_nest_lock · e4c70a66
      Peter Zijlstra committed
      In order to convert i_mmap_lock to a mutex we need a mutex equivalent to
      spin_lock_nest_lock(), thus provide the mutex_lock_nest_lock() annotation.
      
      As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of
      the locking pattern where an outer lock serializes the acquisition order
      of nested locks.  That is, if every time you lock multiple locks A, say A1
      and A2 you first acquire N, the order of acquiring A1 and A2 is
      irrelevant.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4c70a66
  20. 20 Jan, 2011 1 commit
    • T
      lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6
      Tejun Heo committed
      During early boot, local IRQ is disabled until IRQ subsystem is
      properly initialized.  During this time, no one should enable
      local IRQ and some operations which usually are not allowed with
      IRQ disabled, e.g. operations which might sleep or require
      communications with other processors, are allowed.
      
      lockdep tracked this with early_boot_irqs_off/on() callbacks.
      As other subsystems need this information too, move it to
      init/main.c and make it generally available.  While at it,
      toggle the boolean to early_boot_irqs_disabled instead of
      enabled so that it can be initialized with %false and %true
      indicates the exceptional condition.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2ce802f6
  21. 11 Jan, 2011 1 commit
    • T
      workqueue: relax lockdep annotation on flush_work() · e159489b
      Tejun Heo committed
      Currently, the lockdep annotation in flush_work() requires exclusive
      access on the workqueue the target work is queued on and triggers
      warning if a work is trying to flush another work on the same
      workqueue; however, this is no longer true as workqueues can now
      execute multiple works concurrently.
      
      This patch adds lock_map_acquire_read() and makes process_one_work()
      hold read access to the workqueue while executing a work and
      start_flush_work() check for write access if concurrency level is one
      or the workqueue has a rescuer (as only one execution resource - the
      rescuer - is guaranteed to be available under memory pressure), and
      read access if higher.
      
      This better represents what's going on and removes spurious lockdep
      warnings which are triggered by fake dependency chain created through
      flush_work().
      
      * Peter pointed out that flushing another work from a WQ_MEM_RECLAIM
        wq breaks forward progress guarantee under memory pressure.
        Condition check accordingly updated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@kernel.org
      e159489b
  22. 19 Oct, 2010 1 commit
    • H
      lockdep: Add improved subclass caching · 62016250
      Hitoshi Mitake committed
      Current lockdep_map only caches one class with subclass == 0,
      and looks up hash table of classes when subclass != 0.
      
      It seems that this has no problem because the case of
      subclass != 0 is rare. But locks of struct rq are
      acquired with subclass == 1 when task migration is executed.
      Task migration is a highly frequent event, so I modified lockdep
      to cache subclasses.
      
      I measured the score of perf bench sched messaging.
      This patch has a slight but definite (on the order of milliseconds
      or tens of milliseconds) effect when lots of tasks are running.
      I'll show the result in the tail of this description.
      
      NR_LOCKDEP_CACHING_CLASSES specifies how many classes can be
      cached in the instances of lockdep_map.
      I discussed this approach with Peter Zijlstra at LinuxCon Japan
      and he taught me that caching every subclass (8)
      is clearly a waste of memory. So the number of cached classes
      should be configurable.
      
      === Score comparison of benchmarks ===
      # "min" means best score, and "max" means worst score
      
      for i in `seq 1 10`; do ./perf bench -f simple sched messaging; done
      
      before: min: 0.565000, max: 0.583000, avg: 0.572500
      after:  min: 0.559000, max: 0.568000, avg: 0.563300
      
      # with more processes
      for i in `seq 1 10`; do ./perf bench -f simple sched messaging -g 40; done
      
      before: min: 2.274000, max: 2.298000, avg: 2.286300
      after:  min: 2.242000, max: 2.270000, avg: 2.259700
      Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286269311-28336-2-git-send-email-mitake@dcl.info.waseda.ac.jp>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      62016250
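The caching scheme can be modelled as a small per-map array consulted before the slow lookup. This is a sketch only: `NR_LOCKDEP_CACHING_CLASSES` is the name used in the commit, but its value here, the `*_sketch` types, and the `slow_lookups` counter are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LOCKDEP_CACHING_CLASSES 2 /* name from the commit; value is an assumption */
#define MAX_SUBCLASSES_SKETCH      8 /* the commit mentions 8 subclasses */

struct lock_class_sketch { unsigned int subclass; };

struct lockdep_map_sketch {
	struct lock_class_sketch *class_cache[NR_LOCKDEP_CACHING_CLASSES];
	struct lock_class_sketch classes[MAX_SUBCLASSES_SKETCH]; /* stands in for the hash table */
	unsigned int slow_lookups; /* counts simulated hash-table walks */
};

static struct lock_class_sketch *
look_up_class_sketch(struct lockdep_map_sketch *map, unsigned int subclass)
{
	struct lock_class_sketch *class;

	assert(subclass < MAX_SUBCLASSES_SKETCH);

	/* Fast path: small subclasses are served from the per-map cache. */
	if (subclass < NR_LOCKDEP_CACHING_CLASSES && map->class_cache[subclass])
		return map->class_cache[subclass];

	/* Slow path: pretend to walk the hash table. */
	map->slow_lookups++;
	class = &map->classes[subclass];
	class->subclass = subclass;

	if (subclass < NR_LOCKDEP_CACHING_CLASSES)
		map->class_cache[subclass] = class;
	return class;
}
```

With a cache size of 2, subclass 1 (the runqueue case the commit mentions) hits the slow path once, while an uncached subclass pays for it on every lookup.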
  23. 12 Oct, 2010 1 commit
  24. 22 May, 2010 1 commit
  25. 13 Mar, 2010 1 commit
  26. 25 Feb, 2010 1 commit
  27. 02 Aug, 2009 3 commits
    • M
      lockdep: Reintroduce generation count to make BFS faster · e351b660
      Ming Lei committed
      We still can apply DaveM's generation count optimization to
      BFS, based on the following idea:
      
       - before doing each BFS, increase the global generation id
         by 1
      
       - if one node in the graph has been visited, mark it as
         visited by storing the current global generation id into
         the node's dep_gen_id field
      
       - so we can decide if one node has been visited already, by
         comparing the node's dep_gen_id with the global generation id.
      
      By applying DaveM's generation count optimization to current
      implementation of BFS, we gain the following advantages:
      
       - we save MAX_LOCKDEP_ENTRIES/8 bytes memory;
      
       - we remove the bitmap_zero(bfs_accessed, MAX_LOCKDEP_ENTRIES);
         in each BFS, which is very time-consuming since
         MAX_LOCKDEP_ENTRIES may be very large.(16384UL)
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "David S. Miller" <davem@davemloft.net>
      LKML-Reference: <1248274089-6358-1-git-send-email-tom.leiming@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e351b660
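The three steps listed in this commit can be sketched as a small BFS over an adjacency-list graph. The field name `dep_gen_id` comes from the commit text; the graph layout, `global_gen_id`, `add_edge()`, and `bfs_reachable()` are hypothetical, and counter wrap-around is ignored in this sketch.

```c
#include <assert.h>

#define MAX_NODES 8

struct node_sketch {
	unsigned int dep_gen_id; /* generation in which this node was last visited */
	int nr_edges;
	int edges[MAX_NODES];
};

static struct node_sketch graph[MAX_NODES];
static unsigned int global_gen_id; /* hypothetical name for the global generation id */

static void add_edge(int from, int to)
{
	assert(graph[from].nr_edges < MAX_NODES);
	graph[from].edges[graph[from].nr_edges++] = to;
}

/* Returns 1 if dst is reachable from src, 0 otherwise. */
static int bfs_reachable(int src, int dst)
{
	int queue[MAX_NODES], head = 0, tail = 0;

	/* One increment invalidates every old mark; no bitmap to clear. */
	global_gen_id++;
	graph[src].dep_gen_id = global_gen_id;
	queue[tail++] = src;

	while (head < tail) {
		int cur = queue[head++];

		if (cur == dst)
			return 1;
		for (int i = 0; i < graph[cur].nr_edges; i++) {
			int next = graph[cur].edges[i];

			if (graph[next].dep_gen_id == global_gen_id)
				continue; /* already visited in this search */
			graph[next].dep_gen_id = global_gen_id;
			queue[tail++] = next; /* each node is queued at most once */
		}
	}
	return 0;
}
```

Running the same query twice works without any reset step, which is the saving the commit attributes to dropping `bitmap_zero()`.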
    • P
      lockdep: Deal with many similar locks · bb97a91e
      Peter Zijlstra committed
      spin_lock_nest_lock() allows to take many instances of the same
      class, this can easily lead to overflow of MAX_LOCK_DEPTH.
      
      To avoid this overflow, we'll stop accounting instances but
      start reference counting the class in the held_lock structure.
      
      [ We could maintain a list of instances, if we'd move the hlock
        stuff into __lock_acquired(), but that would require
        significant modifications to the current code. ]
      
      We restrict this mode to spin_lock_nest_lock() only, because it
      degrades the lockdep quality due to loss of instance.
      
      For lockstat this means we don't track lock statistics for any
      but the first lock in the series.
      
      Currently nesting is limited to 11 bits because that was the
      spare space available in held_lock. This yields a 2048
      instances maximum.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      bb97a91e
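Counting references on one held-lock entry, instead of pushing a new entry per instance, can be sketched as follows. The 11-bit field width is taken from the commit text; the types, the tiny stack depth, and the choice to start counting at 1 are assumptions of this sketch and may differ from how the kernel encodes the count.

```c
#include <assert.h>

#define MAX_LOCK_DEPTH_SKETCH 4                 /* tiny on purpose; the kernel's limit is larger */
#define MAX_REFERENCES_SKETCH ((1u << 11) - 1)  /* largest value an 11-bit field can hold */

struct held_entry_sketch {
	int class_id;
	unsigned int references : 11; /* 11 spare bits, as stated in the commit */
};

struct task_sketch {
	int depth;
	struct held_entry_sketch held[MAX_LOCK_DEPTH_SKETCH];
};

/* Returns 0 on success, -1 if the stack or the counter would overflow. */
static int acquire_nested_sketch(struct task_sketch *t, int class_id)
{
	if (t->depth > 0 && t->held[t->depth - 1].class_id == class_id) {
		struct held_entry_sketch *top = &t->held[t->depth - 1];

		/* Same class as the top entry: count it instead of pushing. */
		if (top->references == MAX_REFERENCES_SKETCH)
			return -1;
		top->references++;
		return 0;
	}

	if (t->depth == MAX_LOCK_DEPTH_SKETCH)
		return -1;
	t->held[t->depth].class_id = class_id;
	t->held[t->depth].references = 1;
	t->depth++;
	return 0;
}
```

The trade-off named in the commit is visible here: the stack stays one entry deep, but the individual instances are no longer recorded.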
    • P
      lockdep: Introduce lockdep_assert_held() · f607c668
      Peter Zijlstra committed
      Add a lockdep helper to validate that we indeed are the owner
      of a lock.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f607c668
  28. 24 Jul, 2009 2 commits
    • P
      lockdep: BFS cleanup · af012961
      Peter Zijlstra committed
      Some cleanups of the lockdep code after the BFS series:
      
       - Remove the last traces of the generation id
       - Fixup comment style
       - Move the bfs routines into lockdep.c
       - Cleanup the bfs routines
      
      [ tom.leiming@gmail.com: Fix crash ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1246201486-7308-11-git-send-email-tom.leiming@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      af012961
    • M
      lockdep: Print the shortest dependency chain if finding a circle · c94aa5ca
      Ming Lei committed
      Currently lockdep will print the 1st circle detected if it
      exists when acquiring a new (next) lock.
      
      This patch prints the shortest path from the next lock to be
      acquired to the previous held lock if a circle is found.
      
      The patch still uses the current method to check circle, and
      once the circle is found, breadth-first search algorithm is
      used to compute the shortest path from the next lock to the
      previous lock in the forward lock dependency graph.
      
      Printing the shortest path will shorten the dependency chain,
      and make troubleshooting for possible circular locking easier.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1246201486-7308-2-git-send-email-tom.leiming@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c94aa5ca
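Recovering the shortest chain with breadth-first search usually means recording a parent for each visited node and walking back from the target. The following is a minimal sketch, assuming a small adjacency matrix as the forward dependency graph; `adj`, `NR_LOCKS`, and `shortest_chain()` are hypothetical names.

```c
#include <assert.h>

#define NR_LOCKS 6

/* adj[a][b] != 0 means lock b was taken while holding lock a. */
static int adj[NR_LOCKS][NR_LOCKS];

/*
 * Fills path[] with the shortest src -> dst chain and returns its length
 * in nodes, or 0 if dst cannot be reached from src.
 */
static int shortest_chain(int src, int dst, int path[NR_LOCKS])
{
	int parent[NR_LOCKS], queue[NR_LOCKS];
	int head = 0, tail = 0, len = 0;

	assert(src >= 0 && src < NR_LOCKS && dst >= 0 && dst < NR_LOCKS);

	for (int i = 0; i < NR_LOCKS; i++)
		parent[i] = -2; /* -2: not visited yet */
	parent[src] = -1;       /* -1: root of the search */
	queue[tail++] = src;

	while (head < tail) {
		int cur = queue[head++];

		if (cur == dst)
			break;
		for (int next = 0; next < NR_LOCKS; next++) {
			if (!adj[cur][next] || parent[next] != -2)
				continue;
			parent[next] = cur; /* first visit is via a shortest route */
			queue[tail++] = next;
		}
	}
	if (parent[dst] == -2)
		return 0;

	/* Walk the parent links back from dst, then reverse into src-first order. */
	for (int n = dst; n != -1; n = parent[n])
		path[len++] = n;
	for (int i = 0; i < len / 2; i++) {
		int tmp = path[i];

		path[i] = path[len - 1 - i];
		path[len - 1 - i] = tmp;
	}
	return len;
}
```

Because BFS visits nodes in order of distance, the first time the target is reached the recorded parents already describe a shortest chain.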
  29. 23 Jun, 2009 1 commit
    • J
      vfs: Set special lockdep map for dirs only if not set by fs · 9a7aa12f
      Jan Kara committed
      Some filesystems need to set lockdep map for i_mutex differently for
      different directories. For example OCFS2 has system directories (for
      orphan inode tracking and for gathering all system files like journal
      or quota files into a single place) which have different
      locking rules than standard directories. For a filesystem setting
      lockdep map is naturally done when the inode is read but we have to
      modify unlock_new_inode() not to overwrite the lockdep map the filesystem
      has set.
      
      Acked-by: peterz@infradead.org
      CC: mingo@redhat.com
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      9a7aa12f
  30. 03 Apr, 2009 1 commit
    • R
      Factor out #ifdefs from kernel/spinlock.c to LOCK_CONTENDED_FLAGS · e8c158bb
      Robin Holt committed
      SGI has observed that on large systems, interrupts are not serviced for a
      long period of time when waiting for a rwlock.  The following patch series
      re-enables irqs while waiting for the lock, resembling the code which is
      already there for spinlocks.
      
      I only made the ia64 version, because the patch adds some overhead to the
      fast path.  I assume there is currently no demand to have this for other
      architectures, because the systems are not so large.  Of course, the
      possibility to implement raw_{read|write}_lock_flags for any architecture
      is still there.
      
      This patch:
      
      The new macro LOCK_CONTENDED_FLAGS expands to the correct implementation
      depending on the config options, so that IRQs are re-enabled when
      possible, but they remain disabled if CONFIG_LOCKDEP is set.
      Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e8c158bb
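The config-dependent expansion described in this commit can be pictured with a small user-space sketch. Everything here is an assumption made for illustration: the `TOY_` macro, `struct toy_lock`, the helper functions and the `TOY_CONFIG_LOCKDEP` switch are hypothetical stand-ins, not the kernel's definitions. The sketch only shows the general pattern of one macro selecting either a plain lock routine or a flags-aware one at compile time.

```c
#include <assert.h>

#define TOY_IRQS_WERE_ON 0x1UL	/* hypothetical bit in the saved flags */

/* Hypothetical lock that records which slow path was used. */
struct toy_lock {
	int locked;
	int may_reenable_irqs;	/* set when the flags-aware path ran */
};

/* Plain variant: knows nothing about saved IRQ flags. */
void toy_lock_plain(struct toy_lock *l)
{
	l->may_reenable_irqs = 0;
	l->locked = 1;
}

/*
 * Flags-aware variant: a real implementation could re-enable
 * interrupts while spinning if they were on before the call.
 */
void toy_lock_flags(struct toy_lock *l, unsigned long flags)
{
	l->may_reenable_irqs = (flags & TOY_IRQS_WERE_ON) != 0;
	l->locked = 1;
}

/*
 * One macro, two expansions. With the (hypothetical) lockdep switch
 * defined, the flags are ignored and IRQs stay disabled; otherwise
 * the flags-aware routine is selected.
 */
#ifdef TOY_CONFIG_LOCKDEP
#define TOY_LOCK_CONTENDED_FLAGS(_lock, lock, lockfl, flags) \
	lock(_lock)
#else
#define TOY_LOCK_CONTENDED_FLAGS(_lock, lock, lockfl, flags) \
	lockfl((_lock), (flags))
#endif

void toy_acquire(struct toy_lock *l, unsigned long flags)
{
	(void)flags;	/* unused when TOY_CONFIG_LOCKDEP is defined */
	TOY_LOCK_CONTENDED_FLAGS(l, toy_lock_plain, toy_lock_flags, flags);
}
```

Callers use a single entry point such as `toy_acquire()` and never see the `#ifdef`, which is the kind of factoring the commit title refers to.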
  31. 15 Feb, 2009 4 commits
    • P
      lockdep: move state bit definitions around · 9851673b
      Peter Zijlstra committed
      For convenience later.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9851673b
    • P
      lockdep: sanitize reclaim bit names · a652d708
      Peter Zijlstra committed
      s/HELD_OVER/ENABLED/g
      
      so that it's similar to the hard and soft-irq names.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a652d708
    • P
      lockdep: sanitize bit names · 4fc95e86
      Peter Zijlstra committed
      s/\(LOCKF\?_ENABLED_[^ ]*\)S\(_READ\)\?\>/\1\2/g
      
      So that the USED_IN and ENABLED have the same names.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4fc95e86
    • N
      lockdep: annotate reclaim context (__GFP_NOFS) · cf40bd16
      Nick Piggin committed
      Here is another version, with the incremental patch rolled up, and
      added reclaim context annotation to kswapd, and allocation tracing
      to slab allocators (which may only ever reach the page allocator
      in rare cases, so it is good to put annotations here too).
      
      Haven't tested this version as such, but it should be getting closer
      to merge worthy ;)
      
      --
      After noticing some code in mm/filemap.c accidentally perform a __GFP_FS
      allocation when it should not have been, I thought it might be a good idea to
      try to catch this kind of thing with lockdep.
      
      I coded up a little idea that seems to work. Unfortunately the system has to
      actually be in __GFP_FS page reclaim, then take the lock, before it will mark
      it. But at least that might still be some orders of magnitude more common
      (and more debuggable) than an actual deadlock condition, so we have some
      improvement I hope (the concept is no less complete than discovery of a lock's
      interrupt contexts).
      
      I guess we could even do the same thing with __GFP_IO (normal reclaim), and
      even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
      code paths, so let's start there and see how it goes.
      
      It *seems* to work. I did a quick test.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.28-rc6-00007-ged313489-dirty #26
      ---------------------------------
      inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
      modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      {in-reclaim-W} state was registered at:
        [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
        [<ffffffff80268f71>] lock_acquire+0x91/0xc0
        [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
        [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
        [<ffffffff8020903b>] _stext+0x3b/0x170
        [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
        [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff
      irq event stamp: 3929
      hardirqs last  enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
      hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
      softirqs last  enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
      softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
      
      other info that might help us debug this:
      1 lock held by modprobe/8526:
       #0:  (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      
      stack backtrace:
      Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged313489-dirty #26
      Call Trace:
       [<ffffffff80265483>] print_usage_bug+0x193/0x1d0
       [<ffffffff80266530>] mark_lock+0xaf0/0xca0
       [<ffffffff80266735>] mark_held_locks+0x55/0xc0
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
       [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
       [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff8020903b>] _stext+0x3b/0x170
       [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
       [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
       [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
       [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
       [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cf40bd16
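The inconsistency reported in the trace above can be modelled with a small user-space sketch. It is only an illustration under stated assumptions: the `toy_` types and functions and the two usage bits are hypothetical, and real lockdep tracks far more state. The sketch marks a lock class as "taken inside reclaim" or "held over a reclaim-capable allocation" and flags the class once both marks are present, mirroring the `{in-reclaim-W} -> {ov-reclaim-W}` report.

```c
#include <assert.h>

/* Hypothetical usage bits, named after the states in the trace. */
#define TOY_IN_RECLAIM        (1u << 0)	/* lock taken during FS reclaim */
#define TOY_HELD_OVER_RECLAIM (1u << 1)	/* lock held across a __GFP_FS-style allocation */

#define TOY_MAX_HELD 8

struct toy_lock_class {
	const char *name;
	unsigned int usage;
};

struct toy_task {
	int in_reclaim;		/* nonzero while the task is doing FS reclaim */
	int nr_held;
	struct toy_lock_class *held[TOY_MAX_HELD];
};

/* Record a usage bit; return -1 if the class now has both marks. */
int toy_mark(struct toy_lock_class *c, unsigned int bit)
{
	const unsigned int both = TOY_IN_RECLAIM | TOY_HELD_OVER_RECLAIM;

	c->usage |= bit;
	return (c->usage & both) == both ? -1 : 0;
}

/* Acquire a lock; taking it during reclaim marks the class. */
int toy_acquire(struct toy_task *t, struct toy_lock_class *c)
{
	assert(t->nr_held < TOY_MAX_HELD);
	t->held[t->nr_held++] = c;
	return t->in_reclaim ? toy_mark(c, TOY_IN_RECLAIM) : 0;
}

/* Release the most recently acquired lock. */
void toy_release(struct toy_task *t)
{
	assert(t->nr_held > 0);
	t->nr_held--;
}

/*
 * Model an allocation that may enter FS reclaim: every lock currently
 * held is marked as held over reclaim. Returns -1 if any held class
 * becomes inconsistent.
 */
int toy_alloc_may_reclaim(struct toy_task *t)
{
	int i, ret = 0;

	for (i = 0; i < t->nr_held; i++)
		if (toy_mark(t->held[i], TOY_HELD_OVER_RECLAIM))
			ret = -1;
	return ret;
}
```

As the commit message notes, the check only fires after the lock has actually been seen in both situations, so the toy returns an error on the second event, whichever order the two occur in.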