1. 24 Jun 2016, 1 commit
  2. 08 Jun 2016, 1 commit
    • locking/qspinlock: Fix spin_unlock_wait() some more · 2c610022
      Committed by Peter Zijlstra
      While this prior commit:
      
        54cf809b ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
      
      ... fixes spin_is_locked() and spin_unlock_wait() for the usage
      in ipc/sem and netfilter, it does not in fact work right for the
      usage in task_work and futex.
      
      So while the 2 locks crossed problem:
      
      	spin_lock(A)		spin_lock(B)
      	if (!spin_is_locked(B)) spin_unlock_wait(A)
      	  foo()			foo();
      
      ... works with the smp_mb() injected by both spin_is_locked() and
      spin_unlock_wait(), this is not sufficient for:
      
      	flag = 1;
      	smp_mb();		spin_lock()
      	spin_unlock_wait()	if (!flag)
      				  // add to lockless list
      	// iterate lockless list
      
      ... because in this scenario, the store from spin_lock() can be delayed
      past the load of flag, uncrossing the variables and losing the
      guarantee.
      
      This patch reworks spin_is_locked() and spin_unlock_wait() to work in
      both cases by exploiting the observation that while the lock byte
      store can be delayed, the contender must have registered itself
      visibly in other state contained in the word.
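
      A minimal sketch of the idea, with the qspinlock layout assumed from the
      generic implementation (not the exact mainline code): any non-zero state
      in the lock word (locked byte, pending bit or queued tail) counts as
      "locked", so a contender whose locked-byte store is still in flight is
      nevertheless observed:

      	/* sketch only; kernel context assumed */
      	static inline int spin_is_locked_sketch(struct qspinlock *lock)
      	{
      		/* look at the whole word, not just the locked byte */
      		return atomic_read(&lock->val);
      	}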
      
      It also allows for architectures to override both functions, as PPC
      and ARM64 have an additional issue for which we currently have no
      generic solution.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Giovanni Gherdovich <ggherdovich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <waiman.long@hpe.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org # v4.2 and later
      Fixes: 54cf809b ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 03 Jun 2016, 1 commit
  4. 26 May 2016, 1 commit
  5. 16 May 2016, 1 commit
    • locking/rwsem: Fix down_write_killable() · 04cafed7
      Committed by Peter Zijlstra
      The new signal_pending exit path in __rwsem_down_write_failed_common()
      was identified by Tetsuo Handa as breaking his kernel.

      Upon inspection it was found that there are two things wrong with it:
      
       - it forgets to remove WAITING_BIAS if it leaves the list empty, or
       - it forgets to wake further waiters that were blocked on the now
         removed waiter.
      
      Especially the first issue causes new lock attempts to block and stall
      indefinitely, as the code assumes that pending waiters mean there is
      an owner that will wake when it releases the lock.
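
      A sketch of the corrected exit path, assuming the rwsem-xadd helpers of
      this era (simplified, not the verbatim fix):

      	/* signal_pending exit in __rwsem_down_write_failed_common(), sketched */
      	raw_spin_lock_irq(&sem->wait_lock);
      	list_del(&waiter.list);
      	if (list_empty(&sem->wait_list))
      		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);	/* nobody left waiting */
      	else
      		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);		/* wake the next waiter(s) */
      	raw_spin_unlock_irq(&sem->wait_lock);
      	return ERR_PTR(-EINTR);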
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Tested-by: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Waiman Long <Waiman.Long@hpe.com>
      Link: http://lkml.kernel.org/r/20160512115745.GP3192@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 05 May 2016, 3 commits
  7. 28 Apr 2016, 1 commit
  8. 26 Apr 2016, 1 commit
  9. 23 Apr 2016, 2 commits
  10. 22 Apr 2016, 1 commit
    • locking/rwsem: Provide down_write_killable() · 916633a4
      Committed by Michal Hocko
      Now that all the architectures implement the necessary glue code
      we can introduce down_write_killable(). The only difference wrt. regular
      down_write() is that the slow path waits in TASK_KILLABLE state and the
      interruption by the fatal signal is reported as -EINTR to the caller.
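
      A hypothetical caller, just to illustrate the contract (the context
      structure and function names here are made up):

      	static int frobnicate(struct frob_ctx *ctx)
      	{
      		if (down_write_killable(&ctx->rwsem))
      			return -EINTR;	/* a fatal signal arrived while waiting */

      		/* ... exclusive section ... */

      		up_write(&ctx->rwsem);
      		return 0;
      	}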
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1460041951-22347-12-git-send-email-mhocko@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 19 Apr 2016, 1 commit
    • locking/pvqspinlock: Fix division by zero in qstat_read() · 66876595
      Committed by Davidlohr Bueso
      While playing with the qstat statistics (in <debugfs>/qlockstat/) I ran into
      the following splat on a VM when opening pv_hash_hops:
      
        divide error: 0000 [#1] SMP
        ...
        RIP: 0010:[<ffffffff810b61fe>]  [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
        ...
        Call Trace:
          [<ffffffff811cad7c>] ? mem_cgroup_commit_charge+0x6c/0xd0
          [<ffffffff8119750c>] ? page_add_new_anon_rmap+0x8c/0xd0
          [<ffffffff8118d3b9>] ? handle_mm_fault+0x1439/0x1b40
          [<ffffffff811937a9>] ? do_mmap+0x449/0x550
          [<ffffffff811d3de3>] ? __vfs_read+0x23/0xd0
          [<ffffffff811d4ab2>] ? rw_verify_area+0x52/0xd0
          [<ffffffff811d4bb1>] ? vfs_read+0x81/0x120
          [<ffffffff811d5f12>] ? SyS_read+0x42/0xa0
          [<ffffffff815720f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
      
      Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero,
      similarly to what the qstat_pv_latency_wake case does. If nothing
      else, this can come from resetting the statistics, so having 0 kicks
      is quite valid in this context.
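
      The shape of the guard, sketched with simplified names (not the literal
      qstat_read() code; qstat_total() is a hypothetical summing helper):

      	u64 kicks = qstat_total(qstat_pv_kick_unlock);
      	u64 avg_hops = 0;

      	/* 0 kicks is legitimate (e.g. right after a reset): guard the division */
      	if (kicks)
      		avg_hops = total_hops / kicks;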
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: waiman.long@hpe.com
      Link: http://lkml.kernel.org/r/1460961103-24953-1-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 13 Apr 2016, 5 commits
    • locking/rwsem: Introduce basis for down_write_killable() · d4799608
      Committed by Michal Hocko
      Introduce a generic implementation necessary for down_write_killable().
      
      This is a trivial extension of the already existing down_write() call
      which can now be interrupted by SIGKILL.  This patch doesn't provide
      down_write_killable() yet because the architectures have to provide the
      necessary pieces first.
      
      rwsem_down_write_failed() which is a generic slow path for the
      write lock is extended to take a task state and renamed to
      __rwsem_down_write_failed_common(). The return value is either a valid
      semaphore pointer or ERR_PTR(-EINTR).
      
      rwsem_down_write_failed_killable() is exported as a new way to wait for
      the lock and be killable.
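
      A sketch of the resulting split described above (attributes and exports
      omitted, signatures simplified):

      	struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem)
      	{
      		return __rwsem_down_write_failed_common(sem, TASK_UNINTERRUPTIBLE);
      	}

      	struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem)
      	{
      		/* may return ERR_PTR(-EINTR) instead of the semaphore */
      		return __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
      	}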
      
      For the rwsem-spinlock implementation, the current __down_write() is
      updated in a similar way to __rwsem_down_write_failed_common(), except
      that it doesn't need new exports, just a visible __down_write_killable().
      
      Architectures which are not using the generic rwsem implementation are
      supposed to provide their __down_write_killable() implementation and
      use rwsem_down_write_failed_killable() for the slow path.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1460041951-22347-7-git-send-email-mhocko@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rwsem: Get rid of __down_write_nested() · f8e04d85
      Committed by Michal Hocko
      This is no longer used anywhere and all callers (__down_write()) use
      0 as a subclass. Ditch __down_write_nested() to make the code easier
      to follow.
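
      In other words, the wrapper below (sketched) collapses into __down_write()
      itself, since the subclass argument was always 0:

      	/* before: */
      	static inline void __down_write(struct rw_semaphore *sem)
      	{
      		__down_write_nested(sem, 0);
      	}
      	/* after: __down_write() carries the body directly, __down_write_nested() is gone */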
      
      This shouldn't introduce any functional change.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Deinline register_lock_class(), save 2328 bytes · c003ed92
      Committed by Denys Vlasenko
      This function compiles to 1328 bytes of machine code and has three callsites.

      Registering a new lock class is definitely not so time-critical that it
      needs to be inlined.
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1460141926-13069-5-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/locktorture: Fix NULL pointer dereference for cleanup paths · c1c33b92
      Committed by Davidlohr Bueso
      It has been found that paths that invoke cleanups through
      lock_torture_cleanup() can trigger NULL pointer dereferencing
      bugs during the statistics printing phase. This is mainly
      because we should not be calling into statistics before we are
      sure things have been set up correctly.
      
      Specifically, early checks (and the need for handling this in
      the cleanup call) only include parameter checks and basic
      statistics allocation. Once we start write/read kthreads
      we then consider the test as started. As such, update the function
      in question to check for cxt.lwsa writer stats; if they are not set,
      we either have a bogus parameter or an -ENOMEM situation and
      therefore only need to deal with general torture calls.
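
      A sketch of the early-out, assuming the cxt.lwsa field referenced above:

      	static void lock_torture_cleanup(void)
      	{
      		/*
      		 * Writer stats never allocated: we got here via a bogus-parameter
      		 * or -ENOMEM path, so skip the statistics printing and only do
      		 * the generic torture cleanup.
      		 */
      		if (!cxt.lwsa)
      			goto end;

      		/* ... stop reader/writer kthreads, print statistics ... */
      	end:
      		torture_cleanup_end();
      	}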
      Reported-and-tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bobby.prani@gmail.com
      Cc: dhowells@redhat.com
      Cc: dipankar@in.ibm.com
      Cc: dvhart@linux.intel.com
      Cc: edumazet@google.com
      Cc: fweisbec@gmail.com
      Cc: jiangshanlai@gmail.com
      Cc: josh@joshtriplett.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: oleg@redhat.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1460476038-27060-2-git-send-email-paulmck@linux.vnet.ibm.com
      [ Improved the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/locktorture: Fix deboosting NULL pointer dereference · 1f190931
      Committed by Davidlohr Bueso
      For the case of rtmutex torturing we will randomly call into the
      boost() handler, including upon module exiting when the tasks are
      deboosted before stopping. In such cases the task may or may not have
      already been boosted, and therefore the NULL being explicitly passed
      can occur anywhere. Currently we only assume that the task is
      at a higher prio, and in consequence, dereference a NULL pointer.
      
      This patch fixes the case of a rmmod locktorture exploding while
      pounding on the rtmutex lock (partial trace):
      
       task: ffff88081026cf80 ti: ffff880816120000 task.ti: ffff880816120000
       RSP: 0018:ffff880816123eb0  EFLAGS: 00010206
       RAX: ffff88081026cf80 RBX: ffff880816bfa630 RCX: 0000000000160d1b
       RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
       RBP: ffff88081026cf80 R08: 000000000000001f R09: ffff88017c20ca80
       R10: 0000000000000000 R11: 000000000048c316 R12: ffffffffa05d1840
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff88203f880000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000008 CR3: 0000000001c0a000 CR4: 00000000000406e0
       Stack:
        ffffffffa05d141d ffff880816bfa630 ffffffffa05d1922 ffff88081e70c2c0
        ffff880816bfa630 ffffffff81095fed 0000000000000000 ffffffff8107bf60
        ffff880816bfa630 ffffffff00000000 ffff880800000000 ffff880816123f08
       Call Trace:
        [<ffffffff81095fed>] kthread+0xbd/0xe0
        [<ffffffff815cf40f>] ret_from_fork+0x3f/0x70
      
      This patch ensures that if the random state pointer is not NULL and
      current is not boosted, nothing is done.
      
       RIP: 0010:[<ffffffffa05c6185>]  [<ffffffffa05c6185>] torture_random+0x5/0x60 [torture]
        [<ffffffffa05d141d>] torture_rtmutex_boost+0x1d/0x90 [locktorture]
        [<ffffffffa05d1922>] lock_torture_writer+0xe2/0x170 [locktorture]
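
      A sketch of the guarded dice-roll in torture_rtmutex_boost() (simplified;
      the factor and field names follow locktorture, but this is not the
      verbatim diff):

      	if (!rt_task(current)) {
      		/*
      		 * Only roll the dice when a random state was actually passed
      		 * in; the deboost call on the exit path hands us a NULL trsp.
      		 */
      		if (trsp && !(torture_random(trsp) %
      			      (cxt.nrealwriters_stress * factor))) {
      			policy = SCHED_FIFO;
      			param.sched_priority = MAX_RT_PRIO - 1;
      		} else {
      			/* common case (or NULL trsp): do nothing */
      			return;
      		}
      	}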
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bobby.prani@gmail.com
      Cc: dhowells@redhat.com
      Cc: dipankar@in.ibm.com
      Cc: dvhart@linux.intel.com
      Cc: edumazet@google.com
      Cc: fweisbec@gmail.com
      Cc: jiangshanlai@gmail.com
      Cc: josh@joshtriplett.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: oleg@redhat.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1460476038-27060-1-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 04 Apr 2016, 1 commit
  14. 31 Mar 2016, 1 commit
  15. 23 Mar 2016, 1 commit
    • kernel: add kcov code coverage · 5c9a8750
      Committed by Dmitry Vyukov
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is a function of syscall inputs.
      To achieve this goal it does not collect coverage in soft/hard
      interrupts, and instrumentation of some inherently non-deterministic or
      non-interesting parts of the kernel is disabled (e.g.  scheduler, locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on the kernel side.  The complementary
      compiler support was added in gcc revision 231296.
      
      We've used this support to build syzkaller system call fuzzer, which has
      found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov?  A typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
      typical coverage can be just a dozen basic blocks (e.g.  an invalid
      input).  In such a context gcov becomes prohibitively expensive, as the
      reset/collect coverage steps depend on the total number of basic
      blocks/edges in the program (in the case of the kernel it is about 2M).
      The cost of kcov depends only on the number of executed basic
      blocks/edges.  On top of that, the kernel requires per-thread coverage
      because there are always background threads and unrelated processes
      that also produce coverage.  With inlined gcov instrumentation,
      per-thread coverage is not possible.
      
      kcov exposes kernel PCs and control flow to user-space which is
      insecure.  But debugfs should not be mapped as user accessible.
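
      For reference, user space drives kcov roughly like this (a sketch
      following the interface added by this patch; see the accompanying
      documentation for the authoritative version, error handling omitted):

      	#include <stdio.h>
      	#include <fcntl.h>
      	#include <unistd.h>
      	#include <sys/ioctl.h>
      	#include <sys/mman.h>

      	#define KCOV_INIT_TRACE	_IOR('c', 1, unsigned long)
      	#define KCOV_ENABLE	_IO('c', 100)
      	#define KCOV_DISABLE	_IO('c', 101)
      	#define COVER_SIZE	(64 << 10)	/* trace buffer, in unsigned longs */

      	int main(void)
      	{
      		unsigned long *cover, n, i;
      		int fd = open("/sys/kernel/debug/kcov", O_RDWR);

      		ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
      		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
      			     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      		ioctl(fd, KCOV_ENABLE, 0);

      		cover[0] = 0;		/* reset coverage for this thread */
      		read(-1, NULL, 0);	/* the code we want coverage for */
      		n = cover[0];		/* number of PCs collected */
      		for (i = 0; i < n; i++)
      			printf("0x%lx\n", cover[i + 1]);

      		ioctl(fd, KCOV_DISABLE, 0);
      		close(fd);
      		return 0;
      	}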
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 16 Mar 2016, 1 commit
    • tags: Fix DEFINE_PER_CPU expansions · 25528213
      Committed by Peter Zijlstra
      $ make tags
        GEN     tags
      ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
      ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
      ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
      ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
      ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
      ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
      ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
      ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
      ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
      
      Which are all the result of the DEFINE_PER_CPU pattern:
      
        scripts/tags.sh:200:	'/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
        scripts/tags.sh:201:	'/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'
      
      The below cures them. All except the workqueue one are within reasonable
      distance of the 80 char limit. TJ, do you have any preference on how to
      fix the wq one, or shall we just not care that it's too long?
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 29 Feb 2016, 7 commits
  18. 12 Feb 2016, 1 commit
    • kernel/locking/lockdep.c: convert hash tables to hlists · 4a389810
      Committed by Andrew Morton
      Mike said:
      
      : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.  e
      : kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error
      : message.
      :
      : The problem is that ubsan callbacks use spinlocks and might be called
      : before lockdep is initialized.  Particularly this line in the
      : reserve_ebda_region function causes problem:
      :
      : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
      :
      : If i put lockdep_init() before reserve_ebda_region call in
      : x86_64_start_reservations kernel loads well.
      
      Fix this ordering issue permanently: change lockdep so that it uses
      hlists for the hash tables.  Unlike a list_head, an hlist_head is in its
      initialized state when it is all-zeroes, so lockdep is ready for
      operation immediately upon boot - lockdep_init() need not have run.
      
      The patch will also save some memory.
      
      lockdep_init() and lockdep_initialized can be done away with now - a 4.6
      patch has been prepared to do this.
      Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 09 Feb 2016, 3 commits
    • locking/lockdep: Eliminate lockdep_init() · 06bea3db
      Committed by Andrey Ryabinin
      Lockdep is initialized at compile time now.  Get rid of lockdep_init().
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: mm-commits@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Convert hash tables to hlists · a63f38cc
      Committed by Andrew Morton
      Mike said:
      
      : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e.
      : kernel with CONFIG_UBSAN_ALIGNMENT=y fails to load without even any error
      : message.
      :
      : The problem is that ubsan callbacks use spinlocks and might be called
      : before lockdep is initialized.  Particularly this line in the
      : reserve_ebda_region function causes problem:
      :
      : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
      :
      : If i put lockdep_init() before reserve_ebda_region call in
      : x86_64_start_reservations kernel loads well.
      
      Fix this ordering issue permanently: change lockdep so that it uses hlists
      for the hash tables.  Unlike a list_head, an hlist_head is in its
      initialized state when it is all-zeroes, so lockdep is ready for operation
      immediately upon boot - lockdep_init() need not have run.
      
      The patch will also save some memory.
      
      Probably lockdep_init() and lockdep_initialized can be done away with now.
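
      The property being relied on, in terms of the kernel's own list types:

      	struct list_head {			/* the old hash buckets */
      		struct list_head *next, *prev;	/* empty == pointing at itself: needs INIT_LIST_HEAD() */
      	};

      	struct hlist_head {			/* the new buckets */
      		struct hlist_node *first;	/* empty == NULL: an all-zero (BSS) array is usable at once */
      	};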
      Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: mm-commits@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Fix stack trace caching logic · 8a5fd564
      Committed by Dmitry Vyukov
      check_prev_add() caches the saved stack trace in a static trace variable
      to avoid duplicate save_trace() calls in dependencies involving trylocks.
      But that caching logic contains a bug. We may not save the trace on the
      first iteration due to an early return from check_prev_add(). Then on the
      second iteration, when we actually need the trace, we don't save it
      because we think that we've already saved it.
      
      Let check_prev_add() itself control when stack is saved.
      
      There is another bug. The trace variable is protected by the graph lock.
      But we can temporarily release the graph lock during printing.

      Fix this by invalidating the cached stack trace when we release the graph lock.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: glider@google.com
      Cc: kcc@google.com
      Cc: peter@hurleysoftware.com
      Cc: sasha.levin@oracle.com
      Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  20. 26 Jan 2016, 1 commit
    • rtmutex: Make wait_lock irq safe · b4abf910
      Committed by Thomas Gleixner
      Sasha reported a lockdep splat about a potential deadlock between RCU boosting
      rtmutex and the posix timer it_lock.
      
      CPU0					CPU1
      
      rtmutex_lock(&rcu->rt_mutex)
        spin_lock(&rcu->rt_mutex.wait_lock)
      					local_irq_disable()
      					spin_lock(&timer->it_lock)
      					spin_lock(&rcu->mutex.wait_lock)
      --> Interrupt
          spin_lock(&timer->it_lock)
      
      This is caused by the following code sequence on CPU1
      
           rcu_read_lock()
           x = lookup();
           if (x)
           	spin_lock_irqsave(&x->it_lock);
           rcu_read_unlock();
           return x;
      
      We could fix that in the posix timer code by keeping rcu read locked across
      the spinlocked and irq disabled section, but the above sequence is common and
      there is no reason not to support it.
      
      Taking rt_mutex.wait_lock with interrupts disabled (irq safe) prevents the deadlock.
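
      The pattern applied throughout rtmutex.c, sketched:

      	unsigned long flags;

      	/* was: raw_spin_lock(&lock->wait_lock); */
      	raw_spin_lock_irqsave(&lock->wait_lock, flags);

      	/* ... wait-list manipulation, now safe against interrupts ... */

      	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);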
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
  21. 18 Dec 2015, 1 commit
  22. 04 Dec 2015, 4 commits
    • locking/pvqspinlock: Queue node adaptive spinning · cd0272fa
      Committed by Waiman Long
      In an overcommitted guest where some vCPUs have to be halted to make
      forward progress in other areas, it is highly likely that a vCPU later
      in the spinlock queue will be spinning while the ones earlier in the
      queue would have been halted. The spinning in the later vCPUs is then
      just a waste of precious CPU cycles because they are not going to
      get the lock any time soon, as the earlier ones have to be woken up and
      take their turn to get the lock.
      
      This patch implements an adaptive spinning mechanism where the vCPU
      will call pv_wait() if the previous vCPU is not running.
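
      A sketch of the mechanism (structure and helper names follow the PV
      qspinlock code, but simplified):

      	static void pv_wait_node_sketch(struct pv_node *pn, struct pv_node *pp)
      	{
      		int loop;

      		for (loop = SPIN_THRESHOLD; loop; loop--) {
      			if (READ_ONCE(pn->mcs.locked))
      				return;		/* our turn arrived while spinning */
      			if (READ_ONCE(pp->state) != vcpu_running)
      				break;		/* predecessor halted: spinning is futile */
      			cpu_relax();
      		}

      		/* halt this vCPU; the lock holder/predecessor will kick us later */
      		WRITE_ONCE(pn->state, vcpu_halted);
      		pv_wait(&pn->state, vcpu_halted);
      	}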
      
      Linux kernel builds were run in KVM guest on an 8-socket, 4
      cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
      Haswell-EX system. Both systems are configured to have 32 physical
      CPUs. The kernel build times before and after the patch were:
      
      		    Westmere			Haswell
        Patch		32 vCPUs    48 vCPUs	32 vCPUs    48 vCPUs
        -----		--------    --------    --------    --------
        Before patch   3m02.3s     5m00.2s     1m43.7s     3m03.5s
        After patch    3m03.0s     4m37.5s	 1m43.0s     2m47.2s
      
      For 32 vCPUs, this patch doesn't cause any noticeable change in
      performance. For 48 vCPUs (over-committed), there is about 8%
      performance improvement.
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447114167-47185-8-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/pvqspinlock: Allow limited lock stealing · 1c4941fd
      Committed by Waiman Long
      This patch allows one attempt for the lock waiter to steal the lock
      when entering the PV slowpath. To prevent lock starvation, the pending
      bit will be set by the queue head vCPU when it is in the active lock
      spinning loop to disable any lock stealing attempt.  This helps to
      reduce the performance penalty caused by lock waiter preemption while
      not having much of the downsides of a real unfair lock.
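
      The steal attempt itself boils down to something like this (a sketch,
      not the verbatim patch; the pending-bit check is what lets the queue
      head shut the stealing window while it spins):

      	static bool pv_steal_lock_sketch(struct qspinlock *lock)
      	{
      		struct __qspinlock *l = (void *)lock;

      		/* steal only while neither the locked byte nor the pending bit is set */
      		return !(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
      		       (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0);
      	}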
      
      The pv_wait_head() function was renamed as pv_wait_head_or_lock()
      as it was modified to acquire the lock before returning. This is
      necessary because of possible lock stealing attempts from other tasks.
      
      Linux kernel builds were run in KVM guest on an 8-socket, 4
      cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
      Haswell-EX system. Both systems are configured to have 32 physical
      CPUs. The kernel build times before and after the patch were:
      
                          Westmere                    Haswell
        Patch         32 vCPUs    48 vCPUs    32 vCPUs    48 vCPUs
        -----         --------    --------    --------    --------
        Before patch   3m15.6s    10m56.1s     1m44.1s     5m29.1s
        After patch    3m02.3s     5m00.2s     1m43.7s     3m03.5s
      
      For the overcommitted case (48 vCPUs), this patch is able to reduce
      kernel build time by more than 54% for Westmere and 44% for Haswell.
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447190336-53317-1-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/pvqspinlock: Collect slowpath lock statistics · 45e898b7
      Committed by Waiman Long
      This patch enables the accumulation of kicking and waiting related
      PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
      option is selected. It also enables the collection of data which
      enables us to calculate the kicking and wakeup latencies, which have
      a heavy dependency on the CPUs being used.
      
      The statistical counters are per-cpu variables to minimize the
      performance overhead in their updates. These counters are exported
      via the debugfs filesystem under the qlockstat directory.  When the
      corresponding debugfs files are read, summation and computing of the
      required data are then performed.
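
      Reading one of those debugfs files boils down to summing a per-CPU
      counter, roughly (names simplified):

      	static u64 qstat_sum_sketch(int stat_id)
      	{
      		u64 sum = 0;
      		int cpu;

      		/* counters are per-cpu to keep the update path cheap; pay at read time */
      		for_each_possible_cpu(cpu)
      			sum += per_cpu(qstats[stat_id], cpu);
      		return sum;
      	}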
      
      The measured latencies for different CPUs are:
      
      	CPU		Wakeup		Kicking
      	---		------		-------
      	Haswell-EX	63.6us		 7.4us
      	Westmere-EX	67.6us		 9.3us
      
      The measured latencies varied a bit from run-to-run. The wakeup
      latency is much higher than the kicking latency.
      
      A sample of statistical counters after system bootup (with vCPU
      overcommit) was:
      
      	pv_hash_hops=1.00
      	pv_kick_unlock=1148
      	pv_kick_wake=1146
      	pv_latency_kick=11040
      	pv_latency_wake=194840
      	pv_spurious_wakeup=7
      	pv_wait_again=4
      	pv_wait_head=23
      	pv_wait_node=1129
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447114167-47185-6-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking, sched: Introduce smp_cond_acquire() and use it · b3e0b1b6
      Committed by Peter Zijlstra
      Introduce smp_cond_acquire() which combines a control dependency and a
      read barrier to form acquire semantics.
      
      This primitive has two benefits:
      
       - it documents control dependencies,
       - it's typically cheaper than using smp_load_acquire() in a loop.
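
      The primitive is essentially the following (sketched from the description
      above: spin on the condition, then turn the control dependency into
      acquire semantics with a read barrier):

      	#define smp_cond_acquire(cond)	do {		\
      		while (!(cond))				\
      			cpu_relax();			\
      		smp_rmb(); /* ctrl + rmb := acquire */	\
      	} while (0)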
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>