1. 04 5月, 2017 3 次提交
    • M
      mm: introduce memalloc_nofs_{save,restore} API · 7dea19f9
      Michal Hocko 提交于
      GFP_NOFS context is used for the following 5 reasons currently:
      
       - to prevent from deadlocks when the lock held by the allocation
         context would be needed during the memory reclaim
      
       - to prevent from stack overflows during the reclaim because the
         allocation is performed from a deep context already
      
       - to prevent lockups when the allocation context depends on other
         reclaimers to make a forward progress indirectly
      
       - just in case because this would be safe from the fs POV
      
       - silence lockdep false positives
      
      Unfortunately overuse of this allocation context brings some problems to
      the MM.  Memory reclaim is much weaker (especially during heavy FS
      metadata workloads), OOM killer cannot be invoked because the MM layer
      doesn't have enough information about how much memory is freeable by the
      FS layer.
      
      In many cases it is far from clear why the weaker context is even used
      and so it might be used unnecessarily.  We would like to get rid of
      those as much as possible.  One way to do that is to use the flag in
      scopes rather than isolated cases.  Such a scope is declared when really
      necessary, tracked per task and all the allocation requests from within
      the context will simply inherit the GFP_NOFS semantic.
      
      Not only this is easier to understand and maintain because there are
      much less problematic contexts than specific allocation requests, this
      also helps code paths where FS layer interacts with other layers (e.g.
      crypto, security modules, MM etc...) and there is no easy way to convey
      the allocation context between the layers.
      
      Introduce memalloc_nofs_{save,restore} API to control the scope of
      GFP_NOFS allocation context.  This is basically copying
      memalloc_noio_{save,restore} API we have for other restricted allocation
      context GFP_NOIO.  The PF_MEMALLOC_NOFS flag already exists and it is
      just an alias for PF_FSTRANS which has been xfs specific until recently.
      There are no more PF_FSTRANS users anymore so let's just drop it.
      
      PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
      implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO.  memalloc_noio_flags
      is renamed to current_gfp_context because it now cares about both
      PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts.  Xfs code paths preserve
      their semantic.  kmem_flags_convert() doesn't need to evaluate the flag
      anymore.
      
      This patch shouldn't introduce any functional changes.
      
      Let's hope that filesystems will drop direct GFP_NOFS (resp.  ~__GFP_FS)
      usage as much as possible and only use a properly documented
      memalloc_nofs_{save,restore} checkpoints where they are appropriate.
      
      [akpm@linux-foundation.org: fix comment typo, reflow comment]
      Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7dea19f9
    • M
      lockdep: allow to disable reclaim lockup detection · 7e784422
      Michal Hocko 提交于
      The current implementation of the reclaim lockup detection can lead to
      false positives and those even happen and usually lead to tweak the code
      to silence the lockdep by using GFP_NOFS even though the context can use
      __GFP_FS just fine.
      
      See
      
        http://lkml.kernel.org/r/20160512080321.GA18496@dastard
      
      as an example.
      
        =================================
        [ INFO: inconsistent lock state ]
        4.5.0-rc2+ #4 Tainted: G           O
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:
      
        (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs]
      
        {RECLAIM_FS-ON-R} state was registered at:
          mark_held_locks+0x79/0xa0
          lockdep_trace_alloc+0xb3/0x100
          kmem_cache_alloc+0x33/0x230
          kmem_zone_alloc+0x81/0x120 [xfs]
          xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
          __xfs_refcount_find_shared+0x75/0x580 [xfs]
          xfs_refcount_find_shared+0x84/0xb0 [xfs]
          xfs_getbmap+0x608/0x8c0 [xfs]
          xfs_vn_fiemap+0xab/0xc0 [xfs]
          do_vfs_ioctl+0x498/0x670
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
               CPU0
               ----
          lock(&xfs_nondir_ilock_class);
          <Interrupt>
            lock(&xfs_nondir_ilock_class);
      
         *** DEADLOCK ***
      
        3 locks held by kswapd0/543:
      
        stack backtrace:
        CPU: 0 PID: 543 Comm: kswapd0 Tainted: G           O    4.5.0-rc2+ #4
        Call Trace:
         lock_acquire+0xd8/0x1e0
         down_write_nested+0x5e/0xc0
         xfs_ilock+0x177/0x200 [xfs]
         xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
         xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
         evict+0xc5/0x190
         dispose_list+0x39/0x60
         prune_icache_sb+0x4b/0x60
         super_cache_scan+0x14f/0x1a0
         shrink_slab.part.63.constprop.79+0x1e9/0x4e0
         shrink_zone+0x15e/0x170
         kswapd+0x4f1/0xa80
         kthread+0xf2/0x110
         ret_from_fork+0x3f/0x70
      
      To quote Dave:
       "Ignoring whether reflink should be doing anything or not, that's a
        "xfs_refcountbt_init_cursor() gets called both outside and inside
        transactions" lockdep false positive case. The problem here is lockdep
        has seen this allocation from within a transaction, hence a GFP_NOFS
        allocation, and now it's seeing it in a GFP_KERNEL context. Also note
        that we have an active reference to this inode.
      
        So, because the reclaim annotations overload the interrupt level
        detections and it's seen the inode ilock been taken in reclaim
        ("interrupt") context, this triggers a reclaim context warning where
        it thinks it is unsafe to do this allocation in GFP_KERNEL context
        holding the inode ilock..."
      
      This sounds like a fundamental problem of the reclaim lock detection.
      It is really impossible to annotate such a special usecase IMHO unless
      the reclaim lockup detection is reworked completely.  Until then it is
      much better to provide a way to add "I know what I am doing flag" and
      mark problematic places.  This would prevent from abusing GFP_NOFS flag
      which has a runtime effect even on configurations which have lockdep
      disabled.
      
      Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
      skip the current allocation request.
      
      While we are at it also make sure that the radix tree doesn't
      accidentaly override tags stored in the upper part of the gfp_mask.
      
      Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e784422
    • N
      lockdep: teach lockdep about memalloc_noio_save · 6d7225f0
      Nikolay Borisov 提交于
      Patch series "scope GFP_NOFS api", v5.
      
      This patch (of 7):
      
      Commit 21caf2fc ("mm: teach mm by current context info to not do I/O
      during memory allocation") added the memalloc_noio_(save|restore)
      functions to enable people to modify the MM behavior by disabling I/O
      during memory allocation.
      
      This was further extended in commit 934f3072 ("mm: clear __GFP_FS
      when PF_MEMALLOC_NOIO is set").
      
      memalloc_noio_* functions prevent allocation paths recursing back into
      the filesystem without explicitly changing the flags for every
      allocation site.
      
      However, lockdep hasn't been keeping up with the changes and it entirely
      misses handling the memalloc_noio adjustments.  Instead, it is left to
      the callers of __lockdep_trace_alloc to call the function after they
      have shaven the respective GFP flags which can lead to false positives:
      
        =================================
         [ INFO: inconsistent lock state ]
         4.10.0-nbor #134 Not tainted
         ---------------------------------
         inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
         fsstress/3365 [HC0[0]:SC0[0]:HE1:SE1] takes:
          (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
         {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x62a/0x17c0
           lock_acquire+0xc5/0x220
           down_write_nested+0x4f/0x90
           xfs_ilock+0x141/0x230
           xfs_reclaim_inode+0x12a/0x320
           xfs_reclaim_inodes_ag+0x2c8/0x4e0
           xfs_reclaim_inodes_nr+0x33/0x40
           xfs_fs_free_cached_objects+0x19/0x20
           super_cache_scan+0x191/0x1a0
           shrink_slab+0x26f/0x5f0
           shrink_node+0xf9/0x2f0
           kswapd+0x356/0x920
           kthread+0x10c/0x140
           ret_from_fork+0x31/0x40
         irq event stamp: 173777
         hardirqs last  enabled at (173777): __local_bh_enable_ip+0x70/0xc0
         hardirqs last disabled at (173775): __local_bh_enable_ip+0x37/0xc0
         softirqs last  enabled at (173776): _xfs_buf_find+0x67a/0xb70
         softirqs last disabled at (173774): _xfs_buf_find+0x5db/0xb70
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&xfs_nondir_ilock_class);
           <Interrupt>
             lock(&xfs_nondir_ilock_class);
      
          *** DEADLOCK ***
      
         4 locks held by fsstress/3365:
          #0:  (sb_writers#10){++++++}, at: mnt_want_write+0x24/0x50
          #1:  (&sb->s_type->i_mutex_key#12){++++++}, at: vfs_setxattr+0x6f/0xb0
          #2:  (sb_internal#2){++++++}, at: xfs_trans_alloc+0xfc/0x140
          #3:  (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
      
         stack backtrace:
         CPU: 0 PID: 3365 Comm: fsstress Not tainted 4.10.0-nbor #134
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
         Call Trace:
          kmem_cache_alloc_node_trace+0x3a/0x2c0
          vm_map_ram+0x2a1/0x510
          _xfs_buf_map_pages+0x77/0x140
          xfs_buf_get_map+0x185/0x2a0
          xfs_attr_rmtval_set+0x233/0x430
          xfs_attr_leaf_addname+0x2d2/0x500
          xfs_attr_set+0x214/0x420
          xfs_xattr_set+0x59/0xb0
          __vfs_setxattr+0x76/0xa0
          __vfs_setxattr_noperm+0x5e/0xf0
          vfs_setxattr+0xae/0xb0
          setxattr+0x15e/0x1a0
          path_setxattr+0x8f/0xc0
          SyS_lsetxattr+0x11/0x20
          entry_SYSCALL_64_fastpath+0x23/0xc6
      
      Let's fix this by making lockdep explicitly do the shaving of respective
      GFP flags.
      
      Fixes: 934f3072 ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set")
      Link: http://lkml.kernel.org/r/20170306131408.9828-2-mhocko@kernel.orgSigned-off-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d7225f0
  2. 16 3月, 2017 4 次提交
  3. 02 3月, 2017 3 次提交
  4. 10 2月, 2017 1 次提交
  5. 24 1月, 2017 1 次提交
  6. 06 12月, 2016 1 次提交
    • D
      lockdep: Fix report formatting · f943fe0f
      Dmitry Vyukov 提交于
      Since commit:
      
        4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines")
      
      printk() requires KERN_CONT to continue log messages. Lots of printk()
      in lockdep.c and print_ip_sym() don't have it. As the result lockdep
      reports are completely messed up.
      
      Add missing KERN_CONT and inline print_ip_sym() where necessary.
      
      Example of a messed up report:
      
        0-rc5+ #41 Not tainted
        -------------------------------------------------------
        syz-executor0/5036 is trying to acquire lock:
         (
        rtnl_mutex
        ){+.+.+.}
        , at:
        [<ffffffff86b3d6ac>] rtnl_lock+0x1c/0x20
        but task is already holding lock:
         (
        &net->packet.sklist_lock
        ){+.+...}
        , at:
        [<ffffffff873541a6>] packet_diag_dump+0x1a6/0x1920
        which lock already depends on the new lock.
        the existing dependency chain (in reverse order) is:
        -> #3
         (
        &net->packet.sklist_lock
        +.+...}
        ...
      
      Without this patch all scripts that parse kernel bug reports are broken.
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andreyknvl@google.com
      Cc: aryabinin@virtuozzo.com
      Cc: joe@perches.com
      Cc: syzkaller@googlegroups.com
      Link: http://lkml.kernel.org/r/1480343083-48731-1-git-send-email-dvyukov@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f943fe0f
  7. 30 11月, 2016 1 次提交
  8. 11 11月, 2016 1 次提交
  9. 08 6月, 2016 1 次提交
  10. 05 5月, 2016 1 次提交
    • P
      locking/lockdep, sched/core: Implement a better lock pinning scheme · e7904a28
      Peter Zijlstra 提交于
      The problem with the existing lock pinning is that each pin is of
      value 1; this mean you can simply unpin if you know its pinned,
      without having any extra information.
      
      This scheme generates a random (16 bit) cookie for each pin and
      requires this same cookie to unpin. This means you have to keep the
      cookie in context.
      
      No objsize difference for !LOCKDEP kernels.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e7904a28
  11. 23 4月, 2016 2 次提交
  12. 13 4月, 2016 1 次提交
  13. 04 4月, 2016 1 次提交
  14. 31 3月, 2016 1 次提交
  15. 16 3月, 2016 1 次提交
    • P
      tags: Fix DEFINE_PER_CPU expansions · 25528213
      Peter Zijlstra 提交于
      $ make tags
        GEN     tags
      ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
      ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
      ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
      ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
      ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
      ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
      ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
      ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
      ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
      
      Which are all the result of the DEFINE_PER_CPU pattern:
      
        scripts/tags.sh:200:	'/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
        scripts/tags.sh:201:	'/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'
      
      The below cures them. All except the workqueue one are within reasonable
      distance of the 80 char limit. TJ do you have any preference on how to
      fix the wq one, or shall we just not care its too long?
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      25528213
  16. 29 2月, 2016 2 次提交
    • I
      locking/lockdep: Detect chain_key collisions · 9e4e7554
      Ingo Molnar 提交于
      Add detection for chain_key collision under CONFIG_DEBUG_LOCKDEP.
      When a collision is detected the problem is reported and all lock
      debugging is turned off.
      
      Tested using liblockdep and the added tests before and after
      applying the fix, confirming both that the code added for the
      detection correctly reports the problem and that the fix actually
      fixes it.
      
      Tested tweaking lockdep to generate false collisions and
      verified that the problem is reported and that lock debugging is
      turned off.
      
      Also tested with lockdep's test suite after applying the patch:
      
          [    0.000000] Good, all 253 testcases passed! |
      Signed-off-by: NAlfredo Alvarez Fernandez <alfredoalvarezernandez@gmail.com>
      Cc: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: sasha.levin@oracle.com
      Link: http://lkml.kernel.org/r/1455864533-7536-4-git-send-email-alfredoalvarezernandez@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9e4e7554
    • A
      locking/lockdep: Prevent chain_key collisions · 5f18ab5c
      Alfredo Alvarez Fernandez 提交于
      The chain_key hashing macro iterate_chain_key(key1, key2) does not
      generate a new different value if both key1 and key2 are 0. In that
      case the generated value is again 0. This can lead to collisions which
      can result in lockdep not detecting deadlocks or circular
      dependencies.
      
      Avoid the problem by using class_idx (1-based) instead of class id
      (0-based) as an input for the hashing macro 'key2' in
      iterate_chain_key(key1, key2).
      
      The use of class id created collisions in cases like the following:
      
      1.- Consider an initial state in which no class has been acquired yet.
      Under these circumstances an AA deadlock will not be detected by
      lockdep:
      
        lock  [key1,key2]->new key  (key1=old chain_key, key2=id)
        --------------------------
        A     [0,0]->0
        A     [0,0]->0 (collision)
      
        The newly generated chain_key collides with the one used before and as
        a result the check for a deadlock is skipped
      
        A simple test using liblockdep and a pthread mutex confirms the
        problem: (omitting stack traces)
      
          new class 0xe15038: 0x7ffc64950f20
          acquire class [0xe15038] 0x7ffc64950f20
          acquire class [0xe15038] 0x7ffc64950f20
          hash chain already cached, key: 0000000000000000 tail class:
          [0xe15038] 0x7ffc64950f20
      
      2.- Consider an ABBA in 2 different tasks and no class yet acquired.
      
        T1 [key1,key2]->new key     T2[key1,key2]->new key
        --                          --
        A [0,0]->0
      
                                    B [0,1]->1
      
        B [0,1]->1  (collision)
      
                                    A
      
      In this case the collision prevents lockdep from creating the new
      dependency A->B. This in turn results in lockdep not detecting the
      circular dependency when T2 acquires A.
      Signed-off-by: NAlfredo Alvarez Fernandez <alfredoalvarezernandez@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: sasha.levin@oracle.com
      Link: http://lkml.kernel.org/r/1455147212-2389-4-git-send-email-alfredoalvarezernandez@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5f18ab5c
  17. 12 2月, 2016 1 次提交
    • A
      kernel/locking/lockdep.c: convert hash tables to hlists · 4a389810
      Andrew Morton 提交于
      Mike said:
      
      : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.  e
      : kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error
      : message.
      :
      : The problem is that ubsan callbacks use spinlocks and might be called
      : before lockdep is initialized.  Particularly this line in the
      : reserve_ebda_region function causes problem:
      :
      : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
      :
      : If i put lockdep_init() before reserve_ebda_region call in
      : x86_64_start_reservations kernel loads well.
      
      Fix this ordering issue permanently: change lockdep so that it uses
      hlists for the hash tables.  Unlike a list_head, an hlist_head is in its
      initialized state when it is all-zeroes, so lockdep is ready for
      operation immediately upon boot - lockdep_init() need not have run.
      
      The patch will also save some memory.
      
      lockdep_init() and lockdep_initialized can be done away with now - a 4.6
      patch has been prepared to do this.
      Reported-by: NMike Krinkin <krinkin.m.u@gmail.com>
      Suggested-by: NMike Krinkin <krinkin.m.u@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a389810
  18. 09 2月, 2016 3 次提交
    • A
      locking/lockdep: Eliminate lockdep_init() · 06bea3db
      Andrey Ryabinin 提交于
      Lockdep is initialized at compile time now.  Get rid of lockdep_init().
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: mm-commits@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      06bea3db
    • A
      locking/lockdep: Convert hash tables to hlists · a63f38cc
      Andrew Morton 提交于
      Mike said:
      
      : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e.
      : kernel with CONFIG_UBSAN_ALIGNMENT=y fails to load without even any error
      : message.
      :
      : The problem is that ubsan callbacks use spinlocks and might be called
      : before lockdep is initialized.  Particularly this line in the
      : reserve_ebda_region function causes problem:
      :
      : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
      :
      : If i put lockdep_init() before reserve_ebda_region call in
      : x86_64_start_reservations kernel loads well.
      
      Fix this ordering issue permanently: change lockdep so that it uses hlists
      for the hash tables.  Unlike a list_head, an hlist_head is in its
      initialized state when it is all-zeroes, so lockdep is ready for operation
      immediately upon boot - lockdep_init() need not have run.
      
      The patch will also save some memory.
      
      Probably lockdep_init() and lockdep_initialized can be done away with now.
      Suggested-by: NMike Krinkin <krinkin.m.u@gmail.com>
      Reported-by: NMike Krinkin <krinkin.m.u@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: mm-commits@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a63f38cc
    • D
      locking/lockdep: Fix stack trace caching logic · 8a5fd564
      Dmitry Vyukov 提交于
      check_prev_add() caches saved stack trace in static trace variable
      to avoid duplicate save_trace() calls in dependencies involving trylocks.
      But that caching logic contains a bug. We may not save trace on first
      iteration due to early return from check_prev_add(). Then on the
      second iteration when we actually need the trace we don't save it
      because we think that we've already saved it.
      
      Let check_prev_add() itself control when stack is saved.
      
      There is another bug. Trace variable is protected by graph lock.
      But we can temporary release graph lock during printing.
      
      Fix this by invalidating cached stack trace when we release graph lock.
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: glider@google.com
      Cc: kcc@google.com
      Cc: peter@hurleysoftware.com
      Cc: sasha.levin@oracle.com
      Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8a5fd564
  19. 23 11月, 2015 1 次提交
    • P
      treewide: Remove old email address · 90eec103
      Peter Zijlstra 提交于
      There were still a number of references to my old Red Hat email
      address in the kernel source. Remove these while keeping the
      Red Hat copyright notices intact.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90eec103
  20. 07 11月, 2015 1 次提交
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman 提交于
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  21. 23 9月, 2015 1 次提交
  22. 19 6月, 2015 2 次提交
    • P
      lockdep: Implement lock pinning · a24fc60d
      Peter Zijlstra 提交于
      Add a lockdep annotation that WARNs if you 'accidentially' unlock a
      lock.
      
      This is especially helpful for code with callbacks, where the upper
      layer assumes a lock remains taken but a lower layer thinks it maybe
      can drop and reacquire the lock.
      
      By unwittingly breaking up the lock, races can be introduced.
      
      Lock pinning is a lockdep annotation that helps with this, when you
      lockdep_pin_lock() a held lock, any unlock without a
      lockdep_unpin_lock() will produce a WARN. Think of this as a relative
      of lockdep_assert_held(), except you don't only assert its held now,
      but ensure it stays held until you release your assertion.
      
      RFC: a possible alternative API would be something like:
      
        int cookie = lockdep_pin_lock(&foo);
        ...
        lockdep_unpin_lock(&foo, cookie);
      
      Where we pick a random number for the pin_count; this makes it
      impossible to sneak a lock break in without also passing the right
      cookie along.
      
      I've not done this because it ends up generating code for !LOCKDEP,
      esp. if you need to pass the cookie around for some reason.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.906731065@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      a24fc60d
    • P
      lockdep: Simplify lock_release() · e0f56fd7
      Peter Zijlstra 提交于
      lock_release() takes this nested argument that's mostly pointless
      these days, remove the implementation but leave the argument a
      rudiment for now.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.840411606@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e0f56fd7
  23. 07 6月, 2015 1 次提交
  24. 03 6月, 2015 1 次提交
    • B
      lockdep: Do not break user-visible string · 92ae1837
      Borislav Petkov 提交于
      Remove the line-break in the user-visible string and add the
      missing space in this error message:
      
        WARNING: lockdep init error! lock-(console_sem).lock was acquiredbefore lockdep_init
      
      Also:
      
        - don't yell, it's just a debug warning
      
        - denote references to function calls with '()'
      
        - standardize the lock name quoting
      
        - and finish the sentence.
      
      The result:
      
        WARNING: lockdep init error: lock '(console_sem).lock' was acquired before lockdep_init().
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150602133827.GD19887@pd.tnic
      [ Added a few more stylistic tweaks to the error message. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      92ae1837
  25. 17 4月, 2015 1 次提交
    • P
      lockdep: Make print_lock() robust against concurrent release · d7bc3197
      Peter Zijlstra 提交于
      During sysrq's show-held-locks command it is possible that
      hlock_class() returns NULL for a given lock. The result is then (after
      the warning):
      
      	|BUG: unable to handle kernel NULL pointer dereference at 0000001c
      	|IP: [<c1088145>] get_usage_chars+0x5/0x100
      	|Call Trace:
      	| [<c1088263>] print_lock_name+0x23/0x60
      	| [<c1576b57>] print_lock+0x5d/0x7e
      	| [<c1088314>] lockdep_print_held_locks+0x74/0xe0
      	| [<c1088652>] debug_show_all_locks+0x132/0x1b0
      	| [<c1315c48>] sysrq_handle_showlocks+0x8/0x10
      
      This *might* happen because the thread on the other CPU drops the lock
      after we are looking ->lockdep_depth and ->held_locks points no longer
      to a lock that is held.
      
      The fix here is to simply ignore it and continue.
      Reported-by: NAndreas Messerschmid <andreas@linutronix.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d7bc3197
  26. 23 3月, 2015 1 次提交
    • P
      lockdep: Fix the module unload key range freeing logic · 35a9393c
      Peter Zijlstra 提交于
      Module unload calls lockdep_free_key_range(), which removes entries
      from the data structures. Most of the lockdep code OTOH assumes the
      data structures are append only; in specific see the comments in
      add_lock_to_list() and look_up_lock_class().
      
      Clearly this has only worked by accident; make it work proper. The
      actual scenario to make it go boom would involve the memory freed by
      the module unlock being re-allocated and re-used for a lock inside of
      a rcu-sched grace period. This is a very unlikely scenario, still
      better plug the hole.
      
      Use RCU list iteration in all places and ammend the comments.
      
      Change lockdep_free_key_range() to issue a sync_sched() between
      removal from the lists and returning -- which results in the memory
      being freed. Further ensure the callers are placed correctly and
      comment the requirements.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Tsyvarev <tsyvarev@ispras.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      35a9393c
  27. 03 10月, 2014 1 次提交
    • P
      locking/lockdep: Revert qrwlock recusive stuff · 8acd91e8
      Peter Zijlstra 提交于
      Commit f0bab73c ("locking/lockdep: Restrict the use of recursive
      read_lock() with qrwlock") changed lockdep to try and conform to the
      qrwlock semantics which differ from the traditional rwlock semantics.
      
      In particular qrwlock is fair outside of interrupt context, but in
      interrupt context readers will ignore all fairness.
      
      The problem modeling this is that read and write side have different
      lock state (interrupts) semantics but we only have a single
      representation of these. Therefore lockdep will get confused, thinking
      the lock can cause interrupt lock inversions.
      
      So revert it for now; the old rwlock semantics were already imperfectly
      modeled and the qrwlock extra won't fit either.
      
      If we want to properly fix this, I think we need to resurrect the work
      by Gautham did a few years ago that split the read and write state of
      locks:
      
         http://lwn.net/Articles/332801/
      
      FWIW the locking selftest that would've failed (and was reported by
      Borislav earlier) is something like:
      
        RL(X1);	/* IRQ-ON */
        LOCK(A);
        UNLOCK(A);
        RU(X1);
      
        IRQ_ENTER();
        RL(X1);	/* IN-IRQ */
        RU(X1);
        IRQ_EXIT();
      
      At which point it would report that because A is an IRQ-unsafe lock we
      can suffer the following inversion:
      
      	CPU0		CPU1
      
      	lock(A)
      			lock(X1)
      			lock(A)
      	<IRQ>
      	 lock(X1)
      
      And this is 'wrong' because X1 can recurse (assuming the above lock are
      in fact read-lock) but lockdep doesn't know about this.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: ego@linux.vnet.ibm.com
      Cc: bp@alien8.de
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8acd91e8
  28. 13 8月, 2014 1 次提交
    • W
      locking/lockdep: Restrict the use of recursive read_lock() with qrwlock · f0bab73c
      Waiman Long 提交于
      Unlike the original unfair rwlock implementation, queued rwlock
      will grant lock according to the chronological sequence of the lock
      requests except when the lock requester is in the interrupt context.
      Consequently, recursive read_lock calls will now hang the process if
      there is a write_lock call somewhere in between the read_lock calls.
      
      This patch updates the lockdep implementation to look for recursive
      read_lock calls. A new read state (3) is used to mark those read_lock
      call that cannot be recursively called except in the interrupt
      context. The new read state does exhaust the 2 bits available in
      held_lock:read bit field. The addition of any new read state in the
      future may require a redesign of how all those bits are squeezed
      together in the held_lock structure.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1407345722-61615-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f0bab73c