1. 25 8月, 2019 1 次提交
    • Y
      Revert "kmemleak: allow to coexist with fault injection" · 01d8d08f
      Yang Shi 提交于
      [ Upstream commit df9576def004d2cd5beedc00cb6e8901427634b9 ]
      
      When running ltp's oom test with kmemleak enabled, the below warning was
      triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
      passed in:
      
        WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
        Modules linked in: loop dax_pmem dax_pmem_core ip_tables x_tables xfs virtio_net net_failover virtio_blk failover ata_generic virtio_pci virtio_ring virtio libata
        CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
        RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
        ...
         kmemleak_alloc+0x4e/0xb0
         kmem_cache_alloc+0x2a7/0x3e0
         mempool_alloc_slab+0x2d/0x40
         mempool_alloc+0x118/0x2b0
         bio_alloc_bioset+0x19d/0x350
         get_swap_bio+0x80/0x230
         __swap_writepage+0x5ff/0xb20
      
      The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, however kmemleak
      has __GFP_NOFAIL set all the time due to d9570ee3 ("kmemleak:
      allow to coexist with fault injection").  But, it doesn't make any sense
      to have __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM specified at the same
      time.
      
      According to the discussion on the mailing list, the commit should be
      reverted for short term solution.  Catalin Marinas would follow up with
      a better solution for longer term.
      
      The failure rate of kmemleak metadata allocation may increase in some
      circumstances, but this should be expected side effect.
      
      Link: http://lkml.kernel.org/r/1563299431-111710-1-git-send-email-yang.shi@linux.alibaba.com
      Fixes: d9570ee3 ("kmemleak: allow to coexist with fault injection")
      Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Suggested-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      01d8d08f
  2. 31 7月, 2019 1 次提交
    • D
      mm/kmemleak.c: fix check for softirq context · 071f2135
      Dmitry Vyukov 提交于
      [ Upstream commit 6ef9056952532c3b746de46aa10d45b4d7797bd8 ]
      
      in_softirq() is a wrong predicate to check if we are in a softirq
      context.  It also returns true if we have BH disabled, so objects are
      falsely stamped with "softirq" comm.  The correct predicate is
      in_serving_softirq().
      
      If user does cat from /sys/kernel/debug/kmemleak previously they would
      see this, which is clearly wrong, this is system call context (see the
      comm):
      
      unreferenced object 0xffff88805bd661c0 (size 64):
        comm "softirq", pid 0, jiffies 4294942959 (age 12.400s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
          00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000007dcb30c>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<0000000007dcb30c>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<0000000007dcb30c>] slab_alloc mm/slab.c:3326 [inline]
          [<0000000007dcb30c>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
          [<00000000969722b7>] kmalloc include/linux/slab.h:547 [inline]
          [<00000000969722b7>] kzalloc include/linux/slab.h:742 [inline]
          [<00000000969722b7>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline]
          [<00000000969722b7>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085
          [<00000000a4134b5f>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475
          [<00000000d20248ad>] do_ip_setsockopt.isra.0+0x19fe/0x1c00 net/ipv4/ip_sockglue.c:957
          [<000000003d367be7>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246
          [<000000003c7c76af>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
          [<000000000c1aeb23>] sock_common_setsockopt+0x3e/0x50 net/core/sock.c:3130
          [<000000000157b92b>] __sys_setsockopt+0x9e/0x120 net/socket.c:2078
          [<00000000a9f3d058>] __do_sys_setsockopt net/socket.c:2089 [inline]
          [<00000000a9f3d058>] __se_sys_setsockopt net/socket.c:2086 [inline]
          [<00000000a9f3d058>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
          [<000000001b8da885>] do_syscall_64+0x7c/0x1a0 arch/x86/entry/common.c:301
          [<00000000ba770c62>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      now they will see this:
      
      unreferenced object 0xffff88805413c800 (size 64):
        comm "syz-executor.4", pid 8960, jiffies 4294994003 (age 14.350s)
        hex dump (first 32 bytes):
          00 7a 8a 57 80 88 ff ff e0 00 00 01 00 00 00 00  .z.W............
          00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000c5d3be64>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<00000000c5d3be64>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<00000000c5d3be64>] slab_alloc mm/slab.c:3326 [inline]
          [<00000000c5d3be64>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
          [<0000000023865be2>] kmalloc include/linux/slab.h:547 [inline]
          [<0000000023865be2>] kzalloc include/linux/slab.h:742 [inline]
          [<0000000023865be2>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline]
          [<0000000023865be2>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085
          [<000000003029a9d4>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475
          [<00000000ccd0a87c>] do_ip_setsockopt.isra.0+0x19fe/0x1c00 net/ipv4/ip_sockglue.c:957
          [<00000000a85a3785>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246
          [<00000000ec13c18d>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
          [<0000000052d748e3>] sock_common_setsockopt+0x3e/0x50 net/core/sock.c:3130
          [<00000000512f1014>] __sys_setsockopt+0x9e/0x120 net/socket.c:2078
          [<00000000181758bc>] __do_sys_setsockopt net/socket.c:2089 [inline]
          [<00000000181758bc>] __se_sys_setsockopt net/socket.c:2086 [inline]
          [<00000000181758bc>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
          [<00000000d4b73623>] do_syscall_64+0x7c/0x1a0 arch/x86/entry/common.c:301
          [<00000000c1098bec>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Link: http://lkml.kernel.org/r/20190517171507.96046-1-dvyukov@gmail.comSigned-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      071f2135
  3. 08 5月, 2019 2 次提交
  4. 05 9月, 2018 1 次提交
  5. 06 4月, 2018 2 次提交
  6. 29 3月, 2018 1 次提交
  7. 01 2月, 2018 1 次提交
  8. 14 1月, 2018 1 次提交
  9. 15 12月, 2017 1 次提交
  10. 30 11月, 2017 1 次提交
  11. 16 11月, 2017 2 次提交
  12. 07 7月, 2017 3 次提交
  13. 01 4月, 2017 1 次提交
  14. 02 3月, 2017 3 次提交
  15. 13 12月, 2016 1 次提交
  16. 12 11月, 2016 1 次提交
  17. 28 10月, 2016 1 次提交
  18. 12 10月, 2016 1 次提交
  19. 29 7月, 2016 1 次提交
  20. 25 6月, 2016 1 次提交
  21. 18 3月, 2016 2 次提交
  22. 15 1月, 2016 1 次提交
    • V
      Revert "gfp: add __GFP_NOACCOUNT" · 20b5c303
      Vladimir Davydov 提交于
      This reverts commit 8f4fc071 ("gfp: add __GFP_NOACCOUNT").
      
      Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
      fragile and difficult to maintain, because there seem to be many more
      allocations that should not be accounted than those that should be.
      Besides, false accounting an allocation might result in much worse
      consequences than not accounting at all, namely increased memory
      consumption due to pinned dead kmem caches.
      
      So it was decided to switch to the white-list policy.  This patch
      reverts bits introducing the black-list policy.  The white-list policy
      will be introduced later in the series.
      Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20b5c303
  23. 06 11月, 2015 1 次提交
  24. 11 9月, 2015 1 次提交
  25. 09 9月, 2015 1 次提交
  26. 25 6月, 2015 6 次提交
    • L
      mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() · 8a8c35fa
      Larry Finger 提交于
      Beginning at commit d52d3997 ("ipv6: Create percpu rt6_info"), the
      following INFO splat is logged:
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        4.1.0-rc7-next-20150612 #1 Not tainted
        -------------------------------
        kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
        other info that might help us debug this:
        rcu_scheduler_active = 1, debug_locks = 0
         3 locks held by systemd/1:
         #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
         #1:  (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
         #2:  (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
        stack backtrace:
        CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
        Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
        Call Trace:
          dump_stack+0x4c/0x6e
          lockdep_rcu_suspicious+0xe7/0x120
          ___might_sleep+0x1d5/0x1f0
          __might_sleep+0x4d/0x90
          kmem_cache_alloc+0x47/0x250
          create_object+0x39/0x2e0
          kmemleak_alloc_percpu+0x61/0xe0
          pcpu_alloc+0x370/0x630
      
      Additional backtrace lines are truncated.  In addition, the above splat
      is followed by several "BUG: sleeping function called from invalid
      context at mm/slub.c:1268" outputs.  As suggested by Martin KaFai Lau,
      these are the clue to the fix.  Routine kmemleak_alloc_percpu() always
      uses GFP_KERNEL for its allocations, whereas it should follow the gfp
      from its callers.
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@vger.kernel.org>	[3.18+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a8c35fa
    • C
      mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan · 93ada579
      Catalin Marinas 提交于
      The kmemleak memory scanning uses finer grained object->lock spinlocks
      primarily to avoid races with the memory block freeing.  However, the
      pointer lookup in the rb tree requires the kmemleak_lock to be held.
      This is currently done in the find_and_get_object() function for each
      pointer-like location read during scanning.  While this allows a low
      latency on kmemleak_*() callbacks on other CPUs, the memory scanning is
      slower.
      
      This patch moves the kmemleak_lock outside the scan_block() loop,
      acquiring/releasing it only once per scanned memory block.  The
      allow_resched logic is moved outside scan_block() and a new
      scan_large_block() function is implemented which splits large blocks in
      MAX_SCAN_SIZE chunks with cond_resched() calls in-between.  A redundant
      (object->flags & OBJECT_NO_SCAN) check is also removed from
      scan_object().
      
      With this patch, the kmemleak scanning performance is significantly
      improved: at least 50% with lock debugging disabled and over an order of
      magnitude with lock proving enabled (on an arm64 system).
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93ada579
    • C
      mm: kmemleak: avoid deadlock on the kmemleak object insertion error path · 9d5a4c73
      Catalin Marinas 提交于
      While very unlikely (usually kmemleak or sl*b bug), the create_object()
      function in mm/kmemleak.c may fail to insert a newly allocated object into
      the rb tree.  When this happens, kmemleak disables itself and prints
      additional information about the object already found in the rb tree.
      Such printing is done with the parent->lock acquired, however the
      kmemleak_lock is already held.  This is a potential race with the scanning
      thread which acquires object->lock and kmemleak_lock in a
      
      This patch removes the locking around the 'parent' object information
      printing.  Such object cannot be freed or removed from object_tree_root
      and object_list since kmemleak_lock is already held.  There is a very
      small risk that some of the object data is being modified on another CPU
      but the only downside is inconsistent information printing.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d5a4c73
    • C
      mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup() · 5f369f37
      Catalin Marinas 提交于
      The kmemleak_do_cleanup() work thread already waits for the kmemleak_scan
      thread to finish via kthread_stop().  Waiting in kthread_stop() while
      scan_mutex is held may lead to deadlock if kmemleak_scan_thread() also
      waits to acquire for scan_mutex.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5f369f37
    • C
      mm: kmemleak: fix delete_object_*() race when called on the same memory block · e781a9ab
      Catalin Marinas 提交于
      Calling delete_object_*() on the same pointer is not a standard use case
      (unless there is a bug in the code calling kmemleak_free()).  However,
      during kmemleak disabling (error or user triggered via /sys), there is a
      potential race between kmemleak_free() calls on a CPU and
      __kmemleak_do_cleanup() on a different CPU.
      
      The current delete_object_*() implementation first performs a look-up
      holding kmemleak_lock, increments the object->use_count and then
      re-acquires kmemleak_lock to remove the object from object_tree_root and
      object_list.
      
      This patch simplifies the delete_object_*() mechanism to both look up
      and remove an object from the object_tree_root and object_list
      atomically (guarded by kmemleak_lock).  This allows safe concurrent
      calls to delete_object_*() on the same pointer without additional
      locking for synchronising the kmemleak_free_enabled flag.
      
      A side effect is a slight improvement in the delete_object_*() performance
      by avoiding acquiring kmemleak_lock twice and incrementing/decrementing
      object->use_count.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e781a9ab
    • C
      mm: kmemleak: allow safe memory scanning during kmemleak disabling · c5f3b1a5
      Catalin Marinas 提交于
      The kmemleak scanning thread can run for minutes.  Callbacks like
      kmemleak_free() are allowed during this time, the race being taken care
      of by the object->lock spinlock.  Such lock also prevents a memory block
      from being freed or unmapped while it is being scanned by blocking the
      kmemleak_free() -> ...  -> __delete_object() function until the lock is
      released in scan_object().
      
      When a kmemleak error occurs (e.g.  it fails to allocate its metadata),
      kmemleak_enabled is set and __delete_object() is no longer called on
      freed objects.  If kmemleak_scan is running at the same time,
      kmemleak_free() no longer waits for the object scanning to complete,
      allowing the corresponding memory block to be freed or unmapped (in the
      case of vfree()).  This leads to kmemleak_scan potentially triggering a
      page fault.
      
      This patch separates the kmemleak_free() enabling/disabling from the
      overall kmemleak_enabled nob so that we can defer the disabling of the
      object freeing tracking until the scanning thread completed.  The
      kmemleak_free_part() is deliberately ignored by this patch since this is
      only called during boot before the scanning thread started.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: NVignesh Radhakrishnan <vigneshr@codeaurora.org>
      Tested-by: NVignesh Radhakrishnan <vigneshr@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c5f3b1a5
  27. 15 5月, 2015 1 次提交
    • V
      gfp: add __GFP_NOACCOUNT · 8f4fc071
      Vladimir Davydov 提交于
      Not all kmem allocations should be accounted to memcg.  The following
      patch gives an example when accounting of a certain type of allocations to
      memcg can effectively result in a memory leak.  This patch adds the
      __GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the
      allocation to go through the root cgroup.  It will be used by the next
      patch.
      
      Note, since in case of kmemleak enabled each kmalloc implies yet another
      allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
      gfp_kmemleak_mask.
      
      Alternatively, we could introduce a per kmem cache flag disabling
      accounting for all allocations of a particular kind, but (a) we would not
      be able to bypass accounting for kmalloc then and (b) a kmem cache with
      this flag set could not be merged with a kmem cache without this flag,
      which would increase the number of global caches and therefore
      fragmentation even if the memory cgroup controller is not used.
      
      Despite its generic name, currently __GFP_NOACCOUNT disables accounting
      only for kmem allocations while user page allocations are always charged.
      To catch abusing of this flag, a warning is issued on an attempt of
      passing it to mem_cgroup_try_charge.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.0.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f4fc071