  1. 08 Aug, 2020 18 commits
  2. 17 Jul, 2020 1 commit
  3. 26 Jun, 2020 1 commit
  4. 18 Jun, 2020 1 commit
  5. 05 Jun, 2020 1 commit
  6. 04 Jun, 2020 2 commits
    • mm/page_alloc: integrate classzone_idx and high_zoneidx · 97a225e6
      Joonsoo Kim committed
      classzone_idx is now just a different name for high_zoneidx.  So, integrate
      them and add a comment to struct alloc_context in order to reduce
      future confusion about the meaning of this variable.

      The accessor, ac_classzone_idx(), is also removed since it isn't needed
      after the integration.

      In addition to the integration, this patch also renames high_zoneidx to
      highest_zoneidx since that name conveys the meaning more precisely.

      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ye Xiaolong <xiaolong.ye@intel.com>
      Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      97a225e6
    • mm/slub: fix a memory leak in sysfs_slab_add() · dde3c6b7
      Wang Hai committed
      syzkaller reports a memory leak when kobject_init_and_add() returns an
      error in the function sysfs_slab_add() [1].

      When this happens, kobject_put() is not called for the
      corresponding kobject, which can lead to a memory leak.

      This patch fixes the issue by calling kobject_put() even if
      kobject_init_and_add() fails.
      
      [1]
        BUG: memory leak
        unreferenced object 0xffff8880a6d4be88 (size 8):
        comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
        hex dump (first 8 bytes):
          70 69 64 5f 33 00 ff ff                          pid_3...
        backtrace:
           kstrdup+0x35/0x70 mm/util.c:60
           kstrdup_const+0x3d/0x50 mm/util.c:82
           kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
           kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
           kobject_add_varg lib/kobject.c:384 [inline]
           kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
           sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
           __kmem_cache_create+0x50a/0x570 mm/slub.c:4384
           create_cache+0x113/0x1e0 mm/slab_common.c:407
           kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
           kmem_cache_create+0xd/0x10 mm/slab_common.c:564
           create_pid_cachep kernel/pid_namespace.c:54 [inline]
           create_pid_namespace kernel/pid_namespace.c:96 [inline]
           copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
           create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
           unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
           ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
           __do_sys_unshare kernel/fork.c:3037 [inline]
           __se_sys_unshare kernel/fork.c:3035 [inline]
           __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
           do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
      
      Fixes: 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dde3c6b7
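The cleanup rule behind dde3c6b7 can be sketched in userspace as follows. This is a minimal model, not kernel code: `struct toy_kobject`, `toy_kobject_put()`, `toy_kobject_init_and_add()`, `toy_sysfs_slab_add()` and the `live_names` counter are all hypothetical stand-ins. The only behavior taken from the commit message is that the name is allocated before the add step can fail, and that the reference must still be dropped on the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a kobject: a refcount and an owned name. */
struct toy_kobject {
	int refcount;
	char *name;
};

/* Number of heap-allocated names that have not been freed yet. */
static int live_names;

/* Drop one reference; free the name when the last reference goes away. */
static void toy_kobject_put(struct toy_kobject *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0 && kobj->name) {
		free(kobj->name);
		kobj->name = NULL;
		live_names--;
	}
}

/*
 * Model of the init-and-add step: the name is allocated first, then the
 * "add" part may fail. On failure nothing is freed here, so the caller
 * still owns one reference and must drop it.
 */
static int toy_kobject_init_and_add(struct toy_kobject *kobj,
				    const char *name, int fail_add)
{
	kobj->refcount = 1;
	kobj->name = malloc(strlen(name) + 1);
	if (!kobj->name)
		return -1;
	strcpy(kobj->name, name);
	live_names++;
	return fail_add ? -1 : 0;
}

/* apply_fix selects between the leaking and the fixed error path. */
static int toy_sysfs_slab_add(struct toy_kobject *kobj, const char *name,
			      int fail_add, int apply_fix)
{
	int err = toy_kobject_init_and_add(kobj, name, fail_add);

	if (err && apply_fix)
		toy_kobject_put(kobj);	/* the fix: put even on failure */
	return err;
}
```

With `apply_fix` set to 0 a failed add leaves `live_names` incremented, which mirrors the leaked `pid_3` name in the syzkaller report; with `apply_fix` set to 1 the counter returns to its previous value.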
  7. 03 Jun, 2020 4 commits
    • mm/slub: fix stack overruns with SLUB_STATS · a68ee057
      Qian Cai committed
      There is no need to copy SLUB_STATS items from root memcg cache to new
      memcg cache copies.  Doing so could result in stack overruns because the
      store function only accepts 0 to clear the stat and returns an error for
      everything else while the show method would print out the whole stat.
      
      The mismatch between the lengths returned by the show and store methods
      then shows up in memcg_propagate_slab_attrs():
      
      	else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf))
      		buf = mbuf;
      
      max_attr_size is only 2 from slab_attr_store(), so it uses mbuf[64]
      in show_stat() later, where a bunch of sprintf() calls would overrun the
      stack variable.  Fix it by always allocating a page of buffer to be used in
      show_stat() if SLUB_STATS=y, which should only be used for debugging purposes.
      
        # echo 1 > /sys/kernel/slab/fs_cache/shrink
        BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0
        Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251
      
        Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
        Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
        Call Trace:
          number+0x421/0x6e0
          vsnprintf+0x451/0x8e0
          sprintf+0x9e/0xd0
          show_stat+0x124/0x1d0
          alloc_slowpath_show+0x13/0x20
          __kmem_cache_create+0x47a/0x6b0
      
        addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame:
         process_one_work+0x0/0xb90
      
        this frame has 1 object:
         [32, 72) 'lockdep_map'
      
        Memory state around the buggy address:
         ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
                                                               ^
         ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00
         ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ==================================================================
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0
        Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
        Call Trace:
          __kmem_cache_create+0x6ac/0x6b0
      
      Fixes: 107dab5c ("slub: slub-specific propagation changes")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Glauber Costa <glauber@scylladb.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a68ee057
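The length mismatch described in a68ee057 can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `toy_show_stat()` and its output format, the two `pick_size_*()` helpers, and the 4096-byte page size. Only the 64-byte stack buffer, the stored length of 2, and the "always use a page" fix come from the commit message. The sketch uses `snprintf()` so that it can measure the required length safely, whereas the commit describes unbounded `sprintf()` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MBUF_SIZE     64	/* on-stack buffer size named in the commit */
#define PAGE_BUF_SIZE 4096	/* assumed page size for this sketch */

/*
 * Hypothetical show method: prints the total, then one " C<cpu>=<count>"
 * item per CPU. Returns the number of bytes the complete output needs
 * (excluding the terminating NUL), whether or not it fit into buf.
 */
static size_t toy_show_stat(char *buf, size_t size,
			    const unsigned int *per_cpu, int ncpus)
{
	unsigned long sum = 0;
	size_t len = 0;
	int cpu;

	assert(ncpus >= 0);
	for (cpu = 0; cpu < ncpus; cpu++)
		sum += per_cpu[cpu];

	len += (size_t)snprintf(buf, size, "%lu", sum);
	for (cpu = 0; cpu < ncpus; cpu++) {
		size_t room = len < size ? size - len : 0;

		len += (size_t)snprintf(room ? buf + len : NULL, room,
					" C%d=%u", cpu, per_cpu[cpu]);
	}
	return len;
}

/* Buggy sizing: trusts the longest length ever seen by the store method. */
static size_t pick_size_buggy(size_t max_attr_size)
{
	return max_attr_size < MBUF_SIZE ? MBUF_SIZE : PAGE_BUF_SIZE;
}

/* Fixed sizing for stats builds: always use a full page. */
static size_t pick_size_fixed(void)
{
	return PAGE_BUF_SIZE;
}
```

With 32 CPUs and seven-digit counters, the show output needs several hundred bytes, while `pick_size_buggy(2)` still selects the 64-byte buffer because the store side only ever saw a two-byte write.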
    • slub: remove kmalloc under list_lock from list_slab_objects() V2 · aa456c7a
      Christopher Lameter committed
      list_slab_objects() is called when a slab is destroyed while objects are
      still left in it, in order to list those objects in the syslog.  This is a
      pretty rare event.

      It seems that there we take the list_lock and call kmalloc while holding
      that lock.
      
      Perform the allocation in free_partial() before the list_lock is taken.
      
      Fixes: bbd7d57b ("slub: Potential stack overflow")
      Signed-off-by: Christopher Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002031721250.1668@www.lameter.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa456c7a
    • slub: Remove userspace notifier for cache add/remove · d7660ce5
      Christoph Lameter committed
      I came across some unnecessary uevents once again, which reminded me
      of this.  The patch seems to have been lost in the leaves of the original
      discussion [1], so I am resending it.
      
      [1] https://lore.kernel.org/r/alpine.DEB.2.21.2001281813130.745@www.lameter.com
      
      Kmem caches are internal kernel structures so it is strange that
      userspace notifiers would be needed.  And I am not aware of any use of
      these notifiers.  These notifiers may just exist because in the initial
      slub release the sysfs code was copied from another subsystem.

      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Koutný <mkoutny@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200423115721.19821-1-mkoutny@suse.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d7660ce5
    • mm/slub.c: fix corrupted freechain in deactivate_slab() · 52f23478
      Dongli Zhang committed
      slub_debug is able to fix a corrupted slab freelist/page.
      However, alloc_debug_processing() only checks the validity of the current
      and next freepointer in the allocation path.  As a result, once some
      objects have their freepointers corrupted, deactivate_slab() may trigger a
      page fault.

      Below is the output from a test kernel module with 'slub_debug=PUF,kmalloc-128
      slub_nomerge'.  The test kernel module corrupts the freepointer of one free
      object on purpose.  Unfortunately, deactivate_slab() does not detect it
      when iterating the freechain.
      
        BUG: unable to handle page fault for address: 00000000123456f8
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        ... ...
        RIP: 0010:deactivate_slab.isra.92+0xed/0x490
        ... ...
        Call Trace:
         ___slab_alloc+0x536/0x570
         __slab_alloc+0x17/0x30
         __kmalloc+0x1d9/0x200
         ext4_htree_store_dirent+0x30/0xf0
         htree_dirblock_to_tree+0xcb/0x1c0
         ext4_htree_fill_tree+0x1bc/0x2d0
         ext4_readdir+0x54f/0x920
         iterate_dir+0x88/0x190
         __x64_sys_getdents+0xa6/0x140
         do_syscall_64+0x49/0x170
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Therefore, this patch adds an extra consistency check in deactivate_slab().
      Once an object's freepointer is found to be corrupted, that object and all
      objects following it are isolated.
      
      [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n]
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joe Jin <joe.jin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      52f23478
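The consistency check described in 52f23478 can be sketched in userspace as a freelist walk that validates every stored next pointer before following it. This is only a model under stated assumptions: `struct toy_slab`, the fixed object size, the free pointer stored at offset 0, and all `toy_*` helper names are hypothetical and are not the kernel's data layout or functions. The behavior taken from the commit message is that the object holding a corrupted freepointer, and everything after it, is cut off from the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE 32	/* assumed object size for this sketch */
#define NR_OBJS  8	/* assumed number of objects per slab */

/* Hypothetical slab: a flat array of objects, free pointer at offset 0. */
struct toy_slab {
	unsigned char mem[OBJ_SIZE * NR_OBJS];
};

static void *toy_get_next(const void *obj)
{
	void *next;

	memcpy(&next, obj, sizeof(next));
	return next;
}

static void toy_set_next(void *obj, void *next)
{
	memcpy(obj, &next, sizeof(next));
}

/* NULL ends the list; anything else must be an object start inside the slab. */
static int toy_valid_pointer(const struct toy_slab *slab, const void *p)
{
	uintptr_t base = (uintptr_t)slab->mem;
	uintptr_t addr = (uintptr_t)p;

	if (!p)
		return 1;
	if (addr < base || addr >= base + sizeof(slab->mem))
		return 0;
	return (addr - base) % OBJ_SIZE == 0;
}

/*
 * Walk the freelist starting at *head. When an object's stored next
 * pointer is invalid, isolate that object and everything after it by
 * terminating the chain at the previous object. Returns the number of
 * objects that remain on the list.
 */
static int toy_sanitize_freelist(struct toy_slab *slab, void **head)
{
	void *prev = NULL;
	void *obj = *head;
	int kept = 0;

	assert(slab && head);
	if (!toy_valid_pointer(slab, obj)) {
		*head = NULL;
		return 0;
	}
	while (obj && kept < NR_OBJS) {	/* the bound also stops cycles */
		void *next = toy_get_next(obj);

		if (!toy_valid_pointer(slab, next)) {
			if (prev)
				toy_set_next(prev, NULL);
			else
				*head = NULL;
			break;
		}
		kept++;
		prev = obj;
		obj = next;
	}
	return kept;
}
```

The key point is that the pointer is checked before it is dereferenced, so a bogus value such as the one in the oops above is never followed.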
  8. 08 May, 2020 1 commit
    • mm/slub: fix incorrect interpretation of s->offset · cbfc35a4
      Waiman Long committed
      In a couple of places in the slub memory allocator, the code uses
      "s->offset" as a check to see if the free pointer is put right after the
      object.  That check is no longer true with commit 3202fa62 ("slub:
      relocate freelist pointer to middle of object").
      
      As a result, echoing "1" into the validate sysfs file, e.g.  of dentry,
      may cause a bunch of "Freepointer corrupt" error reports like the
      following to appear with the system in panic afterwards.
      
        =============================================================================
        BUG dentry(666:pmcd.service) (Tainted: G    B): Freepointer corrupt
        -----------------------------------------------------------------------------
      
      To fix it, use the check "s->offset == s->inuse" in the new helper
      function freeptr_outside_object() instead.  Also add another helper
      function get_info_end() to return the end of info block (inuse + free
      pointer if not overlapping with object).
      
      Fixes: 3202fa62 ("slub: relocate freelist pointer to middle of object")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Vitaly Nikolenko <vnik@duasynt.com>
      Cc: Silvio Cesare <silvio.cesare@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Markus Elfring <Markus.Elfring@web.de>
      Cc: Changbin Du <changbin.du@gmail.com>
      Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cbfc35a4
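The two helpers named in cbfc35a4 can be sketched in userspace as below. `struct toy_cache` is a hypothetical, trimmed-down stand-in that keeps only the two fields the commit message mentions, and the function bodies are written from the commit's description ("s->offset == s->inuse", "inuse + free pointer if not overlapping with object") rather than copied from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct kmem_cache fields involved. */
struct toy_cache {
	unsigned int inuse;	/* end of the in-use part of the object */
	unsigned int offset;	/* where the free pointer is stored */
};

/* The free pointer sits right after the object only when offset == inuse. */
static bool freeptr_outside_object(const struct toy_cache *s)
{
	return s->offset == s->inuse;
}

/* End of the info block: inuse, plus the free pointer if it is outside. */
static unsigned int get_info_end(const struct toy_cache *s)
{
	assert(s->offset <= s->inuse);
	if (freeptr_outside_object(s))
		return s->inuse + (unsigned int)sizeof(void *);
	return s->inuse;
}
```

A plain `if (s->offset)` test would also be true for a cache whose free pointer was relocated to the middle of the object (for example `offset = 32`, `inuse = 64`), which is the misinterpretation the commit describes.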
  9. 22 Apr, 2020 1 commit
  10. 08 Apr, 2020 2 commits
  11. 03 Apr, 2020 4 commits
  12. 26 Mar, 2020 1 commit
  13. 22 Mar, 2020 1 commit
    • mm, slub: prevent kmalloc_node crashes and memory leaks · 0715e6c5
      Vlastimil Babka committed
      Sachin reports [1] a crash in SLUB __slab_alloc():
      
        BUG: Kernel NULL pointer dereference on read at 0x000073b0
        Faulting instruction address: 0xc0000000003d55f4
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
        NIP:  c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
        REGS: c0000008b37836d0 TRAP: 0300   Not tainted  (5.6.0-rc2-next-20200218-autotest)
        MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
        CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
        GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
        GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
        GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
        GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
        GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
        GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
        GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
        GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
        NIP ___slab_alloc+0x1f4/0x760
        LR __slab_alloc+0x34/0x60
        Call Trace:
          ___slab_alloc+0x334/0x760 (unreliable)
          __slab_alloc+0x34/0x60
          __kmalloc_node+0x110/0x490
          kvmalloc_node+0x58/0x110
          mem_cgroup_css_online+0x108/0x270
          online_css+0x48/0xd0
          cgroup_apply_control_enable+0x2ec/0x4d0
          cgroup_mkdir+0x228/0x5f0
          kernfs_iop_mkdir+0x90/0xf0
          vfs_mkdir+0x110/0x230
          do_mkdirat+0xb0/0x1a0
          system_call+0x5c/0x68
      
      This is a PowerPC platform with following NUMA topology:
      
        available: 2 nodes (0-1)
        node 0 cpus:
        node 0 size: 0 MB
        node 0 free: 0 MB
        node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
        node 1 size: 35247 MB
        node 1 free: 30907 MB
        node distances:
        node   0   1
          0:  10  40
          1:  40  10
      
        possible numa nodes: 0-31
      
      This only happens with a mmotm patch "mm/memcontrol.c: allocate
      shrinker_map on appropriate NUMA node" [2] which effectively calls
      kmalloc_node for each possible node.  SLUB however only allocates
      kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
      node_to_mem_node to return such valid node for other nodes since commit
      a561ce00 ("slub: fall back to node_to_mem_node() node if allocating
      on memoryless node").  This is however not true in this configuration
      where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
      thus it contains zeroes and get_partial() ends up accessing
      non-allocated kmem_cache_node.
      
      A related issue was reported by Bharata (originally by Ramachandran) [3]
      where a similar PowerPC configuration, but with mainline kernel without
      patch [2] ends up allocating large amounts of pages by kmalloc-1k
      kmalloc-512.  This seems to have the same underlying issue with
      node_to_mem_node() not behaving as expected, and might probably also
      lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].
      
      This patch should fix both issues by not relying on node_to_mem_node()
      anymore and instead simply falling back to NUMA_NO_NODE, when
      kmalloc_node(node) is attempted for a node that's not online, or has no
      usable memory.  The "usable memory" condition is also changed from
      node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
      the condition that SLUB uses to allocate kmem_cache_node structures.
      The check in get_partial() is removed completely, as the checks in
      ___slab_alloc() are now sufficient to prevent get_partial() being
      reached with an invalid node.
      
      [1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
      [2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
      [3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
      [4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/
      
      Fixes: a561ce00 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Tested-by: Bharata B Rao <bharata@linux.ibm.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
      Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0715e6c5
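The fallback rule described in 0715e6c5 can be modelled with a small userspace sketch. The bitmask representation, `TOY_MAX_NODES`, `TOY_NO_NODE` and both function names are assumptions made for illustration; the kernel tracks node state differently. The behavior taken from the commit message is that a request for a node that is not online or has no normal memory falls back to "no preferred node" instead of being redirected via node_to_mem_node().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 32	/* matches "possible numa nodes: 0-31" above */
#define TOY_NO_NODE   (-1)	/* stand-in for NUMA_NO_NODE */

/*
 * Bit n of mask is set when node n is online and has normal memory.
 * This bitmask is a hypothetical stand-in for the N_NORMAL_MEMORY state.
 */
static bool toy_node_has_normal_memory(unsigned long mask, int node)
{
	assert(node >= 0 && node < TOY_MAX_NODES);
	return (mask >> node) & 1UL;
}

/*
 * Decide which node an allocation should target. A request for a node
 * without usable memory is turned into "no preference".
 */
static int toy_pick_node(unsigned long mask, int requested)
{
	if (requested != TOY_NO_NODE &&
	    !toy_node_has_normal_memory(mask, requested))
		return TOY_NO_NODE;
	return requested;
}
```

For the topology in the report, where only node 1 has memory, the mask would be `1UL << 1`: a request for node 0 or node 31 becomes `TOY_NO_NODE`, while a request for node 1 is kept.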
  14. 19 Mar, 2020 2 commits
    • mm: slub: be more careful about the double cmpxchg of freelist · 5076190d
      Linus Torvalds committed
      This is just a cleanup addition to Jann's fix to properly update the
      transaction ID for the slub slowpath in commit fd4d9c7d ("mm: slub:
      add missing TID bump..").
      
      The transaction ID is what protects us against any concurrent accesses,
      but we should really also make sure to make the 'freelist' comparison
      itself always use the same freelist value that we then used as the new
      next free pointer.
      
      Jann points out that if we do all of this carefully, we could skip the
      transaction ID update for all the paths that only remove entries from
      the lists, and only update the TID when adding entries (to avoid the ABA
      issue with cmpxchg and list handling re-adding a previously seen value).
      
      But this patch just does the "make sure to cmpxchg the same value we
      used" rather than then try to be clever.

      Acked-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5076190d
    • mm: slub: add missing TID bump in kmem_cache_alloc_bulk() · fd4d9c7d
      Jann Horn committed
      When kmem_cache_alloc_bulk() attempts to allocate N objects from a percpu
      freelist of length M, and N > M > 0, it will first remove the M elements
      from the percpu freelist, then call ___slab_alloc() to allocate the next
      element and repopulate the percpu freelist. ___slab_alloc() can re-enable
      IRQs via allocate_slab(), so the TID must be bumped before ___slab_alloc()
      to properly commit the freelist head change.
      
      Fix it by unconditionally bumping c->tid when entering the slowpath.
      
      Cc: stable@vger.kernel.org
      Fixes: ebe909e0 ("slub: improve bulk alloc strategy")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd4d9c7d
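Why the transaction ID matters in fd4d9c7d and 5076190d can be shown with a single-threaded model of the (freelist, tid) pair. This is a sketch under loud assumptions: `struct toy_cpu_slab`, `toy_cmpxchg_double()` and `toy_slowpath_enter()` are hypothetical names, and the compare-and-exchange is an ordinary function rather than an atomic instruction. It only demonstrates the logic: a stale snapshot must fail to commit even when the freelist head happens to hold the same value again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state: the freelist head and its transaction ID. */
struct toy_cpu_slab {
	void *freelist;
	unsigned long tid;
};

/*
 * Single-threaded model of a double-word compare-and-exchange: both the
 * freelist head and the tid must match the snapshot for the update to
 * be committed.
 */
static bool toy_cmpxchg_double(struct toy_cpu_slab *c,
			       void *old_free, unsigned long old_tid,
			       void *new_free, unsigned long new_tid)
{
	assert(c != NULL);
	if (c->freelist != old_free || c->tid != old_tid)
		return false;
	c->freelist = new_free;
	c->tid = new_tid;
	return true;
}

/* Model of the fix: bump the tid unconditionally when entering the slowpath. */
static void toy_slowpath_enter(struct toy_cpu_slab *c)
{
	assert(c != NULL);
	c->tid++;
}
```

If the slowpath changed the list without bumping the tid and left the same head pointer in place, the stale snapshot would still compare equal and the fast path would commit a next pointer that no longer belongs to the list, which is the ABA case mentioned above.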