1. 08 Aug 2020 (8 commits)
  2. 17 Jul 2020 (1 commit)
  3. 26 Jun 2020 (1 commit)
  4. 18 Jun 2020 (1 commit)
  5. 05 Jun 2020 (1 commit)
  6. 04 Jun 2020 (2 commits)
    • mm/page_alloc: integrate classzone_idx and high_zoneidx · 97a225e6
      Joonsoo Kim authored
      classzone_idx is just a different name for high_zoneidx now.  So,
      integrate them, and add a comment to struct alloc_context in order to
      reduce future confusion about the meaning of this variable.
      
      The accessor ac_classzone_idx() is also removed, since it is no longer
      needed after the integration.
      
      In addition to the integration, this patch renames high_zoneidx to
      highest_zoneidx, since that name reflects its meaning more precisely.
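      
      For reference, a minimal sketch of the renamed field, assuming the
      post-patch shape of struct alloc_context (abridged, with comments
      paraphrased from the description above):
      
        /* mm/internal.h (abridged sketch) */
        struct alloc_context {
                struct zonelist *zonelist;
                nodemask_t *nodemask;
                struct zoneref *preferred_zoneref;
                int migratetype;
        
                /*
                 * highest_zoneidx is the highest usable zone index for this
                 * allocation; zones above it must not be used.  It replaces
                 * both classzone_idx and the old high_zoneidx name.
                 */
                enum zone_type highest_zoneidx;
                bool spread_dirty_pages;
        };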
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ye Xiaolong <xiaolong.ye@intel.com>
      Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      97a225e6
    • mm/slub: fix a memory leak in sysfs_slab_add() · dde3c6b7
      Wang Hai authored
      syzkaller reports a memory leak when kobject_init_and_add() returns an
      error in the function sysfs_slab_add() [1].
      
      When this happens, kobject_put() is not called for the corresponding
      kobject, which potentially leads to a memory leak.
      
      This patch fixes the issue by calling kobject_put() even if
      kobject_init_and_add() fails (sketched after the trace below).
      
      [1]
        BUG: memory leak
        unreferenced object 0xffff8880a6d4be88 (size 8):
        comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
        hex dump (first 8 bytes):
          70 69 64 5f 33 00 ff ff                          pid_3...
        backtrace:
           kstrdup+0x35/0x70 mm/util.c:60
           kstrdup_const+0x3d/0x50 mm/util.c:82
           kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
           kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
           kobject_add_varg lib/kobject.c:384 [inline]
           kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
           sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
           __kmem_cache_create+0x50a/0x570 mm/slub.c:4384
           create_cache+0x113/0x1e0 mm/slab_common.c:407
           kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
           kmem_cache_create+0xd/0x10 mm/slab_common.c:564
           create_pid_cachep kernel/pid_namespace.c:54 [inline]
           create_pid_namespace kernel/pid_namespace.c:96 [inline]
           copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
           create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
           unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
           ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
           __do_sys_unshare kernel/fork.c:3037 [inline]
           __se_sys_unshare kernel/fork.c:3035 [inline]
           __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
           do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
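      
      A minimal sketch of the fixed error path, assuming the standard kobject
      contract (the exact diff may differ):
      
        err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name);
        if (err) {
                /*
                 * kobject_init_and_add() leaves the caller holding a
                 * reference even on failure; only kobject_put() releases
                 * the name allocated by kobject_set_name_vargs().
                 */
                kobject_put(&s->kobj);
                goto out;
        }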
      
      Fixes: 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dde3c6b7
  7. 03 Jun 2020 (4 commits)
    • mm/slub: fix stack overruns with SLUB_STATS · a68ee057
      Qian Cai authored
      There is no need to copy SLUB_STATS items from the root memcg cache to
      new memcg cache copies.  Doing so could result in stack overruns,
      because the store function only accepts 0 (to clear the stat) and
      returns an error for everything else, while the show method prints out
      the whole stat.
      
      The mismatch between the lengths returned by the show and store methods
      then bites in memcg_propagate_slab_attrs():
      
      	else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf))
      		buf = mbuf;
      
      max_attr_size is only 2 from slab_attr_store(); mbuf[64] is then used
      in show_stat() later, where a bunch of sprintf() calls overrun the
      stack variable.  Fix it by always allocating a page-sized buffer for
      show_stat() if SLUB_STATS=y, which should only be enabled for
      debugging purposes anyway.
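      
      A sketch of the fixed buffer selection, assuming the check lives where
      the excerpt above shows it (shape inferred from the description, not
      the exact diff):
      
        if (buffer)
                buf = buffer;
        else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf) &&
                 !IS_ENABLED(CONFIG_SLUB_STATS))
                /* small on-stack buffer is safe only without SLUB_STATS */
                buf = mbuf;
        else {
                /* SLUB_STATS=y: show_stat() may emit far more than 64 bytes */
                buffer = (char *)get_zeroed_page(GFP_KERNEL);
                if (WARN_ON(!buffer))
                        continue;
                buf = buffer;
        }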
      
        # echo 1 > /sys/kernel/slab/fs_cache/shrink
        BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0
        Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251
      
        Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
        Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
        Call Trace:
          number+0x421/0x6e0
          vsnprintf+0x451/0x8e0
          sprintf+0x9e/0xd0
          show_stat+0x124/0x1d0
          alloc_slowpath_show+0x13/0x20
          __kmem_cache_create+0x47a/0x6b0
      
        addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame:
         process_one_work+0x0/0xb90
      
        this frame has 1 object:
         [32, 72) 'lockdep_map'
      
        Memory state around the buggy address:
         ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
                                                               ^
         ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00
         ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ==================================================================
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0
        Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
        Call Trace:
          __kmem_cache_create+0x6ac/0x6b0
      
      Fixes: 107dab5c ("slub: slub-specific propagation changes")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Glauber Costa <glauber@scylladb.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a68ee057
    • slub: remove kmalloc under list_lock from list_slab_objects() V2 · aa456c7a
      Christopher Lameter authored
      list_slab_objects() is called when a slab is destroyed with objects
      still left on it, in order to list those objects in the syslog.  This
      is a pretty rare event.
      
      There we take the list_lock and call kmalloc() while holding that
      lock.
      
      Perform the allocation in free_partial() before the list_lock is
      taken, as sketched below.
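      
      A minimal sketch of the reordering, with the bitmap helpers assumed
      from the description (the exact names in the final patch may differ):
      
        static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
        {
                LIST_HEAD(discard);
                struct page *page, *h;
                /* allocate before taking n->list_lock, not under it */
                unsigned long *map = bitmap_alloc(oo_objects(s->max),
                                                  GFP_KERNEL);
        
                spin_lock_irq(&n->list_lock);
                list_for_each_entry_safe(page, h, &n->partial, slab_list) {
                        if (!page->inuse) {
                                remove_partial(n, page);
                                list_add(&page->slab_list, &discard);
                        } else {
                                /* map is passed down instead of calling
                                 * kmalloc() inside list_slab_objects() */
                                list_slab_objects(s, page, map);
                        }
                }
                spin_unlock_irq(&n->list_lock);
                bitmap_free(map);
        }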
      
      Fixes: bbd7d57b ("slub: Potential stack overflow")
      Signed-off-by: Christopher Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002031721250.1668@www.lameter.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa456c7a
    • slub: Remove userspace notifier for cache add/remove · d7660ce5
      Christoph Lameter authored
      I came across some unnecessary uevents once again, which reminded me
      of this.  The patch seems to have been lost in the leaves of the
      original discussion [1], so resending.
      
      [1] https://lore.kernel.org/r/alpine.DEB.2.21.2001281813130.745@www.lameter.com
      
      Kmem caches are internal kernel structures, so it is strange that
      userspace notifiers would be needed.  I am not aware of any use of
      these notifiers.  They may exist only because, in the initial slub
      release, the sysfs code was copied from another subsystem.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Koutný <mkoutny@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200423115721.19821-1-mkoutny@suse.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d7660ce5
    • mm/slub.c: fix corrupted freechain in deactivate_slab() · 52f23478
      Dongli Zhang authored
      slub_debug is able to fix a corrupted slab freelist/page.  However,
      alloc_debug_processing() only checks the validity of the current and
      next freepointers during the allocation path.  As a result, once some
      objects have their freepointers corrupted, deactivate_slab() may lead
      to a page fault.
      
      Below is from a test kernel module run with 'slub_debug=PUF,kmalloc-128
      slub_nomerge'.  The test kernel corrupts the freepointer of one free
      object on purpose.  Unfortunately, deactivate_slab() does not detect
      it when iterating the freechain.
      
        BUG: unable to handle page fault for address: 00000000123456f8
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        ... ...
        RIP: 0010:deactivate_slab.isra.92+0xed/0x490
        ... ...
        Call Trace:
         ___slab_alloc+0x536/0x570
         __slab_alloc+0x17/0x30
         __kmalloc+0x1d9/0x200
         ext4_htree_store_dirent+0x30/0xf0
         htree_dirblock_to_tree+0xcb/0x1c0
         ext4_htree_fill_tree+0x1bc/0x2d0
         ext4_readdir+0x54f/0x920
         iterate_dir+0x88/0x190
         __x64_sys_getdents+0xa6/0x140
         do_syscall_64+0x49/0x170
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Therefore, this patch adds an extra consistency check in
      deactivate_slab().  Once an object's freepointer is found to be
      corrupted, all following objects starting at that object are isolated,
      as sketched below.
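      
      A sketch of the extra check, assuming a helper along the lines the
      patch describes (names illustrative):
      
        /* returns true if the next freepointer is invalid; the caller then
         * truncates the freelist so the corrupted tail is isolated */
        static inline bool freelist_corrupted(struct kmem_cache *s,
                                              struct page *page,
                                              void *freelist, void *nextfree)
        {
                if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
                    !check_valid_pointer(s, page, nextfree)) {
                        object_err(s, page, freelist, "Freechain corrupt");
                        slab_fix(s, "Isolate corrupted freechain");
                        return true;
                }
                return false;
        }
        
        /* in deactivate_slab(), while walking the percpu freelist: */
        while (freelist && (nextfree = get_freepointer(s, freelist))) {
                if (freelist_corrupted(s, page, freelist, nextfree))
                        break;
                /* ... move the object to the page freelist ... */
        }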
      
      [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n]
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joe Jin <joe.jin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      52f23478
  8. 08 May 2020 (1 commit)
    • mm/slub: fix incorrect interpretation of s->offset · cbfc35a4
      Waiman Long authored
      In a couple of places in the slub memory allocator, the code uses
      "s->offset" as a check to see whether the free pointer is put right
      after the object.  That check is no longer true with commit 3202fa62
      ("slub: relocate freelist pointer to middle of object").
      
      As a result, echoing "1" into the validate sysfs file, e.g. that of
      dentry, may cause a bunch of "Freepointer corrupt" error reports like
      the following to appear, with the system panicking afterwards.
      
        =============================================================================
        BUG dentry(666:pmcd.service) (Tainted: G    B): Freepointer corrupt
        -----------------------------------------------------------------------------
      
      To fix it, use the check "s->offset == s->inuse" in a new helper
      function, freeptr_outside_object(), instead.  Also add another helper
      function, get_info_end(), to return the end of the info block (inuse
      plus the free pointer, if it does not overlap with the object).  Both
      are sketched below.
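      
      A minimal sketch of the two helpers, following the checks described
      above:
      
        /* true when the free pointer sits past the in-use area rather than
         * inside the object */
        static inline bool freeptr_outside_object(struct kmem_cache *s)
        {
                return s->offset == s->inuse;
        }
        
        /* end of the info block: inuse, plus the free pointer when it does
         * not overlap with the object */
        static inline unsigned int get_info_end(struct kmem_cache *s)
        {
                if (freeptr_outside_object(s))
                        return s->inuse + sizeof(void *);
                return s->inuse;
        }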
      
      Fixes: 3202fa62 ("slub: relocate freelist pointer to middle of object")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Vitaly Nikolenko <vnik@duasynt.com>
      Cc: Silvio Cesare <silvio.cesare@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Markus Elfring <Markus.Elfring@web.de>
      Cc: Changbin Du <changbin.du@gmail.com>
      Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cbfc35a4
  9. 22 Apr 2020 (1 commit)
  10. 08 Apr 2020 (2 commits)
  11. 03 Apr 2020 (4 commits)
  12. 26 Mar 2020 (1 commit)
  13. 22 Mar 2020 (1 commit)
    • mm, slub: prevent kmalloc_node crashes and memory leaks · 0715e6c5
      Vlastimil Babka authored
      Sachin reports [1] a crash in SLUB __slab_alloc():
      
        BUG: Kernel NULL pointer dereference on read at 0x000073b0
        Faulting instruction address: 0xc0000000003d55f4
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
        NIP:  c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
        REGS: c0000008b37836d0 TRAP: 0300   Not tainted  (5.6.0-rc2-next-20200218-autotest)
        MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
        CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
        GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
        GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
        GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
        GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
        GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
        GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
        GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
        GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
        NIP ___slab_alloc+0x1f4/0x760
        LR __slab_alloc+0x34/0x60
        Call Trace:
          ___slab_alloc+0x334/0x760 (unreliable)
          __slab_alloc+0x34/0x60
          __kmalloc_node+0x110/0x490
          kvmalloc_node+0x58/0x110
          mem_cgroup_css_online+0x108/0x270
          online_css+0x48/0xd0
          cgroup_apply_control_enable+0x2ec/0x4d0
          cgroup_mkdir+0x228/0x5f0
          kernfs_iop_mkdir+0x90/0xf0
          vfs_mkdir+0x110/0x230
          do_mkdirat+0xb0/0x1a0
          system_call+0x5c/0x68
      
      This is a PowerPC platform with the following NUMA topology:
      
        available: 2 nodes (0-1)
        node 0 cpus:
        node 0 size: 0 MB
        node 0 free: 0 MB
        node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
        node 1 size: 35247 MB
        node 1 free: 30907 MB
        node distances:
        node   0   1
          0:  10  40
          1:  40  10
      
        possible numa nodes: 0-31
      
      This only happens with the mmotm patch "mm/memcontrol.c: allocate
      shrinker_map on appropriate NUMA node" [2], which effectively calls
      kmalloc_node() for each possible node.  SLUB, however, only allocates
      kmem_cache_node on online N_NORMAL_MEMORY nodes, and has relied on
      node_to_mem_node() to return a valid node for the other nodes since
      commit a561ce00 ("slub: fall back to node_to_mem_node() node if
      allocating on memoryless node").  That assumption does not hold in
      this configuration, where the _node_numa_mem_ array is not initialized
      for nodes 0 and 2-31; it thus contains zeroes, and get_partial() ends
      up accessing a non-allocated kmem_cache_node.
      
      A related issue was reported by Bharata (originally by Ramachandran)
      [3], where a similar PowerPC configuration, but running a mainline
      kernel without patch [2], ends up allocating large amounts of pages
      for kmalloc-1k and kmalloc-512.  This seems to have the same
      underlying issue with node_to_mem_node() not behaving as expected, and
      might also lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].
      
      This patch should fix both issues by not relying on node_to_mem_node()
      anymore and instead simply falling back to NUMA_NO_NODE, when
      kmalloc_node(node) is attempted for a node that's not online, or has no
      usable memory.  The "usable memory" condition is also changed from
      node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
      the condition that SLUB uses to allocate kmem_cache_node structures.
      The check in get_partial() is removed completely, as the checks in
      ___slab_alloc() are now sufficient to prevent get_partial() being
      reached with an invalid node.
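      
      A sketch of the resulting fallback in ___slab_alloc(), following the
      conditions described above (placement approximate):
      
        /*
         * If the requested node is not online or has no normal memory,
         * ignore the node constraint instead of trusting
         * node_to_mem_node().
         */
        if (unlikely(node != NUMA_NO_NODE &&
                     !node_state(node, N_NORMAL_MEMORY)))
                node = NUMA_NO_NODE;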
      
      [1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
      [2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
      [3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
      [4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/
      
      Fixes: a561ce00 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Tested-by: Bharata B Rao <bharata@linux.ibm.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
      Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0715e6c5
  14. 19 Mar 2020 (2 commits)
    • mm: slub: be more careful about the double cmpxchg of freelist · 5076190d
      Linus Torvalds authored
      This is just a cleanup addition to Jann's fix to properly update the
      transaction ID for the slub slowpath in commit fd4d9c7d ("mm: slub:
      add missing TID bump..").
      
      The transaction ID is what protects us against any concurrent accesses,
      but we should really also make sure that the 'freelist' comparison
      itself always uses the same freelist value that we then use as the new
      next free pointer.
      
      Jann points out that if we do all of this carefully, we could skip the
      transaction ID update for all the paths that only remove entries from
      the lists, and only update the TID when adding entries (to avoid the ABA
      issue with cmpxchg and list handling re-adding a previously seen value).
      
      But this patch just does the "make sure to cmpxchg the same value we
      used" part, rather than trying to be clever.
      Acked-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5076190d
    • mm: slub: add missing TID bump in kmem_cache_alloc_bulk() · fd4d9c7d
      Jann Horn authored
      When kmem_cache_alloc_bulk() attempts to allocate N objects from a percpu
      freelist of length M, and N > M > 0, it will first remove the M elements
      from the percpu freelist, then call ___slab_alloc() to allocate the next
      element and repopulate the percpu freelist. ___slab_alloc() can re-enable
      IRQs via allocate_slab(), so the TID must be bumped before ___slab_alloc()
      to properly commit the freelist head change.
      
      Fix it by unconditionally bumping c->tid when entering the slowpath.
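      
      A sketch of the bulk-allocation slowpath entry after the fix (shape
      assumed from the description):
      
        for (i = 0; i < size; i++) {
                void *object = c->freelist;
        
                if (unlikely(!object)) {
                        /*
                         * Earlier iterations may have taken objects off
                         * c->freelist without bumping c->tid; do it now,
                         * since ___slab_alloc() can re-enable IRQs.
                         */
                        c->tid = next_tid(c->tid);
                        p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
                                             _RET_IP_, c);
                        /* ... error handling and freelist reload ... */
                }
                /* ... fastpath pop ... */
        }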
      
      Cc: stable@vger.kernel.org
      Fixes: ebe909e0 ("slub: improve bulk alloc strategy")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd4d9c7d
  15. 01 Feb 2020 (1 commit)
  16. 25 Jan 2020 (1 commit)
  17. 14 Jan 2020 (1 commit)
    • mm, debug_pagealloc: don't rely on static keys too early · 8e57f8ac
      Vlastimil Babka authored
      Commit 96a2b03f ("mm, debug_pagelloc: use static keys to enable
      debugging") has introduced a static key to reduce overhead when
      debug_pagealloc is compiled in but not enabled.  It relied on the
      assumption that jump_label_init() is called before parse_early_param()
      as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
      it is safe to enable the static key.
      
      However, it turns out multiple architectures call parse_early_param()
      earlier, from their setup_arch().  x86 also calls jump_label_init()
      even earlier, so no issue was found while testing the commit; but the
      same is not true for e.g. ppc64 and s390, where the kernel does not
      boot with debug_pagealloc=on, as found by our QA.
      
      To fix this without tricky changes to the init code of multiple
      architectures, this patch partially reverts the static key conversion
      from 96a2b03f.  Init-time and non-fastpath calls (such as in arch
      code) of debug_pagealloc_enabled() will again test a simple bool
      variable.  Fastpath mm code is converted to a new
      debug_pagealloc_enabled_static() variant that relies on the static
      key, which is enabled at a well-defined point in mm_init() where it is
      guaranteed that jump_label_init() has been called, regardless of
      architecture.
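      
      A sketch of the split accessors, with shapes assumed from the
      description above:
      
        extern bool _debug_pagealloc_enabled_early;
        DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);
        
        /* safe at any time, even before jump_label_init() */
        static inline bool debug_pagealloc_enabled(void)
        {
                return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
                       _debug_pagealloc_enabled_early;
        }
        
        /* fastpath variant: valid only after mm_init() enables the key */
        static inline bool debug_pagealloc_enabled_static(void)
        {
                if (!IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))
                        return false;
                return static_branch_unlikely(&_debug_pagealloc_enabled);
        }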
      
      [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
        Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
      Fixes: 96a2b03f ("mm, debug_pagelloc: use static keys to enable debugging")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Qian Cai <cai@lca.pw>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8e57f8ac
  18. 08 Dec 2019 (1 commit)
  19. 01 Dec 2019 (3 commits)
    • mm/slub.c: clean up validate_slab() · dd98afd4
      Yu Zhao authored
      The function doesn't need to return any value, and the check can be
      done in one pass, as sketched below.
      
      There is a behavior change: before the patch, we stop at the first invalid
      free object; after the patch, we stop at the first invalid object, free or
      in use.  This shouldn't matter because the original behavior isn't
      intended anyway.
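      
      A one-pass sketch of the cleaned-up function (assumed shape; helper
      names as used elsewhere in mm/slub.c):
      
        static void validate_slab(struct kmem_cache *s, struct page *page,
                                  unsigned long *map)
        {
                void *p;
                void *addr = page_address(page);
        
                if (!check_slab(s, page) || !on_freelist(s, page, NULL))
                        return;
        
                /* one pass: a set bit means the object is on the freelist */
                get_map(s, page, map);
                for_each_object(p, s, addr, page->objects) {
                        u8 val = test_bit(slab_index(p, s, addr), map) ?
                                 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
        
                        /* stop at the first invalid object, free or in use */
                        if (!check_object(s, page, p, val))
                                break;
                }
        }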
      
      Link: http://lkml.kernel.org/r/20191108193958.205102-1-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dd98afd4
    • mm/slub.c: update comments · aed68148
      Yu Zhao authored
      Slub doesn't use PG_active and PG_error anymore.
      
      Link: http://lkml.kernel.org/r/20191007222023.162256-1-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aed68148
    • mm: slub: print the offset of fault addresses · e1b70dd1
      Miles Chen authored
      With commit ad67b74d ("printk: hash addresses printed with %p"), it
      is a little harder to match the fault addresses printed by
      check_bytes_and_report() or slab_pad_check() against the dump, because
      the fault addresses may not show up in the dump at all.
      
      Print the offset of the fault addresses to make it easier to match the
      incorrect poison or padding values in the dump.
      
      Before: we have to search for the "63" in the dump.  If we want to get
      the offset of the 63, we have to count it from the start of the Object
      dump.
      
          =============================================================
          BUG kmalloc-128 (Not tainted): Poison overwritten
          -------------------------------------------------------------
      
          Disabling lock debugging due to kernel taint
          INFO: 0x00000000570da294-0x00000000570da294.
          First byte 0x63 instead of 0x6b
          ...
          INFO: Object 0x000000006ebb3b9e @offset=14208 fp=0x0000000065862488
          Redzone 00000000a6abccff: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 00000000741c16f0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 0000000061ad278f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 000000000467c1bd: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 000000008812766b: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 000000003d9b8f25: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 0000000000d80c33: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 00000000867b0d90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Object 000000006ebb3b9e: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 000000005ea59a9f: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 000000003ef8bddc: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 000000008190375d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 000000006df7fb32: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 0000000069474eae: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 0000000008073b7d: 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 00000000b45ae74d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
      
      After: we know the fault address is at @offset=1508 and the Object is
      at @offset=1408, so the fault address is at offset=100 within the
      object.
      
          =========================================================
          BUG kmalloc-128 (Not tainted): Poison overwritten
          ---------------------------------------------------------
      
          Disabling lock debugging due to kernel taint
          INFO: 0x00000000638ec1d1-0x00000000638ec1d1 @offset=1508.
          First byte 0x63 instead of 0x6b
          ...
          INFO: Object 0x000000008171818d @offset=1408 fp=0x0000000066dae230
          Redzone 00000000e2697ab6: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 0000000064b6a381: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 00000000e413a234: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 0000000004c1dfeb: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 000000009ad24d42: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 000000002a196a23: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 00000000a7b8468a: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Redzone 0000000088db6da3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
          Object 000000008171818d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 000000007c4035d4: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 000000004dd281a4: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 0000000079121dff: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 00000000756682a9: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 0000000053b7e541: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 0000000091f8d530: 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
          Object 000000009c76035c: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
      
      Link: http://lkml.kernel.org/r/20190925140807.20490-1-miles.chen@mediatek.com
      Signed-off-by: Miles Chen <miles.chen@mediatek.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1b70dd1
  20. 16 Nov 2019 (1 commit)
  21. 15 Oct 2019 (2 commits)
    • mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations · 0f181f9f
      Alexander Potapenko authored
      slab_alloc_node() already zeroed out the freelist pointer if
      init_on_free was on.  Thibaut Sautereau noticed that the same needs to
      be done for kmem_cache_alloc_bulk(), which performs the allocations
      separately.
      
      kmem_cache_alloc_bulk() is currently used in two places in the kernel,
      so this change is unlikely to have a major performance impact.
      
      SLAB doesn't require a similar change, as auto-initialization makes the
      allocator store the freelist pointers off-slab.
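      
      A sketch of the shared wiping step, with a hypothetical helper name
      (the description does not name one); the bulk path would call it for
      each object it pops off the percpu freelist:
      
        static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
                                                           void *obj)
        {
                /* zero the freelist pointer word stored inside the object */
                if (unlikely(slab_want_init_on_free(s)) && obj)
                        memset((char *)obj + s->offset, 0, sizeof(void *));
        }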
      
      Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com
      Fixes: 6471384a ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reported-by: Thibaut Sautereau <thibaut@sautereau.fr>
      Reported-by: Kees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f181f9f
    • mm/slub: fix a deadlock in show_slab_objects() · e4f8e513
      Qian Cai authored
      A long time ago we fixed a similar deadlock in show_slab_objects() [1].
      However, apparently due to commits like 01fb58bc ("slab: remove
      synchronous synchronize_sched() from memcg cache deactivation path")
      and 03afc0e2 ("slab: get_online_mems for
      kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back:
      merely reading files in /sys/kernel/slab generates the lockdep splat
      below.
      
      Since the "mem_hotplug_lock" here is only to obtain a stable online node
      mask while racing with NUMA node hotplug, in the worst case, the results
      may me miscalculated while doing NUMA node hotplug, but they shall be
      corrected by later reads of the same files.
      
        WARNING: possible circular locking dependency detected
        ------------------------------------------------------
        cat/5224 is trying to acquire lock:
        ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
        show_slab_objects+0x94/0x3a8
      
        but task is already holding lock:
        b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (kn->count#45){++++}:
               lock_acquire+0x31c/0x360
               __kernfs_remove+0x290/0x490
               kernfs_remove+0x30/0x44
               sysfs_remove_dir+0x70/0x88
               kobject_del+0x50/0xb0
               sysfs_slab_unlink+0x2c/0x38
               shutdown_cache+0xa0/0xf0
               kmemcg_cache_shutdown_fn+0x1c/0x34
               kmemcg_workfn+0x44/0x64
               process_one_work+0x4f4/0x950
               worker_thread+0x390/0x4bc
               kthread+0x1cc/0x1e8
               ret_from_fork+0x10/0x18
      
        -> #1 (slab_mutex){+.+.}:
               lock_acquire+0x31c/0x360
               __mutex_lock_common+0x16c/0xf78
               mutex_lock_nested+0x40/0x50
               memcg_create_kmem_cache+0x38/0x16c
               memcg_kmem_cache_create_func+0x3c/0x70
               process_one_work+0x4f4/0x950
               worker_thread+0x390/0x4bc
               kthread+0x1cc/0x1e8
               ret_from_fork+0x10/0x18
      
        -> #0 (mem_hotplug_lock.rw_sem){++++}:
               validate_chain+0xd10/0x2bcc
               __lock_acquire+0x7f4/0xb8c
               lock_acquire+0x31c/0x360
               get_online_mems+0x54/0x150
               show_slab_objects+0x94/0x3a8
               total_objects_show+0x28/0x34
               slab_attr_show+0x38/0x54
               sysfs_kf_seq_show+0x198/0x2d4
               kernfs_seq_show+0xa4/0xcc
               seq_read+0x30c/0x8a8
               kernfs_fop_read+0xa8/0x314
               __vfs_read+0x88/0x20c
               vfs_read+0xd8/0x10c
               ksys_read+0xb0/0x120
               __arm64_sys_read+0x54/0x88
               el0_svc_handler+0x170/0x240
               el0_svc+0x8/0xc
      
        other info that might help us debug this:
      
        Chain exists of:
          mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(kn->count#45);
                                       lock(slab_mutex);
                                       lock(kn->count#45);
          lock(mem_hotplug_lock.rw_sem);
      
         *** DEADLOCK ***
      
        3 locks held by cat/5224:
         #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
         #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
         #2: b8ff009693eee398 (kn->count#45){++++}, at:
        kernfs_seq_start+0x44/0xf0
      
        stack backtrace:
        Call trace:
         dump_backtrace+0x0/0x248
         show_stack+0x20/0x2c
         dump_stack+0xd0/0x140
         print_circular_bug+0x368/0x380
         check_noncircular+0x248/0x250
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc
      
      I think it is important to mention that this doesn't expose
      show_slab_objects() to use-after-free.  There is only a single path
      that might really race here, namely the slab hotplug notifier callback
      __kmem_cache_shrink() (via slab_mem_going_offline_callback()), but
      that path doesn't really destroy kmem_cache_node data structures.
      
      [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html
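      
      A sketch of the shape of the fix, assuming the node-counting loop in
      show_slab_objects() (abridged; field names illustrative):
      
        /*
         * get_online_mems()/put_online_mems() around this loop are dropped:
         * a racing node hotplug can only make the counts transiently
         * stale, and a later read of the same sysfs file recomputes them.
         */
        for_each_online_node(node) {
                struct kmem_cache_node *n = get_node(s, node);
        
                if (!n)
                        continue;
                x = atomic_long_read(&n->total_objects);
                total += x;
                nodes[node] += x;
        }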
      
      [akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
      Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
      Fixes: 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
      Fixes: 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4f8e513