1. 20 4月, 2019 15 次提交
    • A
      mm/kmemleak.c: fix unused-function warning · dce5b0bd
      Arnd Bergmann 提交于
      The only references outside of the #ifdef have been removed, so now we
      get a warning in non-SMP configurations:
      
        mm/kmemleak.c:1404:13: error: unused function 'scan_large_block' [-Werror,-Wunused-function]
      
      Add a new #ifdef around it.
      
      Link: http://lkml.kernel.org/r/20190416123148.3502045-1-arnd@arndb.de
      Fixes: 298a32b1 ("kmemleak: powerpc: skip scanning holes in the .bss section")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dce5b0bd
    • D
      init: initialize jump labels before command line option parsing · 6041186a
      Dan Williams 提交于
      When a module option, or core kernel argument, toggles a static-key it
      requires jump labels to be initialized early.  While x86, PowerPC, and
      ARM64 arrange for jump_label_init() to be called before parse_args(),
      ARM does not.
      
        Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
        page_alloc_shuffle+0x12c/0x1ac
        static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
        before call to jump_label_init()
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted
        5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
        Hardware name: ARM Integrator/CP (Device Tree)
        [<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
        [<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
        [<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
        [<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
        [<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>]
        (page_alloc_shuffle+0x12c/0x1ac)
        [<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
        [<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
        [<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)
      
      Move the fallback call to jump_label_init() to occur before
      parse_args().
      
      The redundant calls to jump_label_init() in other archs are left intact
      in case they have static key toggling use cases that are even earlier
      than option parsing.
      
      Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NGuenter Roeck <groeck@google.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Russell King <rmk@armlinux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6041186a
    • S
      kernel/watchdog_hld.c: hard lockup message should end with a newline · 8f4a8c12
      Sergey Senozhatsky 提交于
      Separate print_modules() and hard lockup error message.
      
      Before the patch:
      
        NMI watchdog: Watchdog detected hard LOCKUP on cpu 1Modules linked in: nls_cp437
      
      Link: http://lkml.kernel.org/r/20190412062557.2700-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f4a8c12
    • M
      kcov: improve CONFIG_ARCH_HAS_KCOV help text · 40453c4f
      Mark Rutland 提交于
      The help text for CONFIG_ARCH_HAS_KCOV is stale, and describes the
      feature as being enabled only for x86_64, when it is now enabled for
      several architectures, including arm, arm64, powerpc, and s390.
      
      Let's remove that stale help text, and update it along the lines of hat
      for ARCH_HAS_FORTIFY_SOURCE, better describing when an architecture
      should select CONFIG_ARCH_HAS_KCOV.
      
      Link: http://lkml.kernel.org/r/20190412102733.5154-1-mark.rutland@arm.comSigned-off-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      40453c4f
    • J
      mm: fix inactive list balancing between NUMA nodes and cgroups · 3b991208
      Johannes Weiner 提交于
      During !CONFIG_CGROUP reclaim, we expand the inactive list size if it's
      thrashing on the node that is about to be reclaimed.  But when cgroups
      are enabled, we suddenly ignore the node scope and use the cgroup scope
      only.  The result is that pressure bleeds between NUMA nodes depending
      on whether cgroups are merely compiled into Linux.  This behavioral
      difference is unexpected and undesirable.
      
      When the refault adaptivity of the inactive list was first introduced,
      there were no statistics at the lruvec level - the intersection of node
      and memcg - so it was better than nothing.
      
      But now that we have that infrastructure, use lruvec_page_state() to
      make the list balancing decision always NUMA aware.
      
      [hannes@cmpxchg.org: fix bisection hole]
        Link: http://lkml.kernel.org/r/20190417155241.GB23013@cmpxchg.org
      Link: http://lkml.kernel.org/r/20190412144438.2645-1-hannes@cmpxchg.org
      Fixes: 2a2e4885 ("mm: vmscan: fix IO/refault regression in cache workingset transition")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3b991208
    • Q
      mm/hotplug: treat CMA pages as unmovable · 1a9f2191
      Qian Cai 提交于
      has_unmovable_pages() is used by allocating CMA and gigantic pages as
      well as the memory hotplug.  The later doesn't know how to offline CMA
      pool properly now, but if an unused (free) CMA page is encountered, then
      has_unmovable_pages() happily considers it as a free memory and
      propagates this up the call chain.  Memory offlining code then frees the
      page without a proper CMA tear down which leads to an accounting issues.
      Moreover if the same memory range is onlined again then the memory never
      gets back to the CMA pool.
      
      State after memory offline:
      
       # grep cma /proc/vmstat
       nr_free_cma 205824
      
       # cat /sys/kernel/debug/cma/cma-kvm_cma/count
       209920
      
      Also, kmemleak still think those memory address are reserved below but
      have already been used by the buddy allocator after onlining.  This
      patch fixes the situation by treating CMA pageblocks as unmovable except
      when has_unmovable_pages() is called as part of CMA allocation.
      
        Offlined Pages 4096
        kmemleak: Cannot insert 0xc000201f7d040008 into the object search tree (overlaps existing)
        Call Trace:
          dump_stack+0xb0/0xf4 (unreliable)
          create_object+0x344/0x380
          __kmalloc_node+0x3ec/0x860
          kvmalloc_node+0x58/0x110
          seq_read+0x41c/0x620
          __vfs_read+0x3c/0x70
          vfs_read+0xbc/0x1a0
          ksys_read+0x7c/0x140
          system_call+0x5c/0x70
        kmemleak: Kernel memory leak detector disabled
        kmemleak: Object 0xc000201cc8000000 (size 13757317120):
        kmemleak:   comm "swapper/0", pid 0, jiffies 4294937297
        kmemleak:   min_count = -1
        kmemleak:   count = 0
        kmemleak:   flags = 0x5
        kmemleak:   checksum = 0
        kmemleak:   backtrace:
             cma_declare_contiguous+0x2a4/0x3b0
             kvm_cma_reserve+0x11c/0x134
             setup_arch+0x300/0x3f8
             start_kernel+0x9c/0x6e8
             start_here_common+0x1c/0x4b0
        kmemleak: Automatic memory scanning thread ended
      
      [cai@lca.pw: use is_migrate_cma_page() and update commit log]
        Link: http://lkml.kernel.org/r/20190416170510.20048-1-cai@lca.pw
      Link: http://lkml.kernel.org/r/20190413002623.8967-1-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a9f2191
    • A
      proc: fixup proc-pid-vm test · 68545aa1
      Alexey Dobriyan 提交于
      Silly sizeof(pointer) vs sizeof(uint8_t[]) bug.
      
      Link: http://lkml.kernel.org/r/20190414123009.GA12971@avx2
      Fixes: e483b020 ("proc: test /proc/*/maps, smaps, smaps_rollup, statm")
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      68545aa1
    • A
      proc: fix map_files test on F29 · 8cd40d1d
      Alexey Dobriyan 提交于
      F29 bans mapping first 64KB even for root making test fail.  Iterate
      from address 0 until mmap() works.
      
      Gentoo (root):
      
      	openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
      	mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0
      
      Gentoo (non-root):
      
      	openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
      	mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EPERM (Operation not permitted)
      	mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x1000
      
      F29 (root):
      
      	openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
      	mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x2000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x3000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x4000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x5000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x6000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x7000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x8000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x9000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0xa000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0xb000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0xc000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0xd000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0xe000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0xf000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied)
      	mmap(0x10000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x10000
      
      Now all proc tests succeed on F29 if run as root, at last!
      
      Link: http://lkml.kernel.org/r/20190414123612.GB12971@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8cd40d1d
    • K
      mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n · e8277b3b
      Konstantin Khlebnikov 提交于
      Commit 58bc4c34 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
      depends on skipping vmstat entries with empty name introduced in
      7aaf7727 ("mm: don't show nr_indirectly_reclaimable in
      /proc/vmstat") but reverted in b29940c1 ("mm: rename and change
      semantics of nr_indirectly_reclaimable_bytes").
      
      So skipping no longer works and /proc/vmstat has misformatted lines " 0".
      
      This patch simply shows debug counters "nr_tlb_remote_*" for UP.
      
      Link: http://lkml.kernel.org/r/155481488468.467.4295519102880913454.stgit@buzz
      Fixes: 58bc4c34 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8277b3b
    • Z
      mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock · 37803841
      zhong jiang 提交于
      When adding memory by probing a memory block in the sysfs interface,
      there is an obvious issue where we will unlock the device_hotplug_lock
      when we failed to takes it.
      
      That issue was introduced in 8df1d0e4 ("mm/memory_hotplug: make
      add_memory() take the device_hotplug_lock").
      
      We should drop out in time when failing to take the device_hotplug_lock.
      
      Link: http://lkml.kernel.org/r/1554696437-9593-1-git-send-email-zhongjiang@huawei.com
      Fixes: 8df1d0e4 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock")
      Signed-off-by: Nzhong jiang <zhongjiang@huawei.com>
      Reported-by: NYang yingliang <yangyingliang@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      37803841
    • H
      mm: swapoff: shmem_unuse() stop eviction without igrab() · af53d3e9
      Hugh Dickins 提交于
      The igrab() in shmem_unuse() looks good, but we forgot that it gives no
      protection against concurrent unmounting: a point made by Konstantin
      Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893
      ("tmpfs: fix race between umount and swapoff").  The current 5.1-rc
      swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs.
      Self-destruct in 5 seconds.  Have a nice day..." followed by GPF.
      
      Once again, give up on using igrab(); but don't go back to making such
      heavy-handed use of shmem_swaplist_mutex as last time: that would spoil
      the new design, and I expect could deadlock inside shmem_swapin_page().
      
      Instead, shmem_unuse() just raise a "stop_eviction" count in the shmem-
      specific inode, and shmem_evict_inode() wait for that to go down to 0.
      Call it "stop_eviction" rather than "swapoff_busy" because it can be put
      to use for others later (huge tmpfs patches expect to use it).
      
      That simplifies shmem_unuse(), protecting it from both unlink and
      unmount; and in practice lets it locate all the swap in its first try.
      But do not rely on that: there's still a theoretical case, when
      shmem_writepage() might have been preempted after its get_swap_page(),
      before making the swap entry visible to swapoff.
      
      [hughd@google.com: remove incorrect list_del()]
        Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904091133570.1898@eggly.anvils
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081259400.1523@eggly.anvils
      Fixes: b56a2d8a ("mm: rid swapoff of quadratic complexity")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vineeth Pillai <vpillai@digitalocean.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af53d3e9
    • H
      mm: swapoff: take notice of completion sooner · 64165b1a
      Hugh Dickins 提交于
      The old try_to_unuse() implementation was driven by find_next_to_unuse(),
      which terminated as soon as all the swap had been freed.
      
      Add inuse_pages checks now (alongside signal_pending()) to stop scanning
      mms and swap_map once finished.
      
      The same ought to be done in shmem_unuse() too, but never was before,
      and needs a different interface: so leave it as is for now.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081258200.1523@eggly.anvils
      Fixes: b56a2d8a ("mm: rid swapoff of quadratic complexity")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vineeth Pillai <vpillai@digitalocean.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64165b1a
    • H
      mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES · dd862deb
      Hugh Dickins 提交于
      SWAP_UNUSE_MAX_TRIES 3 appeared to work well in earlier testing, but
      further testing has proved it to be a source of unnecessary swapoff
      EBUSY failures (which can then be followed by unmount EBUSY failures).
      
      When mmget_not_zero() or shmem's igrab() fails, there is an mm exiting
      or inode being evicted, freeing up swap independent of try_to_unuse().
      Those typically completed much sooner than the old quadratic swapoff,
      but now it's more common that swapoff may need to wait for them.
      
      It's possible to move those cases from init_mm.mmlist and shmem_swaplist
      to separate "exiting" swaplists, and try_to_unuse() then wait for those
      lists to be emptied; but we've not bothered with that in the past, and
      don't want to risk missing some other forgotten case.  So just revert to
      cycling around until the swap is gone, without any retries limit.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081256170.1523@eggly.anvils
      Fixes: b56a2d8a ("mm: rid swapoff of quadratic complexity")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vineeth Pillai <vpillai@digitalocean.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd862deb
    • H
      mm: swapoff: shmem_find_swap_entries() filter out other types · 87039546
      Hugh Dickins 提交于
      Swapfile "type" was passed all the way down to shmem_unuse_inode(), but
      then forgotten from shmem_find_swap_entries(): with the result that
      removing one swapfile would try to free up all the swap from shmem - no
      problem when only one swapfile anyway, but counter-productive when more,
      causing swapoff to be unnecessarily OOM-killed when it should succeed.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081254470.1523@eggly.anvils
      Fixes: b56a2d8a ("mm: rid swapoff of quadratic complexity")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
      Cc: Vineeth Pillai <vpillai@digitalocean.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87039546
    • Q
      slab: store tagged freelist for off-slab slabmgmt · 1a62b18d
      Qian Cai 提交于
      Commit 51dedad0 ("kasan, slab: make freelist stored without tags")
      calls kasan_reset_tag() for off-slab slab management object leading to
      freelist being stored non-tagged.
      
      However, cache_grow_begin() calls alloc_slabmgmt() which calls
      kmem_cache_alloc_node() assigns a tag for the address and stores it in
      the shadow address.  As the result, it causes endless errors below
      during boot due to drain_freelist() -> slab_destroy() ->
      kasan_slab_free() which compares already untagged freelist against the
      stored tag in the shadow address.
      
      Since off-slab slab management object freelist is such a special case,
      just store it tagged.  Non-off-slab management object freelist is still
      stored untagged which has not been assigned a tag and should not cause
      any other troubles with this inconsistency.
      
        BUG: KASAN: double-free or invalid-free in slab_destroy+0x84/0x88
        Pointer tag: [ff], memory tag: [99]
      
        CPU: 0 PID: 1376 Comm: kworker/0:4 Tainted: G        W 5.1.0-rc3+ #8
        Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.0.6 07/10/2018
        Workqueue: cgroup_destroy css_killed_work_fn
        Call trace:
         print_address_description+0x74/0x2a4
         kasan_report_invalid_free+0x80/0xc0
         __kasan_slab_free+0x204/0x208
         kasan_slab_free+0xc/0x18
         kmem_cache_free+0xe4/0x254
         slab_destroy+0x84/0x88
         drain_freelist+0xd0/0x104
         __kmem_cache_shrink+0x1ac/0x224
         __kmemcg_cache_deactivate+0x1c/0x28
         memcg_deactivate_kmem_caches+0xa0/0xe8
         memcg_offline_kmem+0x8c/0x3d4
         mem_cgroup_css_offline+0x24c/0x290
         css_killed_work_fn+0x154/0x618
         process_one_work+0x9cc/0x183c
         worker_thread+0x9b0/0xe38
         kthread+0x374/0x390
         ret_from_fork+0x10/0x18
      
        Allocated by task 1625:
         __kasan_kmalloc+0x168/0x240
         kasan_slab_alloc+0x18/0x20
         kmem_cache_alloc_node+0x1f8/0x3a0
         cache_grow_begin+0x4fc/0xa24
         cache_alloc_refill+0x2f8/0x3e8
         kmem_cache_alloc+0x1bc/0x3bc
         sock_alloc_inode+0x58/0x334
         alloc_inode+0xb8/0x164
         new_inode_pseudo+0x20/0xec
         sock_alloc+0x74/0x284
         __sock_create+0xb0/0x58c
         sock_create+0x98/0xb8
         __sys_socket+0x60/0x138
         __arm64_sys_socket+0xa4/0x110
         el0_svc_handler+0x2c0/0x47c
         el0_svc+0x8/0xc
      
        Freed by task 1625:
         __kasan_slab_free+0x114/0x208
         kasan_slab_free+0xc/0x18
         kfree+0x1a8/0x1e0
         single_release+0x7c/0x9c
         close_pdeo+0x13c/0x43c
         proc_reg_release+0xec/0x108
         __fput+0x2f8/0x784
         ____fput+0x1c/0x28
         task_work_run+0xc0/0x1b0
         do_notify_resume+0xb44/0x1278
         work_pending+0x8/0x10
      
        The buggy address belongs to the object at ffff809681b89e00
         which belongs to the cache kmalloc-128 of size 128
        The buggy address is located 0 bytes inside of
         128-byte region [ffff809681b89e00, ffff809681b89e80)
        The buggy address belongs to the page:
        page:ffff7fe025a06e00 count:1 mapcount:0 mapping:01ff80082000fb00
        index:0xffff809681b8fe04
        flags: 0x17ffffffc000200(slab)
        raw: 017ffffffc000200 ffff7fe025a06d08 ffff7fe022ef7b88 01ff80082000fb00
        raw: ffff809681b8fe04 ffff809681b80000 00000001000000e0 0000000000000000
        page dumped because: kasan: bad access detected
        page allocated via order 0, migratetype Unmovable, gfp_mask
        0x2420c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE)
         prep_new_page+0x4e0/0x5e0
         get_page_from_freelist+0x4ce8/0x50d4
         __alloc_pages_nodemask+0x738/0x38b8
         cache_grow_begin+0xd8/0xa24
         ____cache_alloc_node+0x14c/0x268
         __kmalloc+0x1c8/0x3fc
         ftrace_free_mem+0x408/0x1284
         ftrace_free_init_mem+0x20/0x28
         kernel_init+0x24/0x548
         ret_from_fork+0x10/0x18
      
        Memory state around the buggy address:
         ffff809681b89c00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
         ffff809681b89d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
        >ffff809681b89e00: 99 99 99 99 99 99 99 99 fe fe fe fe fe fe fe fe
                           ^
         ffff809681b89f00: 43 43 43 43 43 fe fe fe fe fe fe fe fe fe fe fe
         ffff809681b8a000: 6d fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
      
      Link: http://lkml.kernel.org/r/20190403022858.97584-1-cai@lca.pw
      Fixes: 51dedad0 ("kasan, slab: make freelist stored without tags")
      Signed-off-by: NQian Cai <cai@lca.pw>
      Reviewed-by: NAndrey Konovalov <andreyknvl@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a62b18d
  2. 19 4月, 2019 2 次提交
  3. 18 4月, 2019 10 次提交
    • C
      signal: use fdget() since we don't allow O_PATH · 738a7832
      Christian Brauner 提交于
      As stated in the original commit for pidfd_send_signal() we don't allow
      to signal processes through O_PATH file descriptors since it is
      semantically equivalent to a write on the pidfd.
      
      We already correctly error out right now and return EBADF if an O_PATH
      fd is passed.  This is because we use file->f_op to detect whether a
      pidfd is passed and O_PATH fds have their file->f_op set to empty_fops
      in do_dentry_open() and thus fail the test.
      
      Thus, there is no regression.  It's just semantically correct to use
      fdget() and return an error right from there instead of taking a
      reference and returning an error later.
      Signed-off-by: NChristian Brauner <christian@brauner.io>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jann Horn <jann@thejh.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      738a7832
    • L
      Merge tag 's390-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · d22113a2
      Linus Torvalds 提交于
      Pull s390 bug fixes from Martin Schwidefsky:
      
       - Fix overwrite of the initial ramdisk due to misuse of IS_ENABLED
      
       - Fix integer overflow in the dasd driver resulting in incorrect number
         of blocks for large devices
      
       - Fix a lockdep false positive in the 3270 driver
      
       - Fix a deadlock in the zcrypt driver
      
       - Fix incorrect debug feature entries in the pkey api
      
       - Fix inline assembly constraints fallout with CONFIG_KASAN=y
      
      * tag 's390-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: correct some inline assembly constraints
        s390/pkey: add one more argument space for debug feature entry
        s390/zcrypt: fix possible deadlock situation on ap queue remove
        s390/3270: fix lockdep false positive on view->lock
        s390/dasd: Fix capacity calculation for large volumes
        s390/mem_detect: Use IS_ENABLED(CONFIG_BLK_DEV_INITRD)
      d22113a2
    • L
      Merge tag 'afs-fixes-20190413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 2a852fd1
      Linus Torvalds 提交于
      Pull AFS fixes from David Howells:
      
       - Stop using the deprecated get_seconds().
      
       - Don't make tracepoint strings const as the section they go in isn't
         read-only.
      
       - Differentiate failure due to unmarshalling from other failure cases.
         We shouldn't abort with RXGEN_CC/SS_UNMARSHAL if it's not due to
         unmarshalling.
      
       - Add a missing unlock_page().
      
       - Fix the interaction between receiving a notification from a server
         that it has invalidated all outstanding callback promises and a
         client call that we're in the middle of making that will get a new
         promise.
      
      * tag 'afs-fixes-20190413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix in-progess ops to ignore server-level callback invalidation
        afs: Unlock pages for __pagevec_release()
        afs: Differentiate abort due to unmarshalling from other errors
        afs: Avoid section confusion in CM_NAME
        afs: avoid deprecated get_seconds()
      2a852fd1
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · d3ce3b18
      Linus Torvalds 提交于
      Pull crypto fix from Herbert Xu:
       "Fix a bug in the implementation of the x86 accelerated version of
        poly1305"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: x86/poly1305 - fix overflow during partial reduction
      d3ce3b18
    • L
      Merge tag 'drm-fixes-2019-04-18' of git://anongit.freedesktop.org/drm/drm · 95ea5529
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "Since Easter is looming for me, I'm just pushing whatever is in my
        tree, I'll see what else turns up and maybe I'll send another pull
        early next week if there is anything.
      
        tegra:
         - stream id programming fix
         - avoid divide by 0 for bad hdmi audio setup code
      
        ttm:
         - Hugepages fix
         - refcount imbalance in error path fix
      
        amdgpu:
         - GPU VM fixes for Vega/RV
         - DC AUX fix for active DP-DVI dongles
         - DC fix for multihead regression"
      
      * tag 'drm-fixes-2019-04-18' of git://anongit.freedesktop.org/drm/drm:
        drm/tegra: hdmi: Setup audio only if configured
        drm/amd/display: If one stream full updates, full update all planes
        drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming
        drm/amdgpu: shadow in shadow_list without tbo.mem.start cause page fault in sriov TDR
        gpu: host1x: Program stream ID to bypass without SMMU
        drm/amd/display: extending AUX SW Timeout
        drm/ttm: fix dma_fence refcount imbalance on error path
        drm/ttm: fix incrementing the page pointer for huge pages
        drm/ttm: fix start page for huge page check in ttm_put_pages()
        drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
      95ea5529
    • D
      Merge branch 'drm-fixes-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 00fd14ff
      Dave Airlie 提交于
      - GPUVM fixes for vega/RV and shadow buffers
      - TTM fixes for hugepages
      - TTM fix for refcount imbalance in error path
      - DC AUX fix for some active DP-DVI dongles
      - DC fix for multihead VT switch regression
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190415051703.3377-1-alexander.deucher@amd.com
      00fd14ff
    • D
      Merge tag 'drm/tegra/for-5.1-rc6' of git://anongit.freedesktop.org/tegra/linux into drm-fixes · ce519c1b
      Dave Airlie 提交于
      drm/tegra: Fixes for v5.1-rc6
      
      This contains a follow-up fix for the stream ID programming and a fix
      for a regression on older Tegra devices (Tegra20 and Tegra30) that are
      running into a division by zero trying to enable audio over HDMI.
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      
      From: Thierry Reding <thierry.reding@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190417073525.21680-1-thierry.reding@gmail.com
      ce519c1b
    • L
      Merge tag '5.1-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · e53f31bf
      Linus Torvalds 提交于
      Pull smb3 fixes from Steve French:
       "Five small SMB3 fixes, all also for stable - an important fix for an
        oplock (lease) bug, a handle leak, and three bugs spotted by KASAN"
      
      * tag '5.1-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: keep FileInfo handle live during oplock break
        cifs: fix handle leak in smb2_query_symlink()
        cifs: Fix lease buffer length error
        cifs: Fix use-after-free in SMB2_read
        cifs: Fix use-after-free in SMB2_write
      e53f31bf
    • L
      Merge tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi · fe5cdef2
      Linus Torvalds 提交于
      Pull IPMI fixes from Corey Minyard:
       "Fixes for some bugs cause by recent changes. One crash if you feed bad
        data to the module parameters, one BUG that sometimes occurs when a
        user closes the connection, and one bug that cause the driver to not
        work if the configuration information only comes in from SMBIOS"
      
      * tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi:
        ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
        ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
        ipmi: Fix failure on SMBIOS specified devices
      fe5cdef2
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 2a3a028f
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Handle init flow failures properly in iwlwifi driver, from Shahar S
          Matityahu.
      
       2) mac80211 TXQs need to be unscheduled on powersave start, from Felix
          Fietkau.
      
       3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau.
      
       4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed.
      
       5) Avoid checksum complete with XDP in mlx5, also from Saeed.
      
       6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon.
      
       7) Partial sent TLS record leak fix from Jakub Kicinski.
      
       8) Reject zero size iova range in vhost, from Jason Wang.
      
       9) Allow pending work to complete before clcsock release from Karsten
          Graul.
      
      10) Fix XDP handling max MTU in thunderx, from Matteo Croce.
      
      11) A lot of protocols look at the sa_family field of a sockaddr before
          validating it's length is large enough, from Tetsuo Handa.
      
      12) Don't write to free'd pointer in qede ptp error path, from Colin Ian
          King.
      
      13) Have to recompile IP options in ipv4_link_failure because it can be
          invoked from ARP, from Stephen Suryaputra.
      
      14) Doorbell handling fixes in qed from Denis Bolotin.
      
      15) Revert net-sysfs kobject register leak fix, it causes new problems.
          From Wang Hai.
      
      16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva.
      
      17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay
          Aleksandrov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits)
        socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW
        tcp: tcp_grow_window() needs to respect tcp_space()
        ocelot: Clean up stats update deferred work
        ocelot: Don't sleep in atomic context (irqs_disabled())
        net: bridge: fix netlink export of vlan_stats_per_port option
        qed: fix spelling mistake "faspath" -> "fastpath"
        tipc: set sysctl_tipc_rmem and named_timeout right range
        tipc: fix link established but not in session
        net: Fix missing meta data in skb with vlan packet
        net: atm: Fix potential Spectre v1 vulnerabilities
        net/core: work around section mismatch warning for ptp_classifier
        net: bridge: fix per-port af_packet sockets
        bnx2x: fix spelling mistake "dicline" -> "decline"
        route: Avoid crash from dereferencing NULL rt->from
        MAINTAINERS: normalize Woojung Huh's email address
        bonding: fix event handling for stacked bonds
        Revert "net-sysfs: Fix memory leak in netdev_register_kobject"
        rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check
        qed: Fix the DORQ's attentions handling
        qed: Fix missing DORQ attentions
        ...
      2a3a028f
  4. 17 4月, 2019 13 次提交