1. 19 Oct, 2019 19 commits
    • H
      mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size · b11edebb
      Honglei Wang committed
      Commit 1a61ab80 ("mm: memcontrol: replace zone summing with
      lruvec_page_state()") made lruvec_page_state() use per-cpu counters
      instead of calculating the value directly from lru_zone_size, on the
      assumption that this would be more efficient.
      
      Tim has reported that this is not the case for their database
      benchmark, which shows the opposite result: lruvec_page_state
      takes up a huge chunk of CPU cycles (about 25% of the system time,
      which is roughly 7% of total CPU cycles) on 5.3 kernels.  The workload
      runs on a large machine (96 CPUs), has many cgroups (500), and
      is heavily direct reclaim bound.
      
      Tim Chen said:
      
      : The problem can also be reproduced by running simple multi-threaded
      : pmbench benchmark with a fast Optane SSD swap (see profile below).
      :
      :
      : 6.15%     3.08%  pmbench          [kernel.vmlinux]            [k] lruvec_lru_size
      :             |
      :             |--3.07%--lruvec_lru_size
      :             |          |
      :             |          |--2.11%--cpumask_next
      :             |          |          |
      :             |          |           --1.66%--find_next_bit
      :             |          |
      :             |           --0.57%--call_function_interrupt
      :             |                     |
      :             |                      --0.55%--smp_call_function_interrupt
      :             |
      :             |--1.59%--0x441f0fc3d009
      :             |          _ops_rdtsc_init_base_freq
      :             |          access_histogram
      :             |          page_fault
      :             |          __do_page_fault
      :             |          handle_mm_fault
      :             |          __handle_mm_fault
      :             |          |
      :             |           --1.54%--do_swap_page
      :             |                     swapin_readahead
      :             |                     swap_cluster_readahead
      :             |                     |
      :             |                      --1.53%--read_swap_cache_async
      :             |                                __read_swap_cache_async
      :             |                                alloc_pages_vma
      :             |                                __alloc_pages_nodemask
      :             |                                __alloc_pages_slowpath
      :             |                                try_to_free_pages
      :             |                                do_try_to_free_pages
      :             |                                shrink_node
      :             |                                shrink_node_memcg
      :             |                                |
      :             |                                |--0.77%--lruvec_lru_size
      :             |                                |
      :             |                                 --0.76%--inactive_list_is_low
      :             |                                           |
      :             |                                            --0.76%--lruvec_lru_size
      :             |
      :              --1.50%--measure_read
      :                        page_fault
      :                        __do_page_fault
      :                        handle_mm_fault
      :                        __handle_mm_fault
      :                        do_swap_page
      :                        swapin_readahead
      :                        swap_cluster_readahead
      :                        |
      :                         --1.48%--read_swap_cache_async
      :                                   __read_swap_cache_async
      :                                   alloc_pages_vma
      :                                   __alloc_pages_nodemask
      :                                   __alloc_pages_slowpath
      :                                   try_to_free_pages
      :                                   do_try_to_free_pages
      :                                   shrink_node
      :                                   shrink_node_memcg
      :                                   |
      :                                   |--0.75%--inactive_list_is_low
      :                                   |          |
      :                                   |           --0.75%--lruvec_lru_size
      :                                   |
      :                                    --0.73%--lruvec_lru_size
      
      The likely culprit is the cache traffic that lruvec_page_state_local()
      generates.  Dave Hansen says:
      
      : I was thinking purely of the cache footprint.  If it's reading
      : pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192
      : bytes of cache *96 CPUs = 18k of data, mostly read-only.  1 cgroup would
      : be 18k of data for the whole system and the caching would be pretty
      : efficient and all 18k would probably survive a tight page fault loop in
      : the L1.  500 cgroups would be ~90k of data per CPU thread which doesn't
      : fit in the L1 and probably wouldn't survive a tight page fault loop if
      : both logical threads were banging on different cgroups.
      :
      : It's just a theory, but it's why I noted the number of cgroups when I
      : initially saw this show up in profiles
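The footprint estimate quoted above can be reproduced with a little arithmetic. This is only a sketch: it assumes 64-byte cache lines (which is what 192 bytes over three cachelines implies), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64 /* assumption: 3 lines * 64 bytes = 192 bytes */
#define LINES_PER_READ  3  /* "three separate cachelines" per counter read */

/* Bytes touched system-wide when one cgroup's counter is summed over all CPUs. */
size_t footprint_per_cgroup(size_t ncpus)
{
	assert(ncpus > 0);
	return (size_t)LINES_PER_READ * CACHELINE_BYTES * ncpus;
}

/* Bytes of per-CPU counter data one CPU thread holds across all cgroups. */
size_t footprint_per_cpu(size_t ncgroups)
{
	assert(ncgroups > 0);
	return (size_t)LINES_PER_READ * CACHELINE_BYTES * ncgroups;
}
```

With 96 CPUs the first function yields 18432 bytes (the quoted 18k); with 500 cgroups the second yields 96000 bytes, in line with the quoted ~90k per CPU thread.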
      
      Fix the regression by partially reverting the said commit and calculating
      the lru size explicitly.
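As a rough userspace illustration of "calculating the lru size explicitly", the sketch below sums per-zone sizes kept directly in the LRU bookkeeping instead of walking per-CPU counters. The struct, the zone count, and the function name are assumptions made for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NR_ZONES_SKETCH 4 /* hypothetical zone count for this sketch */

/* Hypothetical stand-in for the per-memcg, per-node LRU bookkeeping. */
struct lruvec_sketch {
	unsigned long lru_zone_size[MAX_NR_ZONES_SKETCH];
};

/* Sum the explicit per-zone sizes for zones 0..zone_idx; no per-CPU walk. */
unsigned long lru_size_sketch(const struct lruvec_sketch *lv, int zone_idx)
{
	unsigned long size = 0;

	assert(lv != NULL);
	for (int zid = 0; zid <= zone_idx && zid < MAX_NR_ZONES_SKETCH; zid++)
		size += lv->lru_zone_size[zid];
	return size;
}
```

The cost of this loop depends on the number of zones, not on the number of CPUs, which is the property the commit message relies on.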
      
      Link: http://lkml.kernel.org/r/20190905071034.16822-1-honglei.wang@oracle.com
      Fixes: 1a61ab80 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
      Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
      Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
      Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
      Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: <stable@vger.kernel.org>	[5.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      mm/gup: fix a misnamed "write" argument, and a related bug · 0cd22afd
      John Hubbard committed
      In several routines, the "flags" argument is incorrectly named "write".
      Change it to "flags".
      
      Also, in one place, the misnaming led to an actual bug:
      "flags & FOLL_WRITE" is required, rather than just "flags".
      (That problem was flagged by krobot, in v1 of this patch.)
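The difference between testing "flags" and testing "flags & FOLL_WRITE" can be sketched in isolation. The flag values and function names below are illustrative assumptions for this example; consult the kernel headers for the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for this sketch only. */
#define SKETCH_FOLL_WRITE 0x01u
#define SKETCH_FOLL_GET   0x04u

/* Buggy: treats the whole flags word as a boolean "write". */
bool wants_write_buggy(unsigned int flags)
{
	return flags != 0;
}

/* Fixed: tests only the write bit. */
bool wants_write_fixed(unsigned int flags)
{
	return (flags & SKETCH_FOLL_WRITE) != 0;
}

/* Self-check showing where the two versions disagree. */
void check_write_flag_sketch(void)
{
	assert(wants_write_buggy(SKETCH_FOLL_GET));  /* wrongly reports "write" */
	assert(!wants_write_fixed(SKETCH_FOLL_GET)); /* correctly reports "read" */
}
```

Any non-write flag set by the caller is enough to make the buggy variant behave as if write access had been requested.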
      
      Also, change the flags argument from int, to unsigned int.
      
      You can see that this was a simple oversight, because the
      calling code passes "flags" to the fifth argument:
      
      gup_pgd_range():
          ...
          if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
      		    PGDIR_SHIFT, next, flags, pages, nr))
      
      ...which, until this patch, the callees referred to as "write".
      
      Also, change two lines to avoid checkpatch line length
      complaints, and another line to fix another oversight
      that checkpatch called out: missing "int" on pdshift.
      
      Link: http://lkml.kernel.org/r/20191014184639.1512873-3-jhubbard@nvidia.com
      Fixes: b798bec4 ("mm/gup: change write parameter to flags in fast walk")
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      mm/gup_benchmark: add a missing "w" to getopt string · 6f24c8d3
      John Hubbard committed
      Even though gup_benchmark.c has code to handle the -w command-line option,
      the "w" is not part of the getopt string.  It looks as if it has been
      missing the whole time.
      
      On my machine, this leads naturally to the following predictable result:
      
        $ sudo ./gup_benchmark -w
        ./gup_benchmark: invalid option -- 'w'
      
      ...which is fixed with this commit.
      
      Link: http://lkml.kernel.org/r/20191014184639.1512873-2-jhubbard@nvidia.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: kbuild test robot <lkp@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • C
      ocfs2: fix error handling in ocfs2_setattr() · ce750f43
      Chengguang Xu committed
      Set transfer_to[USRQUOTA/GRPQUOTA] to NULL in the error case before
      jumping to the dqput() cleanup.
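The pattern can be sketched in userspace as follows. This assumes the lookup reports failure through an error-encoded pointer and that the shared cleanup path calls a put function on whatever the variable holds; every name here is a hypothetical stand-in, not the ocfs2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct quota_sketch { int refs; };

/* Hypothetical stand-ins for error-encoded pointers. */
static inline void *err_ptr_sketch(long err) { return (void *)(intptr_t)err; }
static inline int is_err_sketch(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Takes a reference, or returns an error-encoded pointer when fail is set. */
static struct quota_sketch *quota_get_sketch(struct quota_sketch *q, int fail)
{
	if (fail)
		return err_ptr_sketch(-5);
	q->refs++;
	return q;
}

/* Drops a reference; NULL is a no-op, an error pointer would be a bug. */
static void quota_put_sketch(struct quota_sketch *q)
{
	if (!q)
		return;
	assert(!is_err_sketch(q));
	q->refs--;
}

int setattr_sketch(struct quota_sketch *q, int fail)
{
	struct quota_sketch *transfer_to;
	int status = 0;

	transfer_to = quota_get_sketch(q, fail);
	if (is_err_sketch(transfer_to)) {
		status = -5;
		transfer_to = NULL; /* the fix: never hand an error pointer to put */
		goto out;
	}
	/* ... the real work would happen here ... */
out:
	quota_put_sketch(transfer_to);
	return status;
}
```

Without the assignment to NULL, the cleanup label would pass the error-encoded value to the put function.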
      
      Link: http://lkml.kernel.org/r/20191010082349.1134-1-cgxu519@mykernel.net
      Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release · b749ecfa
      Roman Gushchin committed
      Karsten reported the following panic in __free_slab() happening on a s390x
      machine:
      
        Unable to handle kernel pointer dereference in virtual kernel address space
        Failing address: 0000000000000000 TEID: 0000000000000483
        Fault in home space mode while using kernel ASCE.
        AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
        Oops: 0004 ilc:3 [#1] PREEMPT SMP
        Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
        Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
        Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
                   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
        Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
                   0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
                   0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
                   000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
        Krnl Code: 00000000003cada6: e310a1400004        lg      %r1,320(%r10)
                   00000000003cadac: c0e50046c286        brasl   %r14,ca32b8
                  #00000000003cadb2: a7f4fe36            brc     15,3caa1e
                  >00000000003cadb6: e32060800024        stg     %r2,128(%r6)
                   00000000003cadbc: a7f4fd9e            brc     15,3ca8f8
                   00000000003cadc0: c0e50046790c        brasl   %r14,c99fd8
                   00000000003cadc6: a7f4fe2c            brc     15,3caa1e
                   00000000003cadca: ecb1ffff00d9        aghik   %r11,%r1,-1
        Call Trace:
        (<00000000003cabcc> __free_slab+0x49c/0x6b0)
         <00000000001f5886> rcu_core+0x5a6/0x7e0
         <0000000000ca2dea> __do_softirq+0xf2/0x5c0
         <0000000000152644> irq_exit+0x104/0x130
         <000000000010d222> do_IRQ+0x9a/0xf0
         <0000000000ca2344> ext_int_handler+0x130/0x134
         <0000000000103648> enabled_wait+0x58/0x128
        (<0000000000103634> enabled_wait+0x44/0x128)
         <0000000000103b00> arch_cpu_idle+0x40/0x58
         <0000000000ca0544> default_idle_call+0x3c/0x68
         <000000000018eaa4> do_idle+0xec/0x1c0
         <000000000018ee0e> cpu_startup_entry+0x36/0x40
         <000000000122df34> arch_call_rest_init+0x5c/0x88
         <0000000000000000> 0x0
        INFO: lockdep is turned off.
        Last Breaking-Event-Address:
         <00000000003ca8f4> __free_slab+0x1c4/0x6b0
        Kernel panic - not syncing: Fatal exception in interrupt
      
      The kernel panics on an attempt to dereference the NULL memcg pointer.
      When shutdown_cache() is called from the kmem_cache_destroy() context, a
      memcg kmem_cache might have empty slab pages in a partial list, which are
      still charged to the memory cgroup.
      
      These pages are released by free_partial() at the beginning of
      shutdown_cache(): either directly or by scheduling a RCU-delayed work
      (if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag).  The latter case
      is when the reported panic can happen: memcg_unlink_cache() is called
      immediately after shrinking partial lists, without waiting for scheduled
      RCU works.  It sets the kmem_cache->memcg_params.memcg pointer to NULL,
      and the following attempt to dereference it by __free_slab() from the
      RCU work context causes the panic.
      
      To fix the issue, let's postpone the release of the memcg pointer to
      destroy_memcg_params().  It's called from a separate work context by
      slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
      This guarantees that all scheduled page release RCU works will complete
      before the memcg pointer will be zeroed.
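The ordering argument can be modelled with a toy deferred-work queue. In this sketch, barrier_sketch() stands in for the RCU barrier by running every queued callback; all types and names are hypothetical, and no real RCU is involved.

```c
#include <assert.h>
#include <stddef.h>

struct memcg_sketch { int uncharged; };
struct cache_sketch { struct memcg_sketch *memcg; };

typedef void (*work_fn)(struct cache_sketch *);

#define MAX_WORK 8
static work_fn pending[MAX_WORK];
static struct cache_sketch *pending_arg[MAX_WORK];
static int npending;

/* Queue a callback, as an RCU-delayed page release would be queued. */
static void defer_sketch(work_fn fn, struct cache_sketch *c)
{
	assert(npending < MAX_WORK);
	pending[npending] = fn;
	pending_arg[npending] = c;
	npending++;
}

/* Stand-in for a full barrier: run every callback queued so far. */
static void barrier_sketch(void)
{
	for (int i = 0; i < npending; i++)
		pending[i](pending_arg[i]);
	npending = 0;
}

/* Delayed page release: dereferences the memcg pointer. */
static void free_slab_work(struct cache_sketch *c)
{
	assert(c->memcg != NULL); /* this is what failed in the report */
	c->memcg->uncharged++;
}

void destroy_cache_sketch(struct cache_sketch *c)
{
	defer_sketch(free_slab_work, c); /* release scheduled, not yet run */
	/* The buggy order would clear c->memcg here, before the work runs. */
	barrier_sketch();                /* all scheduled releases complete */
	c->memcg = NULL;                 /* now safe to drop the pointer */
}
```

Moving the assignment above barrier_sketch() makes the assertion in free_slab_work() fire, which mirrors the NULL dereference described in the report.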
      
      Big thanks to Karsten for the perfect report containing all necessary
      information, his help with the analysis of the problem and testing of the
      fix.
      
      Link: http://lkml.kernel.org/r/20191010160549.1584316-1-guro@fb.com
      Fixes: fb2f2b0a ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Karsten Graul <kgraul@linux.ibm.com>
      Tested-by: Karsten Graul <kgraul@linux.ibm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Karsten Graul <kgraul@linux.ibm.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      mm/memunmap: don't access uninitialized memmap in memunmap_pages() · 77e080e7
      Aneesh Kumar K.V committed
      Patch series "mm/memory_hotplug: Shrink zones before removing memory",
      v6.
      
      This series fixes the access of uninitialized memmaps when shrinking
      zones/nodes and when removing memory.  It also contains all fixes for
      crashes that can be triggered when removing certain namespaces using
      memunmap_pages() - ZONE_DEVICE, reported by Aneesh.
      
      We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
      more involved (we don't have SECTION_IS_ONLINE as an indicator), and
      shrinking is only of limited use (set_zone_contiguous() cannot detect
      the ZONE_DEVICE as contiguous).
      
      We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount
      of code to a minimum.  Shrinking is especially necessary to keep
      zone->contiguous set where possible, especially, on memory unplug of
      DIMMs at zone boundaries.
      
      --------------------------------------------------------------------------
      
      Zones are now properly shrunk when offlining memory blocks or when
      onlining failed.  This allows zones to be shrunk properly on memory unplug
      even if the separate memory blocks of a DIMM were onlined to different
      zones or re-onlined to a different zone after offlining.
      
      Example:
      
        :/# cat /proc/zoneinfo
        Node 1, zone  Movable
                spanned  0
                present  0
                managed  0
        :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
        :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
        :/# cat /proc/zoneinfo
        Node 1, zone  Movable
                spanned  98304
                present  65536
                managed  65536
        :/# echo 0 > /sys/devices/system/memory/memory43/online
        :/# cat /proc/zoneinfo
        Node 1, zone  Movable
                spanned  32768
                present  32768
                managed  32768
        :/# echo 0 > /sys/devices/system/memory/memory41/online
        :/# cat /proc/zoneinfo
        Node 1, zone  Movable
                spanned  0
                present  0
                managed  0
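The spanned and present figures in the example follow from simple block arithmetic. The sketch below assumes 32768 pages per memory block (inferred from the output above) and a hypothetical helper that takes the indices of the blocks currently online in the zone.

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 32768UL /* inferred from the example output */

struct zone_span { unsigned long spanned; unsigned long present; };

/* blocks[] holds the indices of memory blocks currently online in the zone. */
struct zone_span zone_span_sketch(const unsigned int *blocks, size_t n)
{
	struct zone_span z = { 0, 0 };
	unsigned int lo, hi;

	if (n == 0)
		return z; /* empty zone: nothing spanned, nothing present */
	assert(blocks != NULL);

	lo = hi = blocks[0];
	for (size_t i = 1; i < n; i++) {
		if (blocks[i] < lo)
			lo = blocks[i];
		if (blocks[i] > hi)
			hi = blocks[i];
	}
	z.spanned = (hi - lo + 1) * PAGES_PER_BLOCK; /* includes holes */
	z.present = n * PAGES_PER_BLOCK;             /* online pages only */
	return z;
}
```

With blocks 41 and 43 online, the span covers three blocks (98304 pages) while only two are present (65536 pages); offlining block 43 shrinks both values to 32768.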
      
      This patch (of 10):
      
      With an altmap, the memmaps falling into the reserved altmap space are not
      initialized and, therefore, contain a garbage NID and a garbage zone.
      Make sure to read the NID/zone from a memmap that was initialized.
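The idea of reading from an initialized memmap can be sketched as picking the first pfn past the reserved space. This assumes the reserved altmap pages sit at the start of the range; the struct and its field names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a device range with an altmap. */
struct range_sketch {
	unsigned long base_pfn; /* first pfn of the range */
	unsigned long reserved; /* leading pfns whose memmap is not initialized */
	unsigned long nr_pfns;  /* total pfns in the range */
};

/* First pfn whose memmap can be trusted for NID/zone lookups. */
unsigned long first_initialized_pfn(const struct range_sketch *r)
{
	assert(r != NULL);
	assert(r->reserved < r->nr_pfns);
	return r->base_pfn + r->reserved;
}
```

Looking up the NID and zone at base_pfn itself would read one of the uninitialized entries whenever reserved is non-zero.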
      
      This fixes a kernel crash that is observed when destroying a namespace:
      
        kernel BUG at include/linux/mm.h:1107!
        cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
            pc: c0000000004b9728: memunmap_pages+0x238/0x340
            lr: c0000000004b9724: memunmap_pages+0x234/0x340
        ...
            pid   = 3669, comm = ndctl
        kernel BUG at include/linux/mm.h:1107!
          devm_action_release+0x30/0x50
          release_nodes+0x268/0x2d0
          device_release_driver_internal+0x174/0x240
          unbind_store+0x13c/0x190
          drv_attr_store+0x44/0x60
          sysfs_kf_write+0x70/0xa0
          kernfs_fop_write+0x1ac/0x290
          __vfs_write+0x3c/0x70
          vfs_write+0xe4/0x200
          ksys_write+0x7c/0x140
          system_call+0x5c/0x68
      
      The "page_zone(pfn_to_page(pfn))" was introduced by 69324b8f ("mm,
      devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I
      think we will never have driver reserved memory with
      MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).
      
      [david@redhat.com: minimize code changes, rephrase description]
      Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com
      Fixes: 2c2a5af6 ("mm, memory_hotplug: add nid parameter to arch_remove_memory")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Damian Tometzki <damian.tometzki@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Halil Pasic <pasic@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: <stable@vger.kernel.org>	[5.0+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span() · 00d6c019
      David Hildenbrand committed
      We might use the nid of memmaps that were never initialized.  For
      example, if the memmap was poisoned, we will crash the kernel in
      pfn_to_nid() right now.  Let's use the calculated boundaries of the
      separate zones instead.  This now also avoids having to iterate over a
      whole bunch of subsections again, after shrinking one zone.
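Deriving the node span from the zone boundaries, rather than from per-page node IDs, could look roughly like this. The structs are simplified stand-ins invented for the sketch; an empty zone is assumed to have spanned_pages == 0.

```c
#include <assert.h>
#include <stddef.h>

struct zone_sketch { unsigned long start_pfn; unsigned long spanned_pages; };
struct node_span { unsigned long start_pfn; unsigned long spanned_pages; };

/* Node span = lowest zone start to highest zone end, skipping empty zones. */
struct node_span node_span_from_zones(const struct zone_sketch *zones, size_t nr)
{
	struct node_span ns = { 0, 0 };
	unsigned long start = 0, end = 0; /* end == 0 means "nothing seen yet" */

	assert(zones != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		unsigned long zend = zones[i].start_pfn + zones[i].spanned_pages;

		if (!zones[i].spanned_pages)
			continue; /* empty zone contributes nothing */
		if (end == 0) {
			start = zones[i].start_pfn;
			end = zend;
			continue;
		}
		if (zones[i].start_pfn < start)
			start = zones[i].start_pfn;
		if (zend > end)
			end = zend;
	}
	ns.start_pfn = start;
	ns.spanned_pages = end - start;
	return ns;
}
```

The loop touches only zone metadata, so no memmap entry is read and no subsection needs to be rescanned.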
      
      Before commit d0dc12e8 ("mm/memory_hotplug: optimize memory
      hotplug"), the memmap was initialized to 0 and the node was set to the
      right value.  After that commit, the node might be garbage.
      
      We'll have to fix shrink_zone_span() next.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-4-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[d0dc12e8]
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Damian Tometzki <damian.tometzki@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Halil Pasic <pasic@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Q
      mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo · a26ee565
      Qian Cai committed
      Uninitialized memmaps contain garbage and in the worst case trigger
      kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
      touched.
      
      For example, when not onlining a memory block that is spanned by a zone
      and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
      CONFIG_PAGE_POISONING, we can trigger a kernel BUG:
      
        :/# echo 1 > /sys/devices/system/memory/memory40/online
        :/# echo 1 > /sys/devices/system/memory/memory42/online
        :/# cat /proc/pagetypeinfo > test.file
         page:fffff2c585200000 is uninitialized and poisoned
         raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
         raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
         page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
         There is not page extension available.
         ------------[ cut here ]------------
         kernel BUG at include/linux/mm.h:1107!
         invalid opcode: 0000 [#1] SMP NOPTI
      
      Please note that this change does not affect ZONE_DEVICE, because
      pagetypeinfo_showmixedcount_print() is called from
      mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
      ZONE_DEVICE is never populated (zone->present_pages always 0).
      
      [david@redhat.com: move check to outer loop, add comment, rephrase description]
      Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e8
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Miles Chen <miles.chen@mediatek.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set · ca210ba3
      Joel Colledge committed
      When CONFIG_PRINTK_CALLER is set, struct printk_log contains an
      additional member caller_id.  This affects the offset of the log text.
      Account for this by using the type information from gdb to determine all
      the offsets instead of using hardcoded values.
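The same principle, deriving offsets from type information instead of hardcoding them, can be shown in C with offsetof and sizeof. The two record headers below are hypothetical layouts made up for this sketch (the fix itself lives in a gdb Python script); the text payload is assumed to follow the header directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record header without the extra member. */
struct rec_plain {
	uint64_t ts_nsec;
	uint16_t len;
	uint16_t text_len;
	uint16_t dict_len;
	uint8_t  facility;
	uint8_t  flags_level;
};

/* Same header with an additional caller_id member at the end. */
struct rec_with_caller {
	uint64_t ts_nsec;
	uint16_t len;
	uint16_t text_len;
	uint16_t dict_len;
	uint8_t  facility;
	uint8_t  flags_level;
	uint32_t caller_id;
};

/* The text starts right after the header, whatever the header's size is. */
const char *record_text(const void *rec, size_t header_size)
{
	assert(rec != NULL);
	return (const char *)rec + header_size;
}

size_t text_offset_plain(void) { return sizeof(struct rec_plain); }
size_t text_offset_with_caller(void) { return sizeof(struct rec_with_caller); }
```

A reader that hardcodes the plain header size would start reading text inside caller_id for the larger layout, which is consistent with the "embedded null character" error reported.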
      
      This fixes the following error:
      
        (gdb) lx-dmesg
        Python Exception <class 'ValueError'> embedded null character:
        Error occurred in Python command: embedded null character
      
      The read_u* utility functions now take an offset argument to make them
      easier to use.
      
      Link: http://lkml.kernel.org/r/20191011142500.2339-1-joel.colledge@linbit.com
      Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
      Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Cc: Leonard Crestez <leonard.crestez@nxp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      mm/memory-failure.c: don't access uninitialized memmaps in memory_failure() · 96c804a6
      David Hildenbrand committed
      We should check for pfn_to_online_page() to not access uninitialized
      memmaps.  Reshuffle the code so we don't have to duplicate the error
      message.
      
      Link: http://lkml.kernel.org/r/20191009142435.3975-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c · aad5f69b
      David Hildenbrand committed
      There are three places where we access uninitialized memmaps, namely:
      - /proc/kpagecount
      - /proc/kpageflags
      - /proc/kpagecgroup
      
      We have initialized memmaps either when the section is online or when the
      page was initialized to ZONE_DEVICE.  Uninitialized memmaps contain
      garbage and in the worst case trigger kernel BUGs, especially with
      CONFIG_PAGE_POISONING.
      
      For example, not onlining a DIMM during boot and calling /proc/kpagecount
      with CONFIG_PAGE_POISONING:
      
        :/# cat /proc/kpagecount > tmp.test
        BUG: unable to handle page fault for address: fffffffffffffffe
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
        Oops: 0000 [#1] SMP NOPTI
        CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
        RIP: 0010:kpagecount_read+0xce/0x1e0
        Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
        RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
        RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
        RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
        R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
        FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
        Call Trace:
         proc_reg_read+0x3c/0x60
         vfs_read+0xc5/0x180
         ksys_read+0x68/0xe0
         do_syscall_64+0x5c/0xa0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      For now, let's drop support for ZONE_DEVICE from the three pseudo files
      in order to fix this.  To distinguish offline memory (with garbage
      memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
      would have to check get_dev_pagemap() and pfn_zone_device_reserved()
      right now.  The usage of both (especially, special casing devmem) is
      frowned upon and needs to be reworked.
      
      The fundamental issue we have is:
      
      	if (pfn_to_online_page(pfn)) {
      		/* memmap initialized */
      	} else if (pfn_valid(pfn)) {
      		/*
      		 * ???
      		 * a) offline memory. memmap garbage.
      		 * b) devmem: memmap initialized to ZONE_DEVICE.
      		 * c) devmem: reserved for driver. memmap garbage.
      		 * (d) devmem: memmap currently initializing - garbage)
      		 */
      	}
      
      We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
      in place as that function is also used from memory failure.  We now no
      longer dump information about pages that are not in use anymore -
      offline.
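
As a rough illustration of the rule this patch applies, the following user-space C sketch models each PFN as being in one of the states listed in the comment above and trusts the memmap only when the PFN is online. It is not the kernel code: `enum model_pfn_state`, `struct model_page`, `model_pfn_to_online_page()` and `model_kpagecount()` are hypothetical stand-ins, and reporting 0 for every PFN that is not online is an assumption about how the pseudo files behave after the change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states of a valid PFN; these are not kernel definitions. */
enum model_pfn_state {
    MODEL_PFN_ONLINE,          /* section online, memmap initialized */
    MODEL_PFN_OFFLINE,         /* never onlined, memmap is garbage */
    MODEL_PFN_DEVMEM,          /* ZONE_DEVICE, memmap initialized */
    MODEL_PFN_DEVMEM_RESERVED  /* ZONE_DEVICE, reserved for driver, garbage */
};

struct model_page {
    enum model_pfn_state state;
    uint64_t mapcount;  /* only meaningful when the memmap is initialized */
};

/* Stand-in for pfn_to_online_page(): non-NULL only for online PFNs. */
static struct model_page *model_pfn_to_online_page(struct model_page *map,
                                                   size_t nr, size_t pfn)
{
    assert(map != NULL);

    if (pfn >= nr || map[pfn].state != MODEL_PFN_ONLINE)
        return NULL;
    return &map[pfn];
}

/* Models the fixed read path: a PFN that is not online reports 0. */
static uint64_t model_kpagecount(struct model_page *map, size_t nr, size_t pfn)
{
    struct model_page *page = model_pfn_to_online_page(map, nr, pfn);

    return page ? page->mapcount : 0;
}
```

In this model an offline PFN whose entry holds garbage and a ZONE_DEVICE PFN both read back as 0, which mirrors the decision to drop ZONE_DEVICE support from the three files for now.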
      
      Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Qian Cai <cai@lca.pw>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
      Cc: Pankaj gupta <pagupta@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aad5f69b
    • D
      drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store() · 641fe2e9
      David Hildenbrand committed
      Uninitialized memmaps contain garbage and in the worst case trigger kernel
      BUGs, especially with CONFIG_PAGE_POISONING.  They should not get touched.
      
      Right now, when trying to soft-offline a PFN that resides on a memory
      block that was never onlined, one gets a misleading error with
      CONFIG_PAGE_POISONING:
      
        :/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page
        [   23.097167] soft offline: 0x150000 page already poisoned
      
      But the actual result depends on the garbage in the memmap.
      
      soft_offline_page() can only work with online pages, it returns -EIO in
      case of ZONE_DEVICE.  Make sure to only forward pages that are online
      (iow, managed by the buddy) and, therefore, have an initialized memmap.
      
      Add a check against pfn_to_online_page() and similarly return -EIO.
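
The order of checks described here can be sketched in user-space C as follows. This is only a model under stated assumptions: `struct model_pfn_info` and `model_soft_offline_store()` are hypothetical names, and returning -ENXIO for a PFN without any memmap is an assumption about the pre-existing validity check; only the -EIO result for a PFN that is not online comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-PFN model; these are not kernel structures. */
struct model_pfn_info {
    bool valid;   /* a memmap exists for this PFN */
    bool online;  /* the memory block was onlined, so the memmap is initialized */
};

/*
 * Models the store handler after the fix. Returns 0 when the page would be
 * forwarded to soft_offline_page(), or a negative errno otherwise.
 */
static int model_soft_offline_store(const struct model_pfn_info *map,
                                    size_t nr, size_t pfn)
{
    assert(map != NULL);

    if (pfn >= nr || !map[pfn].valid)
        return -ENXIO;  /* assumption: invalid PFNs are rejected this way */
    if (!map[pfn].online)
        return -EIO;    /* new check: never touch an uninitialized memmap */
    return 0;
}
```

The point of the second check is that the result no longer depends on whatever garbage the memmap of a never-onlined block happens to contain.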
      
      Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      641fe2e9
    • L
      filldir[64]: remove WARN_ON_ONCE() for bad directory entries · b9959c7a
      Linus Torvalds committed
      This was always meant to be a temporary thing, just for testing and to
      see if it actually ever triggered.
      
      The only thing that reported it was syzbot doing disk image fuzzing, and
      then that warning is expected.  So let's just remove it before -rc4,
      because the extra sanity testing should probably go to -stable, but we
      don't want the warning to do so.
      
      Reported-by: syzbot+3031f712c7ad5dd4d926@syzkaller.appspotmail.com
      Fixes: 8a23eb80 ("Make filldir[64]() verify the directory entry filename is valid")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9959c7a
    • L
      Merge tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client · 6b95cf9b
      Linus Torvalds committed
      Pull ceph fixes from Ilya Dryomov:
       "A future-proofing decoding fix from Jeff intended for stable and a
        patch for a mostly benign race from Dongsheng"
      
      * tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client:
        rbd: cancel lock_dwork if the wait is interrupted
        ceph: just skip unrecognized info in ceph_reply_info_extra
      6b95cf9b
    • L
      Merge tag 'for-5.4/dm-fixes' of... · fb8527e5
      Linus Torvalds committed
      Merge tag 'for-5.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Fix DM snapshot deadlock that can occur due to COW throttling
         preventing locks from being released.
      
       - Fix DM cache's GFP_NOWAIT allocation failure error paths by switching
         to GFP_NOIO.
      
       - Make __hash_find() static in the DM clone target.
      
      * tag 'for-5.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm cache: fix bugs when a GFP_NOWAIT allocation fails
        dm snapshot: rework COW throttling to fix deadlock
        dm snapshot: introduce account_start_copy() and account_end_copy()
        dm clone: Make __hash_find static
      fb8527e5
    • L
      Merge tag 'iommu-fixes-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 90105ae1
      Linus Torvalds committed
      Pull iommu fixes from Joerg Roedel:
      
       - Fixes for page-table issues on Mali GPUs
      
       - Missing free in an error path for ARM-SMMU
      
       - PASID decoding in the AMD IOMMU Event log code
      
       - Another update for the locking fixes in the AMD IOMMU driver
      
       - Reduce the calls to platform_get_irq() in the IPMMU-VMSA and Rockchip
         IOMMUs to get rid of the warning message added to this function
         recently
      
      * tag 'iommu-fixes-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Check PM_LEVEL_SIZE() condition in locked section
        iommu/amd: Fix incorrect PASID decoding from event log
        iommu/ipmmu-vmsa: Only call platform_get_irq() when interrupt is mandatory
        iommu/rockchip: Don't use platform_get_irq to implicitly count irqs
        iommu/io-pgtable-arm: Support all Mali configurations
        iommu/io-pgtable-arm: Correct Mali attributes
        iommu/arm-smmu: Free context bitmap in the err path of arm_smmu_init_domain_context
      90105ae1
    • L
      Merge tag 'copy-struct-from-user-v5.4-rc4' of... · 8eb4b3b0
      Linus Torvalds committed
      Merge tag 'copy-struct-from-user-v5.4-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux
      
      Pull usercopy test fixlets from Christian Brauner:
       "This contains two improvements for the copy_struct_from_user() tests:
      
         - a coding style change to get rid of the ugly "if ((ret |= test()))"
           pointed out when pulling the original patchset.
      
         - avoid soft lockups when running the usercopy tests on machines
           with large page sizes by scanning only a 1024 byte region"
      
      * tag 'copy-struct-from-user-v5.4-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
        usercopy: Avoid soft lockups in test_check_nonzero_user()
        lib: test_user_copy: style cleanup
      8eb4b3b0
    • L
      Merge tag 'mmc-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 7571438a
      Linus Torvalds committed
      Pull MMC fixes from Ulf Hansson:
       "MMC host:
         - sdhci-iproc: Prevent some spurious interrupts
         - renesas_sdhi/sh_mmcif: Avoid false warnings about IRQs not found
      
        MEMSTICK host:
         - jmb38x_ms: Fix an error handling path at ->probe()"
      
      * tag 'mmc-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
        mmc: sdhci-iproc: fix spurious interrupts on Multiblock reads with bcm2711
        mmc: sh_mmcif: Use platform_get_irq_optional() for optional interrupt
        mmc: renesas_sdhi: Do not use platform_get_irq() to count interrupts
      7571438a
    • L
      Merge tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5f93393a
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "Just a few small fixes for the usual suspect, HD- and USB-audio:
        enablement of runtime PM for Nvidia due to the recent PCI changes, a
        fix for potential hangs with recent HD-audio platforms, and the rest
        device-specific quirks"
      
      * tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Force runtime PM on Nvidia HDMI codecs
        ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
        ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
        ALSA: hdac: clear link output stream mapping
        ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360
      5f93393a
  2. 18 Oct, 2019 17 commits
    • L
      Merge tag 'acpi-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · adca4ce3
      Linus Torvalds committed
      Pull ACPI fixes from Rafael Wysocki:
       "Fix possible use-after-free in the ACPI CPPC support code (John Garry)
        and prevent the ACPI HMAT parsing code from using possibly incorrect
        data coming from the platform firmware (Daniel Black)"
      
      * tag 'acpi-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
        ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3
      adca4ce3
    • L
      Merge tag 'pm-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e59b76ff
      Linus Torvalds committed
      Pull power management fixes from Rafael Wysocki:
       "These include a fix for a recent regression in the ACPI CPU
        performance scaling code, a PCI device power management fix,
        a system shutdown fix related to cpufreq, a removal of an ACPI
        suspend-to-idle blacklist entry and a build warning fix.
      
        Specifics:
      
         - Fix possible NULL pointer dereference in the ACPI processor scaling
           initialization code introduced by a recent cpufreq update (Rafael
           Wysocki).
      
         - Fix possible deadlock due to suspending cpufreq too late during
           system shutdown (Rafael Wysocki).
      
         - Make the PCI device system resume code path be more consistent with
           its PM-runtime counterpart to fix an issue with missing delay on
           transitions from D3cold to D0 during system resume from
           suspend-to-idle on some systems (Rafael Wysocki).
      
         - Drop Dell XPS13 9360 from the LPS0 Idle _DSM blacklist to make it
           use suspend-to-idle by default (Mario Limonciello).
      
         - Fix build warning in the core system suspend support code (Ben
           Dooks)"
      
      * tag 'pm-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: processor: Avoid NULL pointer dereferences at init time
        PCI: PM: Fix pci_power_up()
        PM: sleep: include <linux/pm_runtime.h> for pm_wq
        cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
        ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
      e59b76ff
    • L
      Merge tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi · c3419fd6
      Linus Torvalds committed
      Pull scsi fixes from Martin Petersen:
       "These two commits were in a separate postmerge branch due to a
        dependency on changes merged for 5.4 in the block tree.
      
        They fix two issues in the intersection of the request cleanup changes
        from block (b7e9e1fb) and the request batching changes
        (8930a6c2) that were made to SCSI during the 5.4 cycle"
      
      * tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
        scsi: core: fix dh and multipathing for SCSI hosts without request batching
        scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching
      c3419fd6
    • J
      iommu/amd: Check PM_LEVEL_SIZE() condition in locked section · 46ac18c3
      Joerg Roedel committed
      The increase_address_space() function has to check the PM_LEVEL_SIZE()
      condition again under the domain->lock to avoid a false trigger of the
      WARN_ON_ONCE() and to avoid increasing the address space more often
      than necessary.
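
The pattern of re-checking a condition once the lock is held can be sketched in user-space C as below. This is a minimal model, not the driver code: `struct model_domain`, `model_level_size()` and `model_increase_address_space()` are hypothetical names, a pthread mutex stands in for domain->lock, and the size formula (12 bits for the page offset plus 9 bits per level) is an assumption made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_LEVEL 6

/* Hypothetical stand-in for the protection domain. */
struct model_domain {
    pthread_mutex_t lock;
    int level;       /* current page-table depth */
    int grow_count;  /* how often the address space was actually grown */
};

/* Assumed model of PM_LEVEL_SIZE(): highest address covered by `level`. */
static unsigned long long model_level_size(int level)
{
    return level < MODEL_MAX_LEVEL ? (1ULL << (12 + 9 * level)) - 1 : ~0ULL;
}

/*
 * The caller decided without the lock that `address` does not fit. Another
 * thread may have grown the address space in the meantime, so the condition
 * is evaluated again after taking the lock.
 */
static bool model_increase_address_space(struct model_domain *d,
                                         unsigned long long address)
{
    bool grew = false;

    assert(d != NULL);

    pthread_mutex_lock(&d->lock);
    if (address > model_level_size(d->level) && d->level < MODEL_MAX_LEVEL) {
        d->level++;
        d->grow_count++;
        grew = true;
    }
    pthread_mutex_unlock(&d->lock);

    return grew;
}
```

A second call made on the basis of a stale, unlocked check finds that the address already fits and leaves the level unchanged, which is the behaviour the commit message asks for.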
      
      Reported-by: Qian Cai <cai@lca.pw>
      Fixes: 754265bc ("iommu/amd: Fix race in increase_address_space()")
      Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      46ac18c3
    • R
      Merge branch 'acpi-tables' · ffba17bb
      Rafael J. Wysocki committed
      * acpi-tables:
        ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3
      ffba17bb
    • J
      ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit() · 56a0b978
      John Garry committed
      When enabling KASAN and DEBUG_TEST_DRIVER_REMOVE, I find this KASAN
      warning:
      
      [   20.872057] BUG: KASAN: use-after-free in pcc_data_alloc+0x40/0xb8
      [   20.878226] Read of size 4 at addr ffff00236cdeb684 by task swapper/0/1
      [   20.884826]
      [   20.886309] CPU: 19 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00009-ge7f7df3db5bf-dirty #289
      [   20.894994] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
      [   20.903505] Call trace:
      [   20.905942]  dump_backtrace+0x0/0x200
      [   20.909593]  show_stack+0x14/0x20
      [   20.912899]  dump_stack+0xd4/0x130
      [   20.916291]  print_address_description.isra.9+0x6c/0x3b8
      [   20.921592]  __kasan_report+0x12c/0x23c
      [   20.925417]  kasan_report+0xc/0x18
      [   20.928808]  __asan_load4+0x94/0xb8
      [   20.932286]  pcc_data_alloc+0x40/0xb8
      [   20.935938]  acpi_cppc_processor_probe+0x4e8/0xb08
      [   20.940717]  __acpi_processor_start+0x48/0xb0
      [   20.945062]  acpi_processor_start+0x40/0x60
      [   20.949235]  really_probe+0x118/0x548
      [   20.952887]  driver_probe_device+0x7c/0x148
      [   20.957059]  device_driver_attach+0x94/0xa0
      [   20.961231]  __driver_attach+0xa4/0x110
      [   20.965055]  bus_for_each_dev+0xe8/0x158
      [   20.968966]  driver_attach+0x30/0x40
      [   20.972531]  bus_add_driver+0x234/0x2f0
      [   20.976356]  driver_register+0xbc/0x1d0
      [   20.980182]  acpi_processor_driver_init+0x40/0xe4
      [   20.984875]  do_one_initcall+0xb4/0x254
      [   20.988700]  kernel_init_freeable+0x24c/0x2f8
      [   20.993047]  kernel_init+0x10/0x118
      [   20.996524]  ret_from_fork+0x10/0x18
      [   21.000087]
      [   21.001567] Allocated by task 1:
      [   21.004785]  save_stack+0x28/0xc8
      [   21.008089]  __kasan_kmalloc.isra.9+0xbc/0xd8
      [   21.012435]  kasan_kmalloc+0xc/0x18
      [   21.015913]  pcc_data_alloc+0x94/0xb8
      [   21.019564]  acpi_cppc_processor_probe+0x4e8/0xb08
      [   21.024343]  __acpi_processor_start+0x48/0xb0
      [   21.028689]  acpi_processor_start+0x40/0x60
      [   21.032860]  really_probe+0x118/0x548
      [   21.036512]  driver_probe_device+0x7c/0x148
      [   21.040684]  device_driver_attach+0x94/0xa0
      [   21.044855]  __driver_attach+0xa4/0x110
      [   21.048680]  bus_for_each_dev+0xe8/0x158
      [   21.052591]  driver_attach+0x30/0x40
      [   21.056155]  bus_add_driver+0x234/0x2f0
      [   21.059980]  driver_register+0xbc/0x1d0
      [   21.063805]  acpi_processor_driver_init+0x40/0xe4
      [   21.068497]  do_one_initcall+0xb4/0x254
      [   21.072322]  kernel_init_freeable+0x24c/0x2f8
      [   21.076667]  kernel_init+0x10/0x118
      [   21.080144]  ret_from_fork+0x10/0x18
      [   21.083707]
      [   21.085186] Freed by task 1:
      [   21.088056]  save_stack+0x28/0xc8
      [   21.091360]  __kasan_slab_free+0x118/0x180
      [   21.095445]  kasan_slab_free+0x10/0x18
      [   21.099183]  kfree+0x80/0x268
      [   21.102139]  acpi_cppc_processor_exit+0x1a8/0x1b8
      [   21.106832]  acpi_processor_stop+0x70/0x80
      [   21.110917]  really_probe+0x174/0x548
      [   21.114568]  driver_probe_device+0x7c/0x148
      [   21.118740]  device_driver_attach+0x94/0xa0
      [   21.122912]  __driver_attach+0xa4/0x110
      [   21.126736]  bus_for_each_dev+0xe8/0x158
      [   21.130648]  driver_attach+0x30/0x40
      [   21.134212]  bus_add_driver+0x234/0x2f0
      [   21.0x10/0x18
      [   21.161764]
      [   21.163244] The buggy address belongs to the object at ffff00236cdeb600
      [   21.163244]  which belongs to the cache kmalloc-256 of size 256
      [   21.175750] The buggy address is located 132 bytes inside of
      [   21.175750]  256-byte region [ffff00236cdeb600, ffff00236cdeb700)
      [   21.187473] The buggy address belongs to the page:
      [   21.192254] page:fffffe008d937a00 refcount:1 mapcount:0 mapping:ffff002370c0fa00 index:0x0 compound_mapcount: 0
      [   21.202331] flags: 0x1ffff00000010200(slab|head)
      [   21.206940] raw: 1ffff00000010200 dead000000000100 dead000000000122 ffff002370c0fa00
      [   21.214671] raw: 0000000000000000 00000000802a002a 00000001ffffffff 0000000000000000
      [   21.222400] page dumped because: kasan: bad access detected
      [   21.227959]
      [   21.229438] Memory state around the buggy address:
      [   21.234218]  ffff00236cdeb580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   21.241427]  ffff00236cdeb600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   21.248637] >ffff00236cdeb680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   21.255845]                    ^
      [   21.259062]  ffff00236cdeb700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   21.266272]  ffff00236cdeb780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   21.273480] ==================================================================
      
      It seems that global pcc_data[pcc_ss_id] can be freed in
      acpi_cppc_processor_exit(), but we may later reference this value, so
      NULLify it when freed.
      
      Also remove the useless setting of data "pcc_channel_acquired", which
      we're about to free.
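
A small user-space C sketch of the idea follows, assuming a reference-counted global slot per subspace. The names `model_pcc_slots`, `model_pcc_data_alloc()` and `model_pcc_data_release()` are hypothetical and the layout is simplified; the only part taken from the description above is that the global pointer is set to NULL at the point where the object is freed, so a later allocation does not read freed memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_PCC_SUBSPACES 4

/* Hypothetical, simplified per-subspace data. */
struct model_pcc_data {
    int refcount;
};

static struct model_pcc_data *model_pcc_slots[MODEL_MAX_PCC_SUBSPACES];

/* Take a reference on the slot, allocating it on first use. */
static int model_pcc_data_alloc(int id)
{
    assert(id >= 0 && id < MODEL_MAX_PCC_SUBSPACES);

    if (model_pcc_slots[id]) {
        /* This read is the use-after-free if a stale pointer is left behind. */
        model_pcc_slots[id]->refcount++;
        return 0;
    }

    model_pcc_slots[id] = calloc(1, sizeof(*model_pcc_slots[id]));
    if (!model_pcc_slots[id])
        return -ENOMEM;
    model_pcc_slots[id]->refcount = 1;
    return 0;
}

/* Drop a reference; free the slot and clear the pointer on the last one. */
static void model_pcc_data_release(int id)
{
    assert(id >= 0 && id < MODEL_MAX_PCC_SUBSPACES);

    if (!model_pcc_slots[id])
        return;
    if (--model_pcc_slots[id]->refcount == 0) {
        free(model_pcc_slots[id]);
        model_pcc_slots[id] = NULL;  /* the fix: no dangling global pointer */
    }
}
```

With the pointer cleared, a probe that follows a remove takes the allocation branch again instead of incrementing a counter inside freed memory.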
      
      Fixes: 85b1407b ("ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs")
      Signed-off-by: John Garry <john.garry@huawei.com>
      Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      56a0b978
    • R
      Merge branches 'pm-cpufreq' and 'pm-sleep' · b23eb5c7
      Rafael J. Wysocki committed
      * pm-cpufreq:
        ACPI: processor: Avoid NULL pointer dereferences at init time
        cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
      
      * pm-sleep:
        PM: sleep: include <linux/pm_runtime.h> for pm_wq
        ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
      b23eb5c7
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 0e2adab6
      Linus Torvalds committed
      Pull arm64 fixes from Will Deacon:
       "The main thing here is a long-awaited workaround for a CPU erratum on
        ThunderX2 which we have developed in conjunction with engineers from
        Cavium/Marvell.
      
        At the moment, the workaround is unconditionally enabled for affected
        CPUs at runtime but we may add a command-line option to disable it in
        future if performance numbers show up indicating a significant cost
        for real workloads.
      
        Summary:
      
         - Work around Cavium/Marvell ThunderX2 erratum #219
      
         - Fix regression in mlock() ABI caused by sign-extension of TTBR1 addresses
      
         - More fixes to the spurious kernel fault detection logic
      
         - Fix pathological preemption race when enabling some CPU features at boot
      
         - Drop broken kcore macros in favour of generic implementations
      
         - Fix userspace view of ID_AA64ZFR0_EL1 when SVE is disabled
      
         - Avoid NULL dereference on allocation failure during hibernation"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: tags: Preserve tags for addresses translated via TTBR1
        arm64: mm: fix inverted PAR_EL1.F check
        arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F
        arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
        arm64: hibernate: check pgd table allocation
        arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
        arm64: Fix kcore macros after 52-bit virtual addressing fallout
        arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
        arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
        arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
        arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
      0e2adab6
    • L
      Merge tag 'xtensa-20191017' of git://github.com/jcmvbkbc/linux-xtensa · ad32fd74
      Linus Torvalds committed
      Pull Xtensa fixes from Max Filippov:
      
       - fix {get,put}_user() for 64bit values
      
       - fix warning about static EXPORT_SYMBOL from modpost
      
       - fix PCI IO ports mapping for the virt board
      
       - fix pasto in change_bit for exclusive access option
      
      * tag 'xtensa-20191017' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix change_bit in exclusive access option
        xtensa: virt: fix PCI IO ports mapping
        xtensa: drop EXPORT_SYMBOL for outs*/ins*
        xtensa: fix type conversion in __get_user_[no]check
        xtensa: clean up assembly arguments in uaccess macros
        xtensa: fix {get,put}_user() for 64bit values
      ad32fd74
    • L
      Merge tag 'xfs-5.4-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 6e8ba009
      Linus Torvalds committed
      Pull xfs fix from Darrick Wong:
       "The single fix converts the seconds field in the recently added XFS
        bulkstat structure to a signed 64-bit quantity.
      
        The structure layout doesn't change and so far there are no users of
        the ioctl to break because we only publish xfs ioctl interfaces
        through the XFS userspace development libraries, and we're still
        working on a 5.3 release"
      
      * tag 'xfs-5.4-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: change the seconds fields in xfs_bulkstat to signed
      6e8ba009
    • L
      Merge tag 'drm-fixes-2019-10-18' of git://anongit.freedesktop.org/drm/drm · 839e0f04
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "This is this weeks fixes for drm.
      
        The dma-resv one is probably the most important one, as a fair few
        people have reported it; besides that, it's a couple of panfrost, a
        few i915 and a few amdgpu fixes.
      
        One radeon patch to fix some ppc64 related issues caused an x86
        regression so is getting reverted for now.
      
        Summary:
      
        dma-resv:
         - shared fences for lima/panfrost
      
        ttm:
         - prefault regression fix
         - lifetime fix
      
        panfrost:
         - stopped job timeout fix
         - missing register values
      
        amdgpu:
         - smu7 powerplay fix
         - bail earlier for cik/si detection
         - navi SDMA fix
      
        radeon:
         - revert a ppc64 shutdown fix that broke x86
      
        i915:
         - VBT information handling fix
         - Circular locking fix
         - preemption vs resubmission virtual requests fix"
      
      * tag 'drm-fixes-2019-10-18' of git://anongit.freedesktop.org/drm/drm:
        drm/i915: Fixup preempt-to-busy vs resubmission of a virtual request
        drm/i915/userptr: Never allow userptr into the mappable GGTT
        drm/i915: Favor last VBT child device with conflicting AUX ch/DDC pin
        drm/i915/execlists: Refactor -EIO markup of hung requests
        drm/panfrost: Handle resetting on timeout better
        drm/panfrost: Add missing GPU feature registers
        drm/ttm: fix handling in ttm_bo_add_mem_to_lru
        drm/ttm: Restore ttm prefaulting
        drm/ttm: fix busy reference in ttm_mem_evict_first
        drm/amdgpu/sdma5: fix mask value of POLL_REGMEM packet for pipe sync
        drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1
        Revert "drm/radeon: Fix EEH during kexec"
        drm/msm/dsi: Implement reset correctly
        dma-buf/resv: fix exclusive fence get
        drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50
        drm/tiny: Kconfig: Remove always-y THERMAL dep. from TINYDRM_REPAPER
        drm/amdgpu/powerplay: fix typo in mvdd table setup
      839e0f04
    • W
      Merge branch 'errata/tx2-219' into for-next/fixes · 777d062e
      Will Deacon committed
      Workaround for Cavium/Marvell ThunderX2 erratum #219.
      
      * errata/tx2-219:
        arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
        arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
        arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
        arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
      777d062e
    • D
      Merge tag 'drm-misc-fixes-2019-10-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 5c1e34b5
      Dave Airlie committed
      -dma-resv: Change shared_count to post-increment to fix lima crash (Qiang)
      -ttm: A couple fixes related to lifetime and restore prefault behavior
       (Christian & Thomas)
      -panfrost: Fill in missing feature reg values and fix stopped job timeouts
       (Steven)
      
      Cc: Qiang Yu <yuq825@gmail.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Steven Price <steven.price@arm.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Sean Paul <sean@poorly.run>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191017203419.GA142909@art_vandelay
      5c1e34b5
    • D
      Merge tag 'drm-fixes-5.4-2019-10-16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 7557d278
      Dave Airlie committed
      drm-fixes-5.4-2019-10-16:
      
      amdgpu:
      - Powerplay fix for SMU7 parts
      - Bail earlier when cik/si support is not set to 1
      - Fix an SDMA issue on navi
      
      radeon:
      - revert a PPC fix which broke x86
      
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191017022443.3853-1-alexander.deucher@amd.com
      7557d278
    • D
      Merge tag 'drm-intel-fixes-2019-10-17' of... · 33ba90ee
      Dave Airlie committed
      Merge tag 'drm-intel-fixes-2019-10-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      - Display fix on handling VBT information.
      - Important circular locking fix
      - Fix for preemption vs resubmission on virtual requests
        - and a prep patch to make this last one apply cleanly
      
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191017135444.GA12255@intel.com
      33ba90ee
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 84629d43
      Linus Torvalds committed
      Pull input fixes from Dmitry Torokhov:
       "The main change is that we are reverting blanket enablement of SMBus
        mode for devices with Elan touchpads that report BIOS release date as
        2018+ because there are older boxes with updated BIOSes that still do
        not work well in SMBus mode.
      
        It looks like we will have to establish a whitelist for SMBus mode"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Revert "Input: elantech - enable SMBus on new (2018+) systems"
        Input: synaptics-rmi4 - avoid processing unknown IRQs
        Input: soc_button_array - partial revert of support for newer surface devices
        Input: goodix - add support for 9-bytes reports
        Input: da9063 - fix capability and drop KEY_SLEEP
      84629d43
    • A
      coccinelle: api/devm_platform_ioremap_resource: remove useless script · 283ea345
      Alexandre Belloni committed
      While it is useful for new drivers to use devm_platform_ioremap_resource,
      this script is currently used to spam maintainers, often updating very
      old drivers.  The net benefit is the removal of 2 lines of code in the
      driver but the review load for the maintainers is huge.  As of now, more
      than 560 patches have been sent, some of them obviously broken, as in:
      
       https://lore.kernel.org/lkml/9bbcce19c777583815c92ce3c2ff2586@www.loen.fr/
      
      Remove the script to reduce the spam.
      
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Acked-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      283ea345
  3. 17 Oct, 2019 4 commits
    • L
      ALSA: hda - Force runtime PM on Nvidia HDMI codecs · 94989e31
      Lukas Wunner committed
      Przemysław Kopa reports that since commit b516ea58 ("PCI: Enable
      NVIDIA HDA controllers"), the discrete GPU Nvidia GeForce GT 540M on his
      2011 Samsung laptop refuses to runtime suspend, resulting in a power
      regression and excessive heat.
      
      Rivera Valdez witnesses the same issue with a GeForce GT 525M (GF108M)
      of the same era, as does another Arch Linux user named "R0AR" with a
      more recent GeForce GTX 1050 Ti (GP107M).
      
      The commit exposes the discrete GPU's HDA controller and all four codecs
      on the controller do not set the CLKSTOP and EPSS bits in the Supported
      Power States Response.  They also do not set the PS-ClkStopOk bit in the
      Get Power State Response.  hda_codec_runtime_suspend() therefore does
      not call snd_hdac_codec_link_down(), which prevents each codec and the
      PCI device from runtime suspending.
      
      The same issue is present on some AMD discrete GPUs and we addressed it
      by forcing runtime PM despite the bits not being set, see commit
      57cb54e5 ("ALSA: hda - Force to link down at runtime suspend on
      ATI/AMD HDMI").
      
      Do the same for Nvidia HDMI codecs.
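
The suspend decision described above can be sketched as a small user-space model. This is an illustration only, not the driver code: `struct codec_model`, `force_link_down`, and `should_link_down()` are hypothetical names, and the real check lives in hda_codec_runtime_suspend().

```c
#include <stdbool.h>

/* Hypothetical model of the codec state the commit message mentions. */
struct codec_model {
    bool has_clkstop;     /* CLKSTOP bit in the Supported Power States Response */
    bool has_epss;        /* EPSS bit in the Supported Power States Response */
    bool ps_clkstop_ok;   /* PS-ClkStopOk bit in the Get Power State Response */
    bool force_link_down; /* assumed per-codec quirk: force runtime PM */
};

/*
 * Returns true when the link should be brought down at runtime suspend.
 * Without the quirk, all three capability bits must be set; with the
 * quirk, the bits are ignored.
 */
bool should_link_down(const struct codec_model *c)
{
    return c->force_link_down ||
           (c->has_clkstop && c->has_epss && c->ps_clkstop_ok);
}
```

In this model, a codec that reports none of the three bits never has its link brought down unless the quirk flag is set, which mirrors the behavior the commit describes for the affected Nvidia codecs.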
      
      Fixes: b516ea58 ("PCI: Enable NVIDIA HDA controllers")
      Link: https://bbs.archlinux.org/viewtopic.php?pid=1865512
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=75985#c81
      Reported-by: Przemysław Kopa <prymoo@gmail.com>
      Reported-by: Rivera Valdez <riveravaldez@ysinembargo.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Daniel Drake <dan@reactivated.net>
      Cc: stable@vger.kernel.org # v5.3+
      Link: https://lore.kernel.org/r/3086bc75135c1e3567c5bc4f3cc4ff5cbf7a56c2.1571324194.git.lukas@wunner.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      94989e31
    • L
      Merge tag 'platform-drivers-x86-v5.4-3' of git://git.infradead.org/linux-platform-drivers-x86 · fe7d2c23
      Linus Torvalds committed
      Pull x86 platform driver fixes from Andy Shevchenko:
      
       - Users of the Intel P-Unit IPC driver might be surprised by a
         harmless warning, so switch to an API that does not issue a
         warning at all.
      
       - The I²C multi-instantiate driver continues to add slave devices
         even when the IRQ resource is not found. For devices on the market
         the IRQ resource is mandatory, so fail the ->probe() of the parent
         driver to avoid slaves being probed.
      
       - Avoid a compiler warning due to an unused variable in the Classmate
         laptop driver.
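
The "fail the probe if no IRQ is found" behavior from the second item can be sketched as follows. This is a simplified user-space model under stated assumptions: `struct child`, `parent_probe()`, and the convention that a negative `irq` value is an error code are all hypothetical and do not reproduce the driver's actual code.

```c
#include <stddef.h>

/* Hypothetical child device: irq < 0 models "IRQ resource not found". */
struct child {
    int irq;
    int registered;
};

/*
 * Register children in order. If any child lacks an IRQ, unregister the
 * children added so far and fail the whole probe with that error code,
 * so that no child ends up probed without its mandatory IRQ.
 */
int parent_probe(struct child *kids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (kids[i].irq < 0) {
            int err = kids[i].irq;

            while (i > 0)
                kids[--i].registered = 0; /* unwind earlier children */
            return err;
        }
        kids[i].registered = 1;
    }
    return 0;
}
```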
      
      * tag 'platform-drivers-x86-v5.4-3' of git://git.infradead.org/linux-platform-drivers-x86:
        platform/x86: i2c-multi-instantiate: Fail the probe if no IRQ provided
        platform/x86: intel_punit_ipc: Avoid error message when retrieving IRQ
        platform/x86: classmate-laptop: remove unused variable
      fe7d2c23
    • M
      dm cache: fix bugs when a GFP_NOWAIT allocation fails · 13bd677a
      Mikulas Patocka committed
      A GFP_NOWAIT allocation can fail at any time: it doesn't wait for
      memory to become available, and it fails if the mempool is exhausted
      and there is not enough memory.
      
      If we go down this path:
        map_bio -> mg_start -> alloc_migration -> mempool_alloc(GFP_NOWAIT)
      we can see that map_bio() doesn't check the return value of mg_start(),
      and the bio is leaked.
      
      If we go down this path:
        map_bio -> mg_start -> mg_lock_writes -> alloc_prison_cell ->
        dm_bio_prison_alloc_cell_v2 -> mempool_alloc(GFP_NOWAIT) ->
        mg_lock_writes -> mg_complete
      the bio is ended with an error, which is unacceptable because it could
      cause filesystem corruption if the machine ran out of memory
      temporarily.
      
      Change GFP_NOWAIT to GFP_NOIO, so that the mempool code will properly
      wait until memory becomes available. mempool_alloc with GFP_NOIO can't
      fail, so remove the code paths that deal with allocation failure.
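
The difference between the two allocation modes can be illustrated with a toy fixed-size pool. This is a single-threaded user-space sketch, not the kernel mempool API: `struct pool`, `pool_alloc_nowait()`, `pool_alloc_wait()`, and `simulate_completion()` are hypothetical names, and "waiting" is modeled by letting one outstanding element be returned to the pool.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 2

/* Toy pool: a fixed reserve of elements and a busy flag per element. */
struct pool {
    int elems[POOL_SIZE];
    int busy[POOL_SIZE]; /* 1 while the element is handed out */
};

/* Models a non-waiting allocation: returns NULL when the reserve is empty. */
int *pool_alloc_nowait(struct pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!p->busy[i]) {
            p->busy[i] = 1;
            return &p->elems[i];
        }
    }
    return NULL;
}

/* Return an element to the pool. */
void pool_free(struct pool *p, int *elem)
{
    size_t i = (size_t)(elem - p->elems);

    assert(i < POOL_SIZE && p->busy[i]);
    p->busy[i] = 0;
}

/* Stand-in for sleeping: pretend one in-flight user finished and freed. */
void simulate_completion(struct pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->busy[i]) {
            pool_free(p, &p->elems[i]);
            return;
        }
    }
}

/* Models a waiting allocation: retries until an element is available. */
int *pool_alloc_wait(struct pool *p)
{
    int *e;

    while ((e = pool_alloc_nowait(p)) == NULL)
        simulate_completion(p);
    return e;
}
```

Callers of the non-waiting variant must handle a NULL return on every path, which is the class of bug the commit describes. With the waiting variant those failure paths disappear, at the cost of the caller possibly blocking.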
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      13bd677a
    • L
      Merge tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 7801158f
      Linus Torvalds committed
      Pull GPIO fixes from Linus Walleij:
       "The fixes pertain to a problem with initializing the Intel GPIO
        irqchips when adding gpiochips.
      
        Andy fixed it up elegantly by adding a hardware initialization
        callback to the struct gpio_irq_chip so let's use this. Tested and
        verified on the target hardware"
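
The idea of a hardware initialization callback that runs before the IRQ chip is set up can be sketched with a small user-space model. All names here (`struct chip`, `struct irq_cfg`, `chip_add()`, `demo_init_hw()`, `failing_init_hw()`) are hypothetical; the sketch only mirrors the ordering described in the pull message, not the gpiolib implementation.

```c
#include <stddef.h>

struct chip;

/* Hypothetical IRQ configuration carrying an optional init callback. */
struct irq_cfg {
    int (*init_hw)(struct chip *c); /* may be NULL */
};

struct chip {
    struct irq_cfg irq;
    int hw_ready;      /* set by the driver's init callback */
    int irqchip_added; /* set once IRQ chip setup has run */
};

/*
 * Add a chip: run the driver's hardware init first, so that IRQ chip
 * setup can rely on the hardware being in a known state. A failing
 * callback aborts the add and its error code is passed through.
 */
int chip_add(struct chip *c)
{
    if (c->irq.init_hw) {
        int ret = c->irq.init_hw(c);

        if (ret)
            return ret;
    }
    c->irqchip_added = 1;
    return 0;
}

/* Example driver callbacks. */
int demo_init_hw(struct chip *c)
{
    c->hw_ready = 1;
    return 0;
}

int failing_init_hw(struct chip *c)
{
    (void)c;
    return -5;
}
```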
      
      * tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: lynxpoint: set default handler to be handle_bad_irq()
        gpio: merrifield: Move hardware initialization to callback
        gpio: lynxpoint: Move hardware initialization to callback
        gpio: intel-mid: Move hardware initialization to callback
        gpiolib: Initialize the hardware with a callback
        gpio: merrifield: Restore use of irq_base
      7801158f