1. 10 Jun 2020 (1 commit)
    • mm: introduce include/linux/pgtable.h · ca5999fd
      Committed by Mike Rapoport
      The include/linux/pgtable.h is going to be the home of generic page table
      manipulation functions.
      
      Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
      make the latter include asm/pgtable.h.
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
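      The resulting include layout is easy to picture. A minimal sketch,
      assuming only the two paths named in the commit (the helper shown is
      hypothetical):

          /* include/linux/pgtable.h - new home of generic page table
           * helpers; it pulls in the architecture bits first. */
          #include <asm/pgtable.h>

          /* Generic code now writes: */
          #include <linux/pgtable.h>   /* was <asm-generic/pgtable.h> */

          /* hypothetical generic helper living in linux/pgtable.h */
          static inline int example_pte_is_present(pte_t pte)
          {
                  return pte_present(pte); /* arch hook from asm/pgtable.h */
          }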
  2. 09 Jun 2020 (2 commits)
  3. 05 Jun 2020 (4 commits)
  4. 04 Jun 2020 (12 commits)
    • mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API · 9d82c694
      Committed by Johannes Weiner
      With the page->mapping requirement gone from memcg, we can charge anon and
      file-thp pages in one single step, right after they're allocated.
      
      This removes two out of three API calls - especially the tricky commit
      step that needed to happen at just the right time between when the page is
      "set up" and when it's "published" - somewhat vague and fluid concepts
      that varied by page type.  All we need is a freshly allocated page and a
      memcg context to charge.
      
      v2: prevent double charges on pre-allocated hugepages in khugepaged
      
      [hannes@cmpxchg.org: Fix crash - *hpage could be ERR_PTR instead of NULL]
        Link: http://lkml.kernel.org/r/20200512215813.GA487759@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Link: http://lkml.kernel.org/r/20200508183105.225460-13-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
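      The resulting call pattern is a single charge right after allocation.
      A minimal sketch, assuming the mem_cgroup_charge() signature merged in
      this series (the wrapper function is illustrative):

          #include <linux/gfp.h>
          #include <linux/memcontrol.h>

          static struct page *alloc_and_charge(struct mm_struct *mm, gfp_t gfp)
          {
                  struct page *page = alloc_page(gfp);

                  if (!page)
                          return NULL;
                  /* one step: no separate try/commit/cancel dance */
                  if (mem_cgroup_charge(page, mm, gfp)) {
                          put_page(page);
                          return NULL;
                  }
                  return page;
          }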
    • mm: simplify calling a compound page destructor · ff45fc3c
      Committed by Matthew Wilcox (Oracle)
      None of the three callers of get_compound_page_dtor() want to know the
      value; they just want to call the function.  Replace it with
      destroy_compound_page() which calls the dtor for them.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200517105051.9352-1-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
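      The before/after at a call site, as a sketch (the wrapper around it is
      illustrative; the helper body in mm.h may differ):

          static void release_compound(struct page *page)
          {
                  /*
                   * Before: fetch the destructor only to call it:
                   *
                   *      compound_page_dtor *dtor = get_compound_page_dtor(page);
                   *      (*dtor)(page);
                   *
                   * After: one helper does the lookup and the call.
                   */
                  destroy_compound_page(page);
          }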
    • mm/page_alloc: restrict and formalize compound_page_dtors[] · ae70eddd
      Committed by Anshuman Khandual
      Restrict elements in compound_page_dtors[] array per NR_COMPOUND_DTORS and
      explicitly position them according to enum compound_dtor_id.  This
      improves protection against possible misalignment between
      compound_page_dtors[] and enum compound_dtor_id later on.
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Link: http://lkml.kernel.org/r/1589795958-19317-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
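      Concretely, the array gains an explicit bound and designated
      initializers keyed by enum compound_dtor_id. A sketch along the lines
      of the patch (the config-dependent slots follow mm/page_alloc.c of
      that era, but the exact guards may differ):

          static compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
                  [NULL_COMPOUND_DTOR] = NULL,
                  [COMPOUND_PAGE_DTOR] = free_compound_page,
          #ifdef CONFIG_HUGETLB_PAGE
                  [HUGETLB_PAGE_DTOR] = free_huge_page,
          #endif
          #ifdef CONFIG_TRANSPARENT_HUGEPAGE
                  [TRANSHUGE_PAGE_DTOR] = free_transhuge_page,
          #endif
          };

      With each slot tied to the enum by name, reordering enum
      compound_dtor_id can no longer silently misalign the table.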
    • mm/page_alloc.c: remove unused free_bootmem_with_active_regions · 4ca7be24
      Committed by Baoquan He
      Since commit 397dc00e ("mips: sgi-ip27: switch from DISCONTIGMEM
      to SPARSEMEM"), the last caller of free_bootmem_with_active_regions()
      is gone; nothing calls it any more.
      
      Let's remove it.
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Link: http://lkml.kernel.org/r/20200402143455.5145-1-bhe@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: rename free_area_init_node() to free_area_init_memoryless_node() · bc9331a1
      Committed by Mike Rapoport
      free_area_init_node() is only used by x86 to initialize memory-less
      nodes.  Make its name reflect this and drop all the function parameters
      except the node ID, as they are all zero anyway.
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-19-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
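      As a sketch, the prototypes before and after (the old parameter list
      follows the commit's description; exact types may differ):

          /* Before: generic-looking, but x86 passed zeros for everything
           * except the node ID. */
          void __init free_area_init_node(int nid, unsigned long *zones_size,
                                          unsigned long node_start_pfn,
                                          unsigned long *zholes_size);

          /* After: the name says what it is for, and the always-zero
           * parameters are gone. */
          void __init free_area_init_memoryless_node(int nid);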
    • mm: free_area_init: allow defining max_zone_pfn in descending order · 51930df5
      Committed by Mike Rapoport
      Some architectures (e.g.  ARC) have the ZONE_HIGHMEM zone below
      ZONE_NORMAL.  Allowing free_area_init() to parse the max_zone_pfn array
      even when it is sorted in descending order makes it possible to use
      free_area_init() on such architectures.
      
      Add top -> down traversal of max_zone_pfn array in free_area_init() and
      use the latter in ARC node/zone initialization.
      
      [rppt@kernel.org: ARC fix]
        Link: http://lkml.kernel.org/r/20200504153901.GM14260@kernel.org
      [rppt@linux.ibm.com: arc: free_area_init(): take into account PAE40 mode]
        Link: http://lkml.kernel.org/r/20200507205900.GH683243@linux.ibm.com
      [akpm@linux-foundation.org: declare arch_has_descending_max_zone_pfns()]
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Link: http://lkml.kernel.org/r/20200412194859.12663-18-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
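      Internally this hinges on a small predicate that free_area_init() uses
      to pick the traversal direction. A sketch, with the ARC/PAE40
      condition taken from the fix notes above (the exact form in
      mm/page_alloc.c may differ):

          /* Does this architecture list its zones from high PFNs down? */
          static bool arch_has_descending_max_zone_pfns(void)
          {
                  /* ARC puts ZONE_HIGHMEM below ZONE_NORMAL, except in
                   * PAE40 mode. */
                  return IS_ENABLED(CONFIG_ARC) &&
                         !IS_ENABLED(CONFIG_ARC_HAS_PAE40);
          }

      When it returns true, free_area_init() walks the max_zone_pfn array
      from the top down instead of bottom up.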
    • mm: use free_area_init() instead of free_area_init_nodes() · 9691a071
      Committed by Mike Rapoport
      free_area_init() has effectively become a wrapper for
      free_area_init_nodes() and there is no point in keeping it.  Still, the
      free_area_init() name is shorter and more general, as it does not imply
      the necessity to initialize multiple nodes.
      
      Rename free_area_init_nodes() to free_area_init(), update the callers and
      drop old version of free_area_init().
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: free_area_init: use maximal zone PFNs rather than zone sizes · fa3354e4
      Committed by Mike Rapoport
      Currently, architectures that use free_area_init() to initialize the
      memory map and the node and zone structures need to calculate zone and
      hole sizes.  We can use free_area_init_nodes() instead and let it
      detect the zone boundaries, while the architectures only have to
      supply the possible limits for the zones.
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-5-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
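      The new calling convention, as a minimal sketch (the arch hook name
      and the zones filled in are illustrative):

          #include <linux/mm.h>
          #include <linux/mmzone.h>

          void __init arch_zone_init(void)        /* hypothetical arch hook */
          {
                  unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };

                  /* supply only the upper PFN limit of each zone ... */
                  max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
          #ifdef CONFIG_HIGHMEM
                  max_zone_pfn[ZONE_HIGHMEM] = max_pfn;
          #endif

                  /* ... and let the core detect boundaries and holes */
                  free_area_init(max_zone_pfn);
          }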
    • mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option · 3f08a302
      Committed by Mike Rapoport
      CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization
      of the node and zone structures between the systems that have a
      region-to-node mapping in memblock and those that don't.
      
      Currently all the NUMA architectures enable this option and for the
      non-NUMA systems we can presume that all the memory belongs to node 0 and
      therefore the compile time configuration option is not required.
      
      The remaining few architectures that use DISCONTIGMEM without NUMA are
      easily updated to use memblock_add_node() instead of memblock_add() and
      thus have proper correspondence of memblock regions to NUMA nodes.
      
      Still, free_area_init_node() must have a backward compatible version
      because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP
      are different.  Once all the architectures use the new semantics, the
      entire compatibility layer can be dropped.
      
      To avoid addition of extra run time memory to store node id for
      architectures that keep memblock but have only a single node, the node id
      field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
      the corresponding accessors presume that in those cases it is always 0.
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
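      For the DISCONTIGMEM-without-NUMA architectures this amounts to a
      one-line change per registered region. A sketch (the wrapper and its
      parameters are placeholders):

          #include <linux/memblock.h>

          /* nid: the node/bank this range belongs to (0 on single-node) */
          static void __init register_ram(int nid, phys_addr_t base,
                                          phys_addr_t size)
          {
                  /* was: memblock_add(base, size) - no node information */
                  memblock_add_node(base, size, nid);
          }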
    • mm: make early_pfn_to_nid() and related definitions close to each other · 6f24fbd3
      Committed by Mike Rapoport
      early_pfn_to_nid() and its helper __early_pfn_to_nid() are spread around
      include/linux/mm.h, include/linux/mmzone.h and mm/page_alloc.c.
      
      Drop unused stub for __early_pfn_to_nid() and move its actual generic
      implementation close to its users.
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-3-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: introduce pin_user_pages_fast_only() · 104acc32
      Committed by John Hubbard
      This is the FOLL_PIN equivalent of __get_user_pages_fast(), except with a
      more descriptive name, and gup_flags instead of a boolean "write" in the
      argument list.
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-4-jhubbard@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
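      The call shape, as a sketch (the wrapper is illustrative; the
      gup_flags shown are typical for FOLL_PIN users):

          #include <linux/mm.h>

          static int pin_for_dma(unsigned long start, int nr_pages,
                                 struct page **pages)
          {
                  unsigned int gup_flags = FOLL_WRITE; /* write access intended */

                  /*
                   * Returns the number of pages pinned (possibly fewer than
                   * requested); never sleeps and never falls back to the
                   * slow GUP path.
                   */
                  return pin_user_pages_fast_only(start, nr_pages, gup_flags,
                                                  pages);
          }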
    • mm/gup: refactor and de-duplicate gup_fast() code · 376a34ef
      Committed by John Hubbard
      There were two nearly identical sets of code for the gup_fast() style
      of walking the page tables with interrupts disabled.  This has led to
      the usual maintenance problems that arise from having duplicated code.
      
      There is already a core internal routine in gup.c for gup_fast(), so just
      enhance it very slightly: allow skipping the fall-back to "slow" (regular)
      get_user_pages(), via the new FOLL_FAST_ONLY flag.  Then, just call
      internal_get_user_pages_fast() from __get_user_pages_fast(), and adjust
      the API to match pre-existing API behavior.
      
      There is a change in behavior from this refactoring: the nested form of
      interrupt disabling is used in all gup_fast() variants now.  That's
      because there is only one place that interrupt disabling for page walking
      is done, and so the safer form is required.  This should, if anything,
      eliminate possible (rare) bugs, because the non-nested form of enabling
      interrupts was fragile at best.
      
      [jhubbard@nvidia.com: fixup]
        Link: http://lkml.kernel.org/r/20200521233841.1279742-1-jhubbard@nvidia.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-3-jhubbard@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
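      A sketch of the resulting glue, following the commit's description
      (the exact body of the wrapper in mm/gup.c may differ):

          /* Legacy entry point, now a thin wrapper over the common core. */
          int __get_user_pages_fast(unsigned long start, int nr_pages,
                                    int write, struct page **pages)
          {
                  unsigned int gup_flags = FOLL_GET | FOLL_FAST_ONLY;

                  if (write)
                          gup_flags |= FOLL_WRITE;

                  /* FOLL_FAST_ONLY: skip the fall-back to the sleeping
                   * slow path entirely. */
                  return internal_get_user_pages_fast(start, nr_pages,
                                                      gup_flags, pages);
          }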
  5. 03 Jun 2020 (3 commits)
  6. 29 May 2020 (1 commit)
  7. 20 May 2020 (1 commit)
  8. 27 Apr 2020 (2 commits)
  9. 23 Apr 2020 (1 commit)
  10. 21 Apr 2020 (1 commit)
  11. 11 Apr 2020 (4 commits)
  12. 08 Apr 2020 (3 commits)
    • userfaultfd: wp: apply _PAGE_UFFD_WP bit · 292924b2
      Committed by Peter Xu
      Firstly, introduce two new flags MM_CP_UFFD_WP[_RESOLVE] for
      change_protection() when used with uffd-wp and make sure the two new flags
      are exclusively used.  Then,
      
        - For MM_CP_UFFD_WP: apply the _PAGE_UFFD_WP bit and remove _PAGE_RW
          when a range of memory is write protected by uffd
      
        - For MM_CP_UFFD_WP_RESOLVE: remove the _PAGE_UFFD_WP bit and recover
          _PAGE_RW when write protection is resolved from userspace
      
      And use this new interface in mwriteprotect_range() to replace the old
      MM_CP_DIRTY_ACCT.
      
      Do this change for both PTEs and huge PMDs.  Then we can start to identify
      which PTE/PMD is write protected by general (e.g., COW or soft dirty
      tracking), and which is for userfaultfd-wp.
      
      Since we should keep the _PAGE_UFFD_WP when doing pte_modify(), add it
      into _PAGE_CHG_MASK as well.  Meanwhile, since we have this new bit, we
      can be even more strict when detecting uffd-wp page faults in either
      do_wp_page() or wp_huge_pmd().
      
      Now that we have _PAGE_UFFD_WP, a special case is when a page is both
      protected by the general COW logic and also userfault-wp.  Here the
      userfault-wp will have higher priority and will be handled first.  Only
      after the uffd-wp bit is cleared on the PTE/PMD will we continue to
      handle the general COW.  These are the steps of what will happen with
      such a page:
      
        1. CPU accesses write protected shared page (so both protected by
           general COW and uffd-wp), blocked by uffd-wp first because in
           do_wp_page we'll handle uffd-wp first, so it has higher priority
           than general COW.
      
        2. Uffd service thread receives the request, do UFFDIO_WRITEPROTECT
           to remove the uffd-wp bit upon the PTE/PMD.  However here we
           still keep the write bit cleared.  Notify the blocked CPU.
      
        3. The blocked CPU resumes the page fault process with a fault
           retry; during the retry it'll notice that the uffd-wp bit is gone
           this time but the page is still write protected by general COW,
           so it'll go through the COW path in the fault handler, copy the
           page, apply the write bit where necessary, and retry again.
      
        4. The CPU will be able to access this page with write bit set.
      Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shli@fb.com>
      Link: http://lkml.kernel.org/r/20200220163112.11409-8-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
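      A sketch of how the two new, mutually exclusive flags drive
      change_protection() (schematic only; the real mwriteprotect_range()
      computes the actual newprot from the VMA flags):

          #include <linux/mm.h>

          static void uffd_wp_range(struct vm_area_struct *vma,
                                    unsigned long start, unsigned long end,
                                    bool enable_wp)
          {
                  /* set _PAGE_UFFD_WP and drop _PAGE_RW, or clear the bit
                   * and recover _PAGE_RW */
                  unsigned long cp_flags = enable_wp ? MM_CP_UFFD_WP
                                                     : MM_CP_UFFD_WP_RESOLVE;

                  change_protection(vma, start, end, vma->vm_page_prot,
                                    cp_flags);
          }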
    • mm: merge parameters for change_protection() · 58705444
      Committed by Peter Xu
      change_protection() was used by either the NUMA or the mprotect()
      code, and there is one parameter for each of the callers
      (dirty_accountable and prot_numa).  Further, these parameters are
      passed along the calls:
      
        - change_protection_range()
        - change_p4d_range()
        - change_pud_range()
        - change_pmd_range()
        - ...
      
      Now we introduce a flag for change_protection() and all these helpers
      to replace these parameters.  Then we can avoid passing multiple
      parameters multiple times along the way.
      
      More importantly, it'll greatly simplify the work if we want to introduce
      any new parameters to change_protection().  In the follow up patches, a
      new parameter for userfaultfd write protection will be introduced.
      
      No functional change at all.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shli@fb.com>
      Link: http://lkml.kernel.org/r/20200220163112.11409-7-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
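      The shape of the change, as a sketch (the "before" prototype follows
      the commit's description; exact types may differ):

          /* Before: one ad-hoc parameter per caller. */
          unsigned long change_protection(struct vm_area_struct *vma,
                          unsigned long start, unsigned long end,
                          pgprot_t newprot, int dirty_accountable,
                          int prot_numa);

          /* After: a single flags word, extensible without touching every
           * helper in the chain (MM_CP_DIRTY_ACCT, MM_CP_PROT_NUMA, ...). */
          unsigned long change_protection(struct vm_area_struct *vma,
                          unsigned long start, unsigned long end,
                          pgprot_t newprot, unsigned long cp_flags);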
    • mm/vma: make vma_is_accessible() available for general use · 3122e80e
      Committed by Anshuman Khandual
      Let's move the vma_is_accessible() helper to include/linux/mm.h, which
      makes it available for general use.  While here, replace all remaining
      open encodings of the VMA access check with vma_is_accessible().
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Guo Ren <guoren@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
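      The helper is a one-liner; roughly as exposed in include/linux/mm.h at
      the time (the exact spelling of the flag mask may differ):

          static inline bool vma_is_accessible(struct vm_area_struct *vma)
          {
                  return vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
          }

      Call sites then replace open-coded tests such as
      "vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)" with
      vma_is_accessible(vma).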
  13. 03 Apr 2020 (5 commits)
    • mmap: remove inline of vm_unmapped_area · baceaf1c
      Committed by Jaewon Kim
      Patch series "mm: mmap: add mmap trace point", v3.
      
      Create mmap trace file and add trace point of vm_unmapped_area().
      
      This patch (of 2):
      
      In preparation for the next patch, remove the inline of
      vm_unmapped_area() and move the code to mmap.c.  There is no logical
      change.

      Also remove unmapped_area[_topdown] from mm.h; no code calls them.
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20200320055823.27089-2-jaewon31.kim@samsung.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: allow VM_FAULT_RETRY for multiple times · 4064b982
      Committed by Peter Xu
      The idea comes from a discussion between Linus and Andrea [1].
      
      Before this patch we only allow a page fault to retry once.  We
      achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
      handle_mm_fault() the second time.  This was mainly used to avoid
      unexpected starvation of the system by looping forever to handle the
      page fault on a single page.  However, that should hardly happen;
      after all, each code path that returns VM_FAULT_RETRY first waits for
      a condition (during which time we should possibly yield the cpu)
      before VM_FAULT_RETRY is really returned.
      
      This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
      flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
      now can retry the page fault for multiple times if necessary without the
      need to generate another page fault event.  Meanwhile we still keep the
      FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
      page fault is the first attempt or not.
      
      Then we'll have these combinations of fault flags (only considering
      ALLOW_RETRY flag and TRIED flag):
      
        - ALLOW_RETRY and !TRIED:  this means the page fault allows to
                                   retry, and this is the first try
      
        - ALLOW_RETRY and TRIED:   this means the page fault allows to
                                   retry, and this is not the first try
      
        - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
                                   to retry at all
      
        - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used
      
      In existing code we have multiple places that have taken special care
      of the first condition above by checking against (fault_flags &
      FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to
      detect the first retry of a page fault by checking against both
      (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flags &
      FAULT_FLAG_TRIED), because now even the 2nd try will have ALLOW_RETRY
      set, then uses that helper in all existing special paths.  One example
      is in __lock_page_or_retry(): now we'll drop the mmap_sem only in the
      first attempt of the page fault and keep it in follow-up retries, so
      the old locking behavior is retained.
      
      This will be a nice enhancement for the current code [2] and at the
      same time supporting material for the future userfaultfd-writeprotect
      work, since in that work there will always be an explicit userfault
      writeprotect retry
      for protected pages, and if that cannot resolve the page fault (e.g., when
      userfaultfd-writeprotect is used in conjunction with swapped pages) then
      we'll possibly need a 3rd retry of the page fault.  It might also benefit
      other potential users who will have similar requirement like userfault
      write-protection.
      
      GUP code is not touched yet and will be covered in follow up patch.
      
      Please read the thread below for more information.
      
      [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
      [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
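      The helper introduced for the first condition, roughly as described
      (its body follows directly from the flag table above):

          /*
           * The first attempt of a retriable fault: retry is allowed and
           * we have not been through handle_mm_fault() for this fault yet.
           */
          static inline bool fault_flag_allow_retry_first(unsigned int flags)
          {
                  return (flags & FAULT_FLAG_ALLOW_RETRY) &&
                         !(flags & FAULT_FLAG_TRIED);
          }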
    • mm: introduce FAULT_FLAG_INTERRUPTIBLE · c270a7ee
      Committed by Peter Xu
      handle_userfaultfd() is currently the only place in the kernel page
      fault procedures that can respond to non-fatal userspace signals.  It
      was trying to detect such an allowance by checking against the USER &
      KILLABLE flags, which was "unofficial".
      
      In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show
      that the fault handler allows the fault procedure to respond even to
      non-fatal signals.  Meanwhile, add this new flag to the default fault
      flags so that all the page fault handlers can benefit from the new flag.
      With that, replace the userfault check with this one.
      
      Since the line is getting even longer, clean up the fault flags a bit too
      to ease TTY users.
      
      Although we've got a new flag and applied it, we shouldn't have any
      functional change with this patch so far.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
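      A sketch of the check this makes explicit (schematic; the helper name
      and surrounding logic are illustrative of fs/userfaultfd.c, not a copy
      of it):

          #include <linux/sched/signal.h>

          /* May the blocked fault return early on a non-fatal signal? */
          static bool uffd_signal_pending(struct vm_fault *vmf)
          {
                  if (vmf->flags & FAULT_FLAG_INTERRUPTIBLE)
                          return signal_pending(current);   /* any signal */

                  return fatal_signal_pending(current);     /* SIGKILL only */
          }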
    • mm: introduce FAULT_FLAG_DEFAULT · dde16072
      Committed by Peter Xu
      Although there are tons of arch-specific page fault handlers, most of
      them share the same initial value of the page fault flags.  That is,
      nearly all of the page fault handlers allow the fault to be retried,
      and they also allow the fault to respond to SIGKILL.
      
      Let's define a default value for the fault flags to replace those
      copied-over initial page fault flags.  With this, it'll be far easier
      to introduce a new fault flag that can be used by all the
      architectures, instead of touching every arch.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
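      At this point in the series the default covers retry and SIGKILL; the
      FAULT_FLAG_INTERRUPTIBLE patch above then adds itself to it.  A sketch
      of the define and its use in an arch handler:

          /* include/linux/mm.h */
          #define FAULT_FLAG_DEFAULT  (FAULT_FLAG_ALLOW_RETRY | \
                                       FAULT_FLAG_KILLABLE)

          /* arch page fault handlers start from the shared default */
          unsigned int flags = FAULT_FLAG_DEFAULT;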
    • mm/vma: make is_vma_temporary_stack() available for general use · 222100ee
      Committed by Anshuman Khandual
      Currently the declaration and definition of is_vma_temporary_stack()
      are scattered.  Let's make the is_vma_temporary_stack() helper
      available for general use and also drop the declaration from
      include/linux/huge_mm.h, which is no longer required.  While at it,
      rename it to vma_is_temporary_stack() in line with existing helpers.
      This should not cause any functional change.
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1582782965-3274-4-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>