1. 13 Jul, 2019 40 commits
    • mm/vmalloc.c: get rid of one single unlink_va() when merge · 54f63d9d
      Uladzislau Rezki (Sony) committed
      It does not make sense to try to "unlink" the node that is definitely not
      linked with a list nor tree.  On the first merge step VA just points to
      the previously disconnected busy area.
      
      On the second step, check if the node has been merged and do "unlink" if
      so, because now it points to an object that must be linked.
      
      Link: http://lkml.kernel.org/r/20190606120411.8298-4-urezki@gmail.com
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Acked-by: Hillf Danton <hdanton@sina.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmalloc.c: preload a CPU with one object for split purpose · 82dd23e8
      Uladzislau Rezki (Sony) committed
      Refactor the NE_FIT_TYPE split case when it comes to an allocation of one
      extra object.  We need it in order to build a remaining space.  The
      preload is done per CPU in non-atomic context with GFP_KERNEL flags.
      
      More permissive parameters can be beneficial for systems that suffer
      from high memory pressure or low-memory conditions.  For example, on my KVM
      system (4xCPUs, no swap, 256MB RAM) I can simulate the failure of page
      allocation with GFP_NOWAIT flags.  Using the "stress-ng" tool and starting N
      workers spinning on fork() and exit(), I can trigger the trace below:
      
      <snip>
      [  179.815161] stress-ng-fork: page allocation failure: order:0, mode:0x40800(GFP_NOWAIT|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
      [  179.815168] CPU: 0 PID: 12612 Comm: stress-ng-fork Not tainted 5.2.0-rc3+ #1003
      [  179.815170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [  179.815171] Call Trace:
      [  179.815178]  dump_stack+0x5c/0x7b
      [  179.815182]  warn_alloc+0x108/0x190
      [  179.815187]  __alloc_pages_slowpath+0xdc7/0xdf0
      [  179.815191]  __alloc_pages_nodemask+0x2de/0x330
      [  179.815194]  cache_grow_begin+0x77/0x420
      [  179.815197]  fallback_alloc+0x161/0x200
      [  179.815200]  kmem_cache_alloc+0x1c9/0x570
      [  179.815202]  alloc_vmap_area+0x32c/0x990
      [  179.815206]  __get_vm_area_node+0xb0/0x170
      [  179.815208]  __vmalloc_node_range+0x6d/0x230
      [  179.815211]  ? _do_fork+0xce/0x3d0
      [  179.815213]  copy_process.part.46+0x850/0x1b90
      [  179.815215]  ? _do_fork+0xce/0x3d0
      [  179.815219]  _do_fork+0xce/0x3d0
      [  179.815226]  ? __do_page_fault+0x2bf/0x4e0
      [  179.815229]  do_syscall_64+0x55/0x130
      [  179.815231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  179.815234] RIP: 0033:0x7fedec4c738b
      ...
      [  179.815237] RSP: 002b:00007ffda469d730 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
      [  179.815239] RAX: ffffffffffffffda RBX: 00007ffda469d730 RCX: 00007fedec4c738b
      [  179.815240] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
      [  179.815241] RBP: 00007ffda469d780 R08: 00007fededd6e300 R09: 00007ffda47f50a0
      [  179.815242] R10: 00007fededd6e5d0 R11: 0000000000000246 R12: 0000000000000000
      [  179.815243] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
      [  179.815245] Mem-Info:
      [  179.815249] active_anon:12686 inactive_anon:14760 isolated_anon:0
                      active_file:502 inactive_file:61 isolated_file:70
                      unevictable:2 dirty:0 writeback:0 unstable:0
                      slab_reclaimable:2380 slab_unreclaimable:7520
                      mapped:15069 shmem:14813 pagetables:10833 bounce:0
                      free:1922 free_pcp:229 free_cma:0
      <snip>
      
      Link: http://lkml.kernel.org/r/20190606120411.8298-3-urezki@gmail.com
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmalloc.c: remove "node" argument · cacca6ba
      Uladzislau Rezki (Sony) committed
      Patch series "Some cleanups for the KVA/vmalloc", v5.
      
      This patch (of 4):
      
      Remove unused argument from the __alloc_vmap_area() function.
      
      Link: http://lkml.kernel.org/r/20190606120411.8298-2-urezki@gmail.com
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: use hlist_add_head_rcu() · 543bdb2d
      Jean-Philippe Brucker committed
      Make mmu_notifier_register() safer by issuing a memory barrier before
      registering a new notifier.  This fixes a theoretical bug on weakly
      ordered CPUs.  For example, take this simplified use of notifiers by a
      driver:
      
      	my_struct->mn.ops = &my_ops; /* (1) */
      	mmu_notifier_register(&my_struct->mn, mm)
      		...
      		hlist_add_head(&mn->hlist, &mm->mmu_notifiers); /* (2) */
      		...
      
      Once mmu_notifier_register() releases the mm locks, another thread can
      invalidate a range:
      
      	mmu_notifier_invalidate_range()
      		...
      		hlist_for_each_entry_rcu(mn, &mm->mmu_notifiers, hlist) {
      			if (mn->ops->invalidate_range)
      
      The read side relies on the data dependency between mn and ops to ensure
      that the pointer is properly initialized.  But the write side doesn't have
      any dependency between (1) and (2), so they could be reordered and the
      readers could dereference an invalid mn->ops.  mmu_notifier_register()
      does take all the mm locks before adding to the hlist, but those have
      acquire semantics which isn't sufficient.
      
      By calling hlist_add_head_rcu() instead of hlist_add_head() we update the
      hlist using a store-release, ensuring that readers see prior
      initialization of my_struct.  This situation is better illustrated by
      litmus test MP+onceassign+derefonce.
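
The ordering requirement described above can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the kernel code: `struct ops`, `struct notifier`, `notifier_register()` and `notifier_invalidate()` are hypothetical stand-ins, a single writer is assumed (the kernel serializes writers with the mm locks), and the reader uses an acquire load where the kernel relies on an address dependency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the notifier ops and notifier structures. */
struct ops {
	int (*invalidate_range)(int start, int end);
};

struct notifier {
	const struct ops *ops;
	struct notifier *next;
};

/* List head, playing the role of mm->mmu_notifiers. */
static _Atomic(struct notifier *) notifier_head;

/* Example callback: reports the length of the invalidated range. */
static int range_len(int start, int end)
{
	return end - start;
}

static const struct ops my_ops = { .invalidate_range = range_len };

/*
 * Writer: initialize the node (1), then publish it (2).  The release
 * store keeps the initialization from being reordered after the
 * publication.  Assumes writers are serialized by the caller.
 */
static void notifier_register(struct notifier *n, const struct ops *ops)
{
	n->ops = ops;						/* (1) */
	n->next = atomic_load_explicit(&notifier_head,
				       memory_order_relaxed);
	atomic_store_explicit(&notifier_head, n,
			      memory_order_release);		/* (2) */
}

/*
 * Reader: the acquire load pairs with the release store above, so a
 * reader that sees a node also sees its initialized ops pointer.
 */
static int notifier_invalidate(int start, int end)
{
	int total = 0;
	struct notifier *n = atomic_load_explicit(&notifier_head,
						  memory_order_acquire);

	for (; n; n = n->next)
		if (n->ops->invalidate_range)
			total += n->ops->invalidate_range(start, end);
	return total;
}
```

Replacing the release store at (2) with a relaxed store would reintroduce the problem the commit describes: on a weakly ordered CPU a reader could observe the new node before its `ops` pointer.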
      
      Link: http://lkml.kernel.org/r/20190502133532.24981-1-jean-philippe.brucker@arm.com
      Fixes: cddb8a5c ("mmu-notifiers: core")
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory.c: fail when offset == num in first check of __vm_map_pages() · 96756fcb
      Miguel Ojeda committed
      If the caller asks us for offset == num, we should already fail in the
      first check, i.e.  the one testing for offsets beyond the object.
      
      At the moment, we are failing on the second test anyway, since count
      cannot be 0.  Still, to agree with the comment of the first test, we
      should first test it there.
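
The two checks discussed above can be modelled in a few lines of userspace C. This is a sketch, not the kernel function: `check_map_range()` and its parameter names are hypothetical, `count` is assumed to be at least 1 as the message states, and the `-ENXIO` return value is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the two range checks.
 * num:    number of pages in the object
 * offset: first page requested
 * count:  number of pages requested, assumed to be >= 1
 */
static int check_map_range(unsigned long num, unsigned long offset,
			   unsigned long count)
{
	/* First check: the offset is at or beyond the end of the object. */
	if (offset >= num)
		return -ENXIO;

	/* Second check: the request exceeds the pages that remain. */
	if (count > num - offset)
		return -ENXIO;

	return 0;
}
```

With `offset == num` the subtraction in the second check yields 0, so any non-zero `count` would be rejected there as well; testing `offset >= num` first simply makes the first check match its stated purpose.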
      
      Link: http://lkml.kernel.org/r/20190528193004.GA7744@gmail.com
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/pgtable: drop pgtable_t variable from pte_fn_t functions · 8b1e0f81
      Anshuman Khandual committed
      Drop the pgtable_t variable from all implementations of pte_fn_t, as none
      of them use it.  apply_to_pte_range() should stop computing it as well.
      This should help us save some cycles.
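
A userspace sketch of what a callback type without the page-table token might look like. The `pte_t` definition, `apply_to_ptes()`, `count_present()` and `PAGE_SIZE_SKETCH` are hypothetical stand-ins chosen for illustration; only the idea of dropping the unused argument comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pte_t. */
typedef struct { unsigned long val; } pte_t;

/* Callback type that no longer takes a page-table token argument. */
typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);

#define PAGE_SIZE_SKETCH 4096UL	/* assumed page size for the sketch */

/*
 * Calls fn for each entry of a flat PTE array, advancing the address
 * by one page per entry; stops at the first non-zero return value.
 */
static int apply_to_ptes(pte_t *ptes, size_t n, unsigned long addr,
			 pte_fn_t fn, void *data)
{
	for (size_t i = 0; i < n; i++, addr += PAGE_SIZE_SKETCH) {
		int err = fn(&ptes[i], addr, data);

		if (err)
			return err;
	}
	return 0;
}

/* Example callback: counts the entries with a non-zero value. */
static int count_present(pte_t *pte, unsigned long addr, void *data)
{
	(void)addr;
	if (pte->val)
		(*(size_t *)data)++;
	return 0;
}
```

Since the walker no longer has to compute a token for every entry, the loop body reduces to the callback invocation itself.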
      
      Link: http://lkml.kernel.org/r/1556803126-26596-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Matthew Wilcox <willy@infradead.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <jglisse@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • unicore32: switch to generic version of pte allocation · c2471e79
      Mike Rapoport committed
      Replace __get_free_page() and alloc_pages() calls with the generic
      __pte_alloc_one_kernel() and __pte_alloc_one().
      
      There is no functional change for the kernel PTE allocation.
      
      The difference for the user PTEs is that clear_pte_table() is now
      called after pgtable_page_ctor() and that __GFP_ACCOUNT is added to the
      GFP flags.
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-15-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • um: switch to generic version of pte allocation · f32848e1
      Mike Rapoport committed
      um allocates PTE pages with __get_free_page() and uses
      GFP_KERNEL | __GFP_ZERO for the allocations.
      
      Switch it to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-14-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • riscv: switch to generic version of pte allocation · d1b46fe5
      Mike Rapoport committed
      The only difference between the generic and RISC-V implementation of PTE
      allocation is the usage of __GFP_RETRY_MAYFAIL for both kernel and user
      PTEs and the absence of __GFP_ACCOUNT for the user PTEs.
      
      The conversion to the generic version removes the __GFP_RETRY_MAYFAIL and
      ensures that __GFP_ACCOUNT is used for the user PTE allocations.
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-13-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • parisc: switch to generic version of pte allocation · 3f4a1308
      Mike Rapoport committed
      parisc allocates PTE pages with __get_free_page() and uses
      GFP_KERNEL | __GFP_ZERO for the allocations.
      
      Switch it to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free_kernel() and pte_free() versions on parisc are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-12-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nios2: switch to generic version of pte allocation · fc7835c2
      Mike Rapoport committed
      nios2 allocates kernel PTE pages with
      
              __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER);
      
      and user page tables with
      
              pte = alloc_pages(GFP_KERNEL, PTE_ORDER);
              if (pte)
                      clear_highpage(pte);
      
      The PTE_ORDER is hardwired to zero, which makes nios2 implementation almost
      identical to the generic one.
      
      Switch nios2 to the generic version that does exactly the same thing for
      the kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free_kernel() and pte_free() versions on nios2 are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-11-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nds32: switch to generic version of pte allocation · f52a8e1a
      Mike Rapoport committed
      The nds32 implementation of pte_alloc_one_kernel() differs from the
      generic in the use of __GFP_RETRY_MAYFAIL flag, which is removed after the
      conversion.
      
      The nds32 version of pte_alloc_one() missed the call to
      pgtable_page_ctor() and also used __GFP_RETRY_MAYFAIL.  Switching it to
      use generic __pte_alloc_one() for the PTE page allocation ensures that
      page table constructor is run and the user page tables are allocated with
      __GFP_ACCOUNT.
      
      The conversion to the generic version of pte_free_kernel() removes the
      NULL check for pte.
      
      The pte_free() version on nds32 is identical to the generic one and can be
      simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-10-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mips: switch to generic version of pte allocation · b7902ce1
      Mike Rapoport committed
      MIPS allocates kernel PTE pages with
      
      	__get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER)
      
      and user PTE pages with
      
      	pte = alloc_pages(GFP_KERNEL, PTE_ORDER)
      
      and then uses clear_highpage(pte) to zero out the allocated page for the
      user page tables.
      
      The PTE_ORDER is hardwired to zero, which makes MIPS implementation almost
      identical to the generic one.
      
      Switch MIPS to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free_kernel() and pte_free() versions on mips are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-9-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Paul Burton <paul.burton@mips.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • m68k: sun3: switch to generic version of pte allocation · 14c0a39c
      Mike Rapoport committed
      The sun3 MMU variant of m68k uses GFP_KERNEL to allocate a PTE page and
      then memset(0) or clear_highpage() to clear it.
      
      This is equivalent to allocating the page with GFP_KERNEL | __GFP_ZERO,
      which allows replacing sun3 implementation of pte_alloc_one() and
      pte_alloc_one_kernel() with the generic ones.
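
The equivalence claimed above has a close userspace analogue: allocating and then clearing with `memset()` gives the same result as asking the allocator for zeroed memory. The sketch below uses `malloc()`/`calloc()` purely as an analogy for the page allocator; `TABLE_SIZE` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4096	/* hypothetical size of one page-table page */

/* Allocate first, clear in a second step (the pattern being replaced). */
static void *table_alloc_then_clear(void)
{
	void *table = malloc(TABLE_SIZE);

	if (table)
		memset(table, 0, TABLE_SIZE);
	return table;
}

/* Ask the allocator for zeroed memory (the analogue of __GFP_ZERO). */
static void *table_alloc_zeroed(void)
{
	return calloc(1, TABLE_SIZE);
}

/* Returns 1 if every byte of the table is zero, 0 otherwise. */
static int table_is_clear(const void *table)
{
	const unsigned char *p = table;

	for (size_t i = 0; i < TABLE_SIZE; i++)
		if (p[i])
			return 0;
	return 1;
}
```

Both helpers hand back a fully zeroed table, which is why the open-coded variant can be replaced by the generic one without a functional change.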
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-8-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • csky: switch to generic version of pte allocation · bd5ff066
      Mike Rapoport committed
      The csky implementations of pte_alloc_one(), pte_free_kernel() and pte_free()
      are identical to the generic ones except for the lack of __GFP_ACCOUNT for the
      user PTE allocation.
      
      Switch csky to use generic version of these functions.
      
      The csky implementation of pte_alloc_one_kernel() is not replaced because
      it does not clear the allocated page but rather sets each PTE in it to a
      non-zero value.
      
      The pte_free_kernel() and pte_free() versions on csky are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-6-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm64: switch to generic version of pte allocation · 50f11a8a
      Mike Rapoport committed
      The PTE allocations in arm64 are identical to the generic ones modulo the
      GFP flags.
      
      Using the generic pte_alloc_one() functions ensures that the user page
      tables are allocated with __GFP_ACCOUNT set.
      
      The arm64 definition of PGALLOC_GFP is removed and replaced with
      GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and
      GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now
      using GFP_PGTABLE_USER.
      
      The mappings created with create_pgd_mapping() are now using
      GFP_PGTABLE_KERNEL.
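
How the two page-table flag sets relate can be sketched with plain bit masks. The composition shown here (kernel tables: GFP_KERNEL plus zeroing; user tables: the same plus accounting) is an assumption inferred from the descriptions in this series, and the `SK_`-prefixed names and bit values are placeholders, not the kernel's definitions.

```c
#include <assert.h>

/* Placeholder bit values; the real flags are defined by the kernel. */
#define SK_GFP_KERNEL	0x1u
#define SK_GFP_ZERO	0x2u
#define SK_GFP_ACCOUNT	0x4u

/* Assumed composition of the two page-table flag sets. */
#define SK_GFP_PGTABLE_KERNEL	(SK_GFP_KERNEL | SK_GFP_ZERO)
#define SK_GFP_PGTABLE_USER	(SK_GFP_PGTABLE_KERNEL | SK_GFP_ACCOUNT)

/* Picks the flag set for a user or a kernel page table. */
static unsigned int pgtable_gfp(int for_user)
{
	return for_user ? SK_GFP_PGTABLE_USER : SK_GFP_PGTABLE_KERNEL;
}

/* Returns 1 if the allocation would be charged to the caller. */
static int is_accounted(unsigned int gfp)
{
	return (gfp & SK_GFP_ACCOUNT) != 0;
}
```

Under this assumption the only difference between the two sets is the accounting bit, which matches the statement that user page tables gain __GFP_ACCOUNT while kernel page tables are unchanged.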
      
      The conversion to the generic version of pte_free_kernel() removes the NULL
      check for pte.
      
      The pte_free() version on arm64 is identical to the generic one and
      can be simply dropped.
      
      [cai@lca.pw: fix a bogus GFP flag in pgd_alloc()]
        Link: https://lore.kernel.org/r/1559656836-24940-1-git-send-email-cai@lca.pw/
      [and fix it more]
        Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/
      Link: http://lkml.kernel.org/r/1557296232-15361-5-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm: switch to generic version of pte allocation · 28bcf593
      Mike Rapoport committed
      Replace __get_free_page() and alloc_pages() calls with the generic
      __pte_alloc_one_kernel() and __pte_alloc_one().
      
      There is no functional change for the kernel PTE allocation.
      
      The difference for the user PTEs is that clear_pte_table() is now
      called after pgtable_page_ctor() and that __GFP_ACCOUNT is added to the
      GFP flags.
      
      The conversion to the generic version of pte_free_kernel() removes the NULL
      check for pte.
      
      The pte_free() version on arm is identical to the generic one and can be
      simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-4-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      28bcf593
    • M
      alpha: switch to generic version of pte allocation · bc3ace9b
      Mike Rapoport committed
      alpha allocates PTE pages with __get_free_page() and uses
      GFP_KERNEL | __GFP_ZERO for the allocations.
      
      Switch it to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The alpha pte_free() and pte_free_kernel() versions are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-3-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc3ace9b
    • M
      asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel] · 5fba4af4
      Mike Rapoport committed
      Most architectures have identical or very similar implementations of
      pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and
      pte_free().
      
      Add a generic implementation that can be reused across architectures and
      enable its use on x86.
      
      The generic implementation uses
      
      	GFP_KERNEL | __GFP_ZERO
      
      for the kernel page tables and
      
      	GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
      
      for the user page tables.
      
      The "base" functions for PTE allocation, namely __pte_alloc_one_kernel()
      and __pte_alloc_one() are intended for the architectures that require
      additional actions after actual memory allocation or must use non-default
      GFP flags.
      
      x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and
      pte_free().
      
      x86 still implements pte_alloc_one() to allow run-time control of the GFP
      flags required for the "userpte" command line option.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5fba4af4
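The flag selection described in commit 5fba4af4 can be sketched in plain user-space C. This is only an illustration: the `SKETCH_*` names, their bit values, and `sketch_pte_gfp()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder bit values for illustration only; the real GFP
 * definitions in the kernel differ from these.
 */
#define SKETCH_GFP_KERNEL   0x1u
#define SKETCH_GFP_ZERO     0x2u
#define SKETCH_GFP_ACCOUNT  0x4u

/* Flags the commit message gives for kernel page tables. */
#define SKETCH_PGTABLE_KERNEL (SKETCH_GFP_KERNEL | SKETCH_GFP_ZERO)
/* User page tables additionally get memcg accounting. */
#define SKETCH_PGTABLE_USER   (SKETCH_PGTABLE_KERNEL | SKETCH_GFP_ACCOUNT)

/* Hypothetical helper: pick the mask for a PTE page allocation. */
static unsigned int sketch_pte_gfp(bool for_user)
{
	return for_user ? SKETCH_PGTABLE_USER : SKETCH_PGTABLE_KERNEL;
}

/* Usage example: returns 0 when the masks relate as described. */
int sketch_pte_gfp_demo(void)
{
	unsigned int kernel_mask = sketch_pte_gfp(false);
	unsigned int user_mask = sketch_pte_gfp(true);

	assert((kernel_mask & SKETCH_GFP_ACCOUNT) == 0);
	assert(user_mask == (kernel_mask | SKETCH_GFP_ACCOUNT));
	return 0;
}
```

An architecture that needs non-default flags would, per the commit message, call the "base" functions with its own mask instead of relying on these two defaults.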
    • G
      mm/gup.c: mark undo_dev_pagemap as __maybe_unused · 790c7369
      Guenter Roeck committed
      Several mips builds generate the following build warning.
      
        mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used
      
      The function is declared unconditionally but only called from behind
      various ifdefs. Mark it __maybe_unused.
      
      Link: http://lkml.kernel.org/r/1562072523-22311-1-git-send-email-linux@roeck-us.net
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      790c7369
    • A
      mm/gup.c: remove some BUG_ONs from get_gate_page() · b5d1c39f
      Andy Lutomirski committed
      If we end up without a PGD or PUD entry backing the gate area, don't BUG
      -- just fail gracefully.
      
      It's not entirely implausible that this could happen some day on x86.  It
      doesn't right now even with an execute-only emulated vsyscall page because
      the fixmap shares the PUD, but the core mm code shouldn't rely on that
      particular detail to avoid OOPSing.
      
      Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5d1c39f
    • P
      mm/gup: speed up check_and_migrate_cma_pages() on huge page · aa712399
      Pingfan Liu committed
      Both hugetlb and THP pages sit on a pageblock of a single migration type,
      since they are allocated from a free_list[].  Based on this fact, it is
      enough to check a single subpage to decide the migration type of the whole
      huge page.  This saves (2M/4K - 1) loop iterations for pmd_huge on x86,
      and similarly on other archs.
      
      Furthermore, when executing isolate_huge_page(), it avoids taking the
      global hugetlb_lock many times, and avoids meaningless removal from and
      re-addition to the local linked list cma_page_list.
      
      [akpm@linux-foundation.org: make `i' and `step' unsigned]
      Link: http://lkml.kernel.org/r/1561612545-28997-1-git-send-email-kernelfans@gmail.com
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa712399
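The stepping idea in commit aa712399 can be illustrated with a small user-space model. This is a sketch under stated assumptions: `struct sketch_page`, its fields, and `sketch_count_cma()` are hypothetical and only mimic the "check one subpage, then skip the rest of the huge page" loop shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the loop needs. */
struct sketch_page {
	size_t head;        /* index of the compound head in the array */
	unsigned int order; /* compound order of the head, 0 for a base page */
	bool cma;           /* whether the page sits in a CMA pageblock */
};

/*
 * Walk pages[0..nr) and count CMA-backed pages, checking each
 * compound page only once. *checks receives the number of
 * migration-type checks that were performed.
 */
static size_t sketch_count_cma(const struct sketch_page *pages, size_t nr,
			       size_t *checks)
{
	size_t i = 0, found = 0;

	*checks = 0;
	while (i < nr) {
		const struct sketch_page *head = &pages[pages[i].head];
		/* Subpages left in this compound page, counting pages[i]. */
		size_t step = ((size_t)1 << head->order) - (i - pages[i].head);

		(*checks)++;
		if (head->cma)
			found++;
		i += step;
	}
	return found;
}

/* Usage example: one 2M huge page (512 x 4K) followed by one base page. */
int sketch_cma_demo(void)
{
	static struct sketch_page pages[513];
	size_t i, checks, found;

	for (i = 0; i < 512; i++)
		pages[i] = (struct sketch_page){ .head = 0, .order = 9, .cma = true };
	pages[512] = (struct sketch_page){ .head = 512, .order = 0, .cma = false };

	found = sketch_count_cma(pages, 513, &checks);
	assert(found == 1);
	assert(checks == 2); /* 2 checks instead of 513 */
	return 0;
}
```

In this model the huge page costs one check rather than 512, which corresponds to the (2M/4K - 1) saved iterations the commit message mentions.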
    • C
      mm: mark the page referenced in gup_hugepte · 520b4a44
      Christoph Hellwig committed
      All other get_user_pages_fast cases mark the page referenced, so do this
      here as well.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-17-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      520b4a44
    • C
      mm: switch gup_hugepte to use try_get_compound_head · 01a36916
      Christoph Hellwig committed
      This applies the overflow fixes from 8fde12ca ("mm: prevent
      get_user_pages() from overflowing page refcount") to the powerpc hugepd
      code and brings it back in sync with the other GUP cases.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-16-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01a36916
    • C
      mm: move the powerpc hugepd code to mm/gup.c · cbd34da7
      Christoph Hellwig committed
      While only powerpc supports the hugepd case, the code is pretty generic
      and I'd like to keep all GUP internals in one place.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cbd34da7
    • C
      mm: validate get_user_pages_fast flags · 817be129
      Christoph Hellwig committed
      We can only deal with FOLL_WRITE and/or FOLL_LONGTERM in
      get_user_pages_fast, so reject all other flags.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-14-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      817be129
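The check described in commit 817be129 amounts to rejecting any bit outside an allowed mask. A minimal user-space sketch follows; the `SKETCH_FOLL_*` constants, their values, and the function name are illustrative assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values; the kernel's FOLL_* definitions may differ. */
#define SKETCH_FOLL_WRITE    0x01u
#define SKETCH_FOLL_GET      0x04u    /* stands in for "some other flag" */
#define SKETCH_FOLL_LONGTERM 0x10000u

/* Return 0 if only the supported flags are set, -EINVAL otherwise. */
static int sketch_check_fast_gup_flags(unsigned int gup_flags)
{
	if (gup_flags & ~(SKETCH_FOLL_WRITE | SKETCH_FOLL_LONGTERM))
		return -EINVAL;
	return 0;
}

/* Usage example covering an accepted and a rejected combination. */
int sketch_flags_demo(void)
{
	int ok = sketch_check_fast_gup_flags(SKETCH_FOLL_WRITE);
	int bad = sketch_check_fast_gup_flags(SKETCH_FOLL_WRITE | SKETCH_FOLL_GET);

	assert(ok == 0);
	assert(bad == -EINVAL);
	return 0;
}
```

Masking with the complement of the allowed set means any flag added later is rejected by default until it is explicitly supported.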
    • C
      mm: consolidate the get_user_pages* implementations · 050a9adc
      Christoph Hellwig committed
      Always build mm/gup.c so that we don't have to provide separate nommu
      stubs.  Also merge the get_user_pages_fast and __get_user_pages_fast stubs
      for the !HAVE_FAST_GUP case into the main implementations, which will never
      call the fast path if HAVE_FAST_GUP is not set.
      
      This also ensures the new put_user_pages* helpers are available for nommu,
      as those are currently missing, which would create a problem as soon as
      they gained users.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      050a9adc
    • C
      mm: reorder code blocks in gup.c · d3649f68
      Christoph Hellwig committed
      This moves the actually exported functions towards the end of the file,
      and reorders some functions to be in more logical blocks as a preparation
      for moving various stubs inline into the main functionality using
      IS_ENABLED().
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-12-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3649f68
    • C
      mm: rename CONFIG_HAVE_GENERIC_GUP to CONFIG_HAVE_FAST_GUP · 67a929e0
      Christoph Hellwig committed
      We only support the generic GUP now, so rename the config option to
      be clearer, and always use the mm/Kconfig definition of the
      symbol and select it from the arch Kconfigs.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      67a929e0
    • C
      sparc64: use the generic get_user_pages_fast code · 7b9afb86
      Christoph Hellwig committed
      The sparc64 code is mostly equivalent to the generic one, minus various
      bugfixes and two arch overrides that this patch adds to pgtable.h.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-10-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b9afb86
    • C
      sparc64: define untagged_addr() · 5875509d
      Christoph Hellwig committed
      Add a helper to untag a user pointer.  This is needed for ADI support
      in get_user_pages_fast.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-9-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5875509d
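What "untagging" a user pointer means in commit 5875509d can be shown with a simplified model. The sketch below assumes a 4-bit tag in the top bits of a 64-bit user address whose remaining high bits are zero; the names and the tag width are assumptions for illustration, and the real sparc64 helper is more involved than a plain mask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the tag occupies the top 4 address bits. */
#define SKETCH_TAG_BITS 4u
#define SKETCH_TAG_MASK (~UINT64_C(0) << (64u - SKETCH_TAG_BITS))

/* Strip the tag so the address can be used for page table walks. */
static uint64_t sketch_untagged_addr(uint64_t addr)
{
	return addr & ~SKETCH_TAG_MASK;
}

/* Usage example: a tagged and an untagged view of the same address. */
int sketch_untag_demo(void)
{
	uint64_t tagged = UINT64_C(0xA000000012345678);
	uint64_t plain = sketch_untagged_addr(tagged);

	assert(plain == UINT64_C(0x0000000012345678));
	/* Untagging an already untagged address changes nothing. */
	assert(sketch_untagged_addr(plain) == plain);
	return 0;
}
```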
    • C
      sparc64: add the missing pgd_page definition · d8550790
      Christoph Hellwig committed
      sparc64 only had pgd_page_vaddr, but not pgd_page.
      
      [hch@lst.de: fix sparc64 build]
        Link: http://lkml.kernel.org/r/20190626131318.GA5101@lst.de
      Link: http://lkml.kernel.org/r/20190625143715.1689-8-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d8550790
    • C
      sh: use the generic get_user_pages_fast code · 3c9b9acc
      Christoph Hellwig committed
      The sh code is mostly equivalent to the generic one, minus various
      bugfixes and two arch overrides that this patch adds to pgtable.h.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-7-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c9b9acc
    • C
      sh: add the missing pud_page definition · 2f85e7f9
      Christoph Hellwig committed
      sh only had pud_page_vaddr, but not pud_page.
      
      [hch@lst.de: sh: stub out pud_page]
        Link: http://lkml.kernel.org/r/20190701151818.32227-2-hch@lst.de
      Link: http://lkml.kernel.org/r/20190625143715.1689-6-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f85e7f9
    • C
      MIPS: use the generic get_user_pages_fast code · 446f062b
      Christoph Hellwig committed
      The mips code is mostly equivalent to the generic one, minus various
      bugfixes and an arch override for gup_fast_permitted.
      
      Note that this defines ARCH_HAS_PTE_SPECIAL for mips as mips has
      pte_special and pte_mkspecial implemented and used in the existing gup
      code.  They are no-op stubs, though, which makes me a little unsure if this
      is really the right thing to do.
      
      Note that this also adds back a missing cpu_has_dc_aliases check for
      __get_user_pages_fast, which the old code was only doing for
      get_user_pages_fast.  This clearly looks like an oversight, as any
      condition that makes get_user_pages_fast unsafe also applies to
      __get_user_pages_fast.
      
      [hch@lst.de: MIPS: don't select ARCH_HAS_PTE_SPECIAL]
        Link: http://lkml.kernel.org/r/20190701151818.32227-3-hch@lst.de
      Link: http://lkml.kernel.org/r/20190625143715.1689-5-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      446f062b
    • C
      mm: lift the x86_32 PAE version of gup_get_pte to common code · 39656e83
      Christoph Hellwig committed
      The split low/high access is the only non-READ_ONCE version of gup_get_pte
      that showed up in the various arch implementations.  Lift it to common
      code and drop the ifdef based arch override.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      39656e83
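The "split low/high access" that commit 39656e83 refers to can be modelled in user space with C11 atomics. This is a sketch under assumptions: `struct sketch_pte` and `sketch_get_pte()` are hypothetical, and the retry-on-changed-low-half loop reflects the general technique for reading a 64-bit entry as two 32-bit halves, not necessarily the exact kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit entry stored as two 32-bit halves. */
struct sketch_pte {
	_Atomic uint32_t low;
	_Atomic uint32_t high;
};

/* Read both halves, retrying if the low half changed in between. */
static uint64_t sketch_get_pte(struct sketch_pte *ptep)
{
	uint32_t low, high;

	do {
		low = atomic_load_explicit(&ptep->low, memory_order_acquire);
		high = atomic_load_explicit(&ptep->high, memory_order_acquire);
		/* A changed low half means the pair may be torn: retry. */
	} while (low != atomic_load_explicit(&ptep->low, memory_order_acquire));

	return ((uint64_t)high << 32) | low;
}

/* Usage example: single-threaded read of a known value. */
int sketch_pte_demo(void)
{
	struct sketch_pte pte;
	uint64_t val;

	atomic_init(&pte.low, UINT32_C(0x00000067));
	atomic_init(&pte.high, UINT32_C(0x00000001));
	val = sketch_get_pte(&pte);
	assert(val == UINT64_C(0x0000000100000067));
	return 0;
}
```

The retry only detects tearing if writers update the halves in a compatible order, which this sketch assumes rather than enforces.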
    • C
      mm: simplify gup_fast_permitted · 26f4c328
      Christoph Hellwig committed
      Pass in the already calculated end value instead of recomputing it, and
      leave the end > start check in the callers instead of duplicating it in
      the arch code.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26f4c328
    • C
      mm: use untagged_addr() for get_user_pages_fast addresses · f455c854
      Christoph Hellwig committed
      Patch series "switch the remaining architectures to use generic GUP", v4.
      
      A series to switch mips, sh and sparc64 to use the generic GUP code so
      that we only have one codebase to touch for further improvements to this
      code.
      
      This patch (of 16):
      
      This will allow sparc64, or any future architecture with memory tagging to
      override its tags for get_user_pages and get_user_pages_fast.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-2-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f455c854
    • W
      mm, memcg: add a memcg_slabinfo debugfs file · fcf8a1e4
      Waiman Long committed
      There are concerns about memory leaks from extensive use of memory cgroups
      as each memory cgroup creates its own set of kmem caches.  There is a
      possibility that the memcg kmem caches may remain even after the memory
      cgroups have been offlined.  Therefore, it will be useful to show the
      status of each of the memcg kmem caches.
      
      This patch introduces a new <debugfs>/memcg_slabinfo file which is
      somewhat similar to /proc/slabinfo in format, but lists only information
      about kmem caches that have child memcg kmem caches.  Information
      available in /proc/slabinfo is not repeated in memcg_slabinfo.
      
      A portion of sample output from the file:
      
        # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
        rpc_inode_cache   root          13     51      1      1
        rpc_inode_cache     48           0      0      0      0
        fat_inode_cache   root           1     45      1      1
        fat_inode_cache     41           2     45      1      1
        xfs_inode         root         770    816     24     24
        xfs_inode           92          22     34      1      1
        xfs_inode           88:dead      1     34      1      1
        xfs_inode           89:dead     23     34      1      1
        xfs_inode           85           4     34      1      1
        xfs_inode           84           9     34      1      1
      
      The css id of the memcg is also listed. If a memcg is not online,
      the tag ":dead" will be attached as shown above.
      
      [longman@redhat.com: memcg: add ":deact" tag for reparented kmem caches in memcg_slabinfo]
        Link: http://lkml.kernel.org/r/20190621173005.31514-1-longman@redhat.com
      [longman@redhat.com: set the flag in the common code as suggested by Roman]
        Link: http://lkml.kernel.org/r/20190627184324.5875-1-longman@redhat.com
      Link: http://lkml.kernel.org/r/20190619171621.26209-1-longman@redhat.com
      Signed-off-by: Waiman Long <longman@redhat.com>
      Suggested-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fcf8a1e4
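A reader of the memcg_slabinfo format shown in commit fcf8a1e4 might parse each row roughly as follows. This is a user-space sketch that assumes the whitespace-separated six-column layout from the sample output; `struct sketch_slab_row` and `sketch_parse_row()` are hypothetical names, and only the ":dead" tag is recognised here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of the sample output. */
struct sketch_slab_row {
	char name[64];
	char css_id[32];   /* "root" or a numeric id, with any tag stripped */
	bool dead;         /* true if the id carried a ":dead" tag */
	unsigned long active_objs, num_objs, active_slabs, num_slabs;
};

/* Parse one line; returns false for the header or malformed rows. */
static bool sketch_parse_row(const char *line, struct sketch_slab_row *row)
{
	char *tag;

	if (sscanf(line, "%63s %31s %lu %lu %lu %lu", row->name, row->css_id,
		   &row->active_objs, &row->num_objs,
		   &row->active_slabs, &row->num_slabs) != 6)
		return false;

	tag = strchr(row->css_id, ':');
	row->dead = tag != NULL && strcmp(tag, ":dead") == 0;
	if (tag)
		*tag = '\0'; /* keep only the id itself */
	return true;
}

/* Usage example on two rows taken from the sample output. */
int sketch_slabinfo_demo(void)
{
	struct sketch_slab_row row;
	bool ok;

	ok = sketch_parse_row("xfs_inode           88:dead      1     34      1      1", &row);
	assert(ok && row.dead && strcmp(row.css_id, "88") == 0);
	assert(row.active_objs == 1 && row.num_objs == 34);

	ok = sketch_parse_row("rpc_inode_cache   root          13     51      1      1", &row);
	assert(ok && !row.dead && strcmp(row.css_id, "root") == 0);
	return 0;
}
```

The header line beginning with `#` fails the six-field conversion and is skipped, so a caller can feed every line of the file through the same function.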
    • R
      mm: memcg/slab: reparent memcg kmem_caches on cgroup removal · fb2f2b0a
      Roman Gushchin committed
      Let's reparent non-root kmem_caches on memcg offlining.  This allows us to
      release the memory cgroup without waiting for the last outstanding kernel
      object (e.g. a dentry used by another application).
      
      Since the parent cgroup is already charged, all we need to do is
      splice the list of kmem_caches to the parent's kmem_caches list, swap the
      memcg pointer, drop the css refcounter for each kmem_cache and adjust the
      parent's css refcounter.
      
      Please note that kmem_cache->memcg_params.memcg isn't a stable pointer
      anymore.  It's safe to read it under rcu_read_lock(), with cgroup_mutex
      held, or in any other way that protects the memory cgroup from being
      released.
      
      We can race with the slab allocation and deallocation paths.  It's not a
      big problem: parent's charge and slab global stats are always correct, and
      we don't care anymore about the child usage and global stats.  The child
      cgroup is already offline, so we don't use or show it anywhere.
      
      Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE) aren't
      used anywhere except count_shadow_nodes().  But even there it won't break
      anything: after reparenting "nodes" will be 0 on child level (because
      we're already reparenting shrinker lists), and on parent level page stats
      always were 0, and this patch won't change anything.
      
      [guro@fb.com: properly handle kmem_caches reparented to root_mem_cgroup]
        Link: http://lkml.kernel.org/r/20190620213427.1691847-1-guro@fb.com
      Link: http://lkml.kernel.org/r/20190611231813.3148843-11-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fb2f2b0a
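The list-splicing part of commit fb2f2b0a can be sketched with a toy data model. Everything here is hypothetical (`struct sketch_memcg`, `struct sketch_cache`, a singly linked list), and the sketch deliberately leaves out the css refcount adjustments and the locking or RCU protection the commit message talks about.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_memcg;

/* Toy stand-in for a per-memcg kmem_cache. */
struct sketch_cache {
	struct sketch_memcg *memcg;  /* owner; changes on reparenting */
	struct sketch_cache *next;
};

/* Toy stand-in for a memory cgroup. */
struct sketch_memcg {
	struct sketch_memcg *parent;
	struct sketch_cache *caches; /* singly linked list of caches */
};

/* Move all caches of an offlined child onto its parent's list. */
static void sketch_reparent(struct sketch_memcg *child)
{
	struct sketch_memcg *parent = child->parent;
	struct sketch_cache *c, *last = NULL;

	assert(parent != NULL);
	for (c = child->caches; c; c = c->next) {
		c->memcg = parent;   /* swap the memcg pointer */
		last = c;
	}
	if (last) {
		/* Splice the child's list in front of the parent's list. */
		last->next = parent->caches;
		parent->caches = child->caches;
		child->caches = NULL;
	}
}

/* Usage example: a child with two caches and a parent with one. */
int sketch_reparent_demo(void)
{
	struct sketch_memcg parent = { NULL, NULL };
	struct sketch_memcg child = { &parent, NULL };
	struct sketch_cache p1 = { &parent, NULL };
	struct sketch_cache c2 = { &child, NULL };
	struct sketch_cache c1 = { &child, &c2 };

	parent.caches = &p1;
	child.caches = &c1;
	sketch_reparent(&child);

	assert(child.caches == NULL);
	assert(parent.caches == &c1 && c1.next == &c2 && c2.next == &p1);
	assert(c1.memcg == &parent && c2.memcg == &parent);
	return 0;
}
```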