1. 03 April 2020 (3 commits)
  2. 06 March 2020 (1 commit)
    • mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled · c87cbc1f
      Committed by Vlastimil Babka
      Commit cd02cf1a ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
      fixed memory hotplug with debug_pagealloc enabled, where onlining a page
      goes through page freeing, which removes the direct mapping.  Some arches
      don't like when the page is not mapped in the first place, so
      generic_online_page() maps it first.  This is somewhat wasteful, but
      better than special casing page freeing fast paths.
      
      The commit however missed that having DEBUG_PAGEALLOC configured
      doesn't mean it's actually enabled.  One has to test
      debug_pagealloc_enabled() since 031bc574 ("mm/debug-pagealloc: make
      debug-pagealloc boottime configurable"), or alternatively
      debug_pagealloc_enabled_static() since 8e57f8ac ("mm, debug_pagealloc:
      don't rely on static keys too early"), but neither check was done.
      
      As a result, a s390 kernel with DEBUG_PAGEALLOC configured but not enabled
      will crash:
      
      Unable to handle kernel pointer dereference in virtual kernel address space
      Failing address: 0000000000000000 TEID: 0000000000000483
      Fault in home space mode while using kernel ASCE.
      AS:0000001ece13400b R2:000003fff7fd000b R3:000003fff7fcc007 S:000003fff7fd7000 P:000000000000013d
      Oops: 0004 ilc:2 [#1] SMP
      CPU: 1 PID: 26015 Comm: chmem Kdump: loaded Tainted: GX 5.3.18-5-default #1 SLE15-SP2 (unreleased)
      Krnl PSW : 0704e00180000000 0000001ecd281b9e (__kernel_map_pages+0x166/0x188)
      R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
      Krnl GPRS: 0000000000000000 0000000000000800 0000400b00000000 0000000000000100
      0000000000000001 0000000000000000 0000000000000002 0000000000000100
      0000001ece139230 0000001ecdd98d40 0000400b00000100 0000000000000000
      000003ffa17e4000 001fffe0114f7d08 0000001ecd4d93ea 001fffe0114f7b20
      Krnl Code: 0000001ecd281b8e: ec17ffff00d8 ahik %r1,%r7,-1
      0000001ecd281b94: ec111dbc0355 risbg %r1,%r1,29,188,3
      >0000001ecd281b9e: 94fb5006 ni 6(%r5),251
      0000001ecd281ba2: 41505008 la %r5,8(%r5)
      0000001ecd281ba6: ec51fffc6064 cgrj %r5,%r1,6,1ecd281b9e
      0000001ecd281bac: 1a07 ar %r0,%r7
      0000001ecd281bae: ec03ff584076 crj %r0,%r3,4,1ecd281a5e
      Call Trace:
      [<0000001ecd281b9e>] __kernel_map_pages+0x166/0x188
      [<0000001ecd4d9516>] online_pages_range+0xf6/0x128
      [<0000001ecd2a8186>] walk_system_ram_range+0x7e/0xd8
      [<0000001ecda28aae>] online_pages+0x2fe/0x3f0
      [<0000001ecd7d02a6>] memory_subsys_online+0x8e/0xc0
      [<0000001ecd7add42>] device_online+0x5a/0xc8
      [<0000001ecd7d0430>] state_store+0x88/0x118
      [<0000001ecd5b9f62>] kernfs_fop_write+0xc2/0x200
      [<0000001ecd5064b6>] vfs_write+0x176/0x1e0
      [<0000001ecd50676a>] ksys_write+0xa2/0x100
      [<0000001ecda315d4>] system_call+0xd8/0x2c8
      
      Fix this by checking debug_pagealloc_enabled_static() before calling
      kernel_map_pages().  Backports for kernels before 5.5 should use
      debug_pagealloc_enabled() instead.  Also add comments.
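      
      A minimal sketch of the fixed onlining path (hedged: only the helper
      names above are from this patch; the surrounding code in
      mm/memory_hotplug.c may differ):
      
        /* mm/memory_hotplug.c (sketch) */
        void generic_online_page(struct page *page, unsigned int order)
        {
                /*
                 * Freeing the page below removes it from the direct mapping
                 * when debug_pagealloc is runtime-enabled, so map it first --
                 * but only when the feature is actually enabled, not merely
                 * compiled in.
                 */
                if (debug_pagealloc_enabled_static())
                        kernel_map_pages(page, 1 << order, 1);
                __free_pages_core(page, order);
                totalram_pages_add(1UL << order);
        }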
      
      Fixes: cd02cf1a ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
      Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Qian Cai <cai@lca.pw>
      Link: http://lkml.kernel.org/r/20200224094651.18257-1-vbabka@suse.cz
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c87cbc1f
  3. 04 February 2020 (1 commit)
    • mm/page_alloc.c: initialize memmap of unavailable memory directly · 4b094b78
      Committed by David Hildenbrand
      Let's make sure that all memory holes are actually marked PageReserved(),
      that page_to_pfn() produces reliable results, and that these pages are not
      detected as "mmap" pages due to the mapcount.
      
      E.g., booting an x86-64 QEMU guest with 4160 MB:
      
      [    0.010585] Early memory node ranges
      [    0.010586]   node   0: [mem 0x0000000000001000-0x000000000009efff]
      [    0.010588]   node   0: [mem 0x0000000000100000-0x00000000bffdefff]
      [    0.010589]   node   0: [mem 0x0000000100000000-0x0000000143ffffff]
      
      max_pfn is 0x144000.
      
      Before this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000000000800           16384       64  ___________M_______________________________        mmap
                   total           16384       64
      
      After this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000100000000           16384       64  ___________________________r_______________        reserved
                   total           16384       64
      
      IOW, especially the unavailable physical memory ("memory hole") in the
      last section would not get properly marked PageReserved() and would be
      reported as "mmap" memory.
      
      Drop the remaining trace of that function from include/linux/mm.h -
      nobody else needs it - and rename it accordingly.
      
      Note: The fake zone/node might not be covered by the zone/node span.  This
      is not an urgent issue (for now, we had the same node/zone due to the
      zeroing).  We'll need a clean way to mark memory holes (e.g., using a page
      type PageHole() if possible or a fake ZONE_INVALID) and eventually stop
      marking these memory holes PageReserved().
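      
      A condensed sketch of the direct memmap initialization for such holes
      (hedged: modeled on the description above; exact details in
      mm/page_alloc.c may differ):
      
        /* mm/page_alloc.c (sketch): init struct pages of memory holes */
        static u64 __init init_unavailable_range(unsigned long spfn,
                                                 unsigned long epfn)
        {
                unsigned long pfn;
                u64 pgcnt = 0;
        
                for (pfn = spfn; pfn < epfn; pfn++) {
                        /* Skip whole pageblocks without a valid memmap. */
                        if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
                                pfn = ALIGN_DOWN(pfn, pageblock_nr_pages) +
                                      pageblock_nr_pages - 1;
                                continue;
                        }
                        /* Fake zone/node 0 for now (see the note above). */
                        __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
                        __SetPageReserved(pfn_to_page(pfn));
                        pgcnt++;
                }
                return pgcnt;
        }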
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b094b78
  4. 01 February 2020 (5 commits)
  5. 21 January 2020 (1 commit)
  6. 16 January 2020 (1 commit)
  7. 14 January 2020 (1 commit)
    • mm, debug_pagealloc: don't rely on static keys too early · 8e57f8ac
      Committed by Vlastimil Babka
      Commit 96a2b03f ("mm, debug_pagelloc: use static keys to enable
      debugging") has introduced a static key to reduce overhead when
      debug_pagealloc is compiled in but not enabled.  It relied on the
      assumption that jump_label_init() is called before parse_early_param()
      as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
      it is safe to enable the static key.
      
      However, it turns out multiple architectures call parse_early_param()
      earlier, from their setup_arch().  x86 also calls jump_label_init()
      even earlier, so no issue was found while testing the commit, but the
      same is not true for e.g. ppc64 and s390, where the kernel would not
      boot with debug_pagealloc=on, as found by our QA.
      
      To fix this without tricky changes to init code of multiple
      architectures, this patch partially reverts the static key conversion
      from 96a2b03f.  Init-time and non-fastpath calls (such as in arch
      code) of debug_pagealloc_enabled() will again test a simple bool
      variable.  Fastpath mm code is converted to a new
      debug_pagealloc_enabled_static() variant that relies on the static key,
      which is enabled in a well-defined point in mm_init() where it's
      guaranteed that jump_label_init() has been called, regardless of
      architecture.
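      
      A sketch of the resulting pair of helpers (hedged: guards and exact
      placement in include/linux/mm.h may differ):
      
        extern bool _debug_pagealloc_enabled_early;
        DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);
        
        /* Safe anywhere, including early arch code: a plain bool test. */
        static inline bool debug_pagealloc_enabled(void)
        {
                return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
                        _debug_pagealloc_enabled_early;
        }
        
        /*
         * Fast-path variant for mm hot paths; only valid once mm_init()
         * has enabled the static key (jump_label_init() is guaranteed
         * to have run by then).
         */
        static inline bool debug_pagealloc_enabled_static(void)
        {
                if (!IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))
                        return false;
                return static_branch_unlikely(&_debug_pagealloc_enabled);
        }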
      
      [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
        Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
      Fixes: 96a2b03f ("mm, debug_pagelloc: use static keys to enable debugging")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Qian Cai <cai@lca.pw>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8e57f8ac
  8. 18 December 2019 (1 commit)
    • mm/memory.c: add apply_to_existing_page_range() helper · be1db475
      Committed by Daniel Axtens
      apply_to_page_range() takes an address range, and if any parts of it are
      not covered by the existing page table hierarchy, it allocates memory to
      fill them in.
      
      In some use cases, this is not what we want - we want to be able to
      operate exclusively on PTEs that are already in the tables.
      
      Add apply_to_existing_page_range() for this.  Adjust the walker
      functions for apply_to_page_range() to take a 'create' flag, which
      switches them between the old and new modes.
      
      This will be used in KASAN vmalloc.
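      
      A hypothetical caller (hedged: apply_to_existing_page_range() and the
      pte_fn_t callback signature are from this patch; the callback body and
      names here are made up for illustration):
      
        #include <linux/mm.h>
        
        /* pte_fn_t callback; only reached for PTEs that already exist. */
        static int touch_pte_cb(pte_t *ptep, unsigned long addr, void *data)
        {
                return 0;
        }
        
        static int walk_existing(unsigned long addr, unsigned long size)
        {
                /* Unlike apply_to_page_range(), never allocates tables. */
                return apply_to_existing_page_range(&init_mm, addr, size,
                                                    touch_pte_cb, NULL);
        }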
      
      [akpm@linux-foundation.org: reduce code duplication]
      [akpm@linux-foundation.org: s/apply_to_existing_pages/apply_to_existing_page_range/]
      [akpm@linux-foundation.org: initialize __apply_to_page_range::err]
      Link: http://lkml.kernel.org/r/20191205140407.1874-1-dja@axtens.net
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be1db475
  9. 10 December 2019 (1 commit)
    • mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions · 186525bd
      Committed by Ingo Molnar
      - Untangle the somewhat incestuous way of how VMALLOC_START is used all across the
        kernel, but is, on x86, defined deep inside one of the lowest level page table headers.
        It doesn't help that vmalloc.h only includes a single asm header:
      
           #include <asm/page.h>           /* pgprot_t */
      
        So there was no existing cross-arch way to decouple address layout
        definitions from page.h details. I used this:
      
         #ifndef VMALLOC_START
         # include <asm/vmalloc.h>
         #endif
      
        This way every architecture that wants to simplify page.h can do so.
      
      - Also on x86 we had a couple of LDT related inline functions that used
        the late-stage address space layout positions - but these could be
        uninlined without real trouble - the end result is cleaner this way as
        well.
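      
      On x86 the new header then reduces to something like this (hedged
      sketch; the actual file contents may differ):
      
        /* arch/x86/include/asm/vmalloc.h (sketch) */
        #ifndef _ASM_X86_VMALLOC_H
        #define _ASM_X86_VMALLOC_H
        
        #include <asm/pgtable_areas.h>  /* address space layout definitions */
        
        #endif /* _ASM_X86_VMALLOC_H */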
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      186525bd
  10. 05 December 2019 (1 commit)
    • mm: remove __ARCH_HAS_4LEVEL_HACK and include/asm-generic/4level-fixup.h · f949286c
      Committed by Mike Rapoport
      There are no architectures that use include/asm-generic/4level-fixup.h,
      therefore it can be removed along with the __ARCH_HAS_4LEVEL_HACK define.
      
      Link: http://lkml.kernel.org/r/1572938135-31886-14-git-send-email-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Anatoly Pugachev <matorola@gmail.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Peter Rosin <peda@axentia.se>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f949286c
  11. 02 December 2019 (2 commits)
  12. 01 December 2019 (3 commits)
  13. 07 November 2019 (1 commit)
    • mm: thp: handle page cache THP correctly in PageTransCompoundMap · 169226f7
      Committed by Yang Shi
      We have a use case that uses tmpfs as the QEMU memory backend, and we
      would like to take advantage of THP as well.  But our tests show the
      EPT is not PMD mapped even though the underlying THP pages are PMD
      mapped on the host.  The number shown by
      /sys/kernel/debug/kvm/largepages is much smaller than the number of
      PMD mapped shmem pages, as below:
      
        7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
        Size:            4194304 kB
        [snip]
        AnonHugePages:         0 kB
        ShmemPmdMapped:   579584 kB
        [snip]
        Locked:                0 kB
      
        cat /sys/kernel/debug/kvm/largepages
        12
      
      And some benchmarks do worse than with anonymous THPs.
      
      By digging into the code we figured out that commit 127393fb ("mm:
      thp: kvm: fix memory corruption in KVM with THP enabled") checks if
      there is a single PTE mapping on the page for anonymous THP when setting
      up EPT map.  But the _mapcount < 0 check doesn't work for page cache THP
      since every subpage of page cache THP would get _mapcount inc'ed once it
      is PMD mapped, so PageTransCompoundMap() always returns false for page
      cache THP.  This would prevent KVM from setting up PMD mapped EPT entry.
      
      So we need to handle page cache THP correctly.  However, when a page
      cache THP's PMD gets split, the kernel just removes the map instead of
      setting up a PTE map the way anonymous THP does.  Before KVM calls
      get_user_pages(), the subpages may get PTE mapped even though the page
      is still a THP, since the page cache THP may be mapped by other
      processes in the meantime.
      
      So check both its _mapcount and whether the THP has a PTE mapping.
      Although this may report some false negatives (PTE mapped by other
      processes), it is not trivial to make the check fully accurate.
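      
      A sketch of the fixed check (hedged: condensed from the description;
      see include/linux/page-flags.h for the real definition):
      
        static inline int PageTransCompoundMap(struct page *page)
        {
                struct page *head;
        
                if (!PageTransCompound(page))
                        return 0;
        
                /* Anonymous THP: _mapcount < 0 means PMD mapped only. */
                if (PageAnon(page))
                        return atomic_read(&page->_mapcount) < 0;
        
                head = compound_head(page);
                /*
                 * File (page cache) THP: PMD mapped and not PTE mapped when
                 * the subpage's _mapcount equals the head's compound
                 * mapcount (every subpage's _mapcount is bumped on PMD map).
                 */
                return atomic_read(&page->_mapcount) ==
                       atomic_read(compound_mapcount_ptr(head));
        }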
      
      With this fix, /sys/kernel/debug/kvm/largepages shows that a
      reasonable number of pages are PMD mapped by EPT, as below:
      
        7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
        Size:            4194304 kB
        [snip]
        AnonHugePages:         0 kB
        ShmemPmdMapped:   557056 kB
        [snip]
        Locked:                0 kB
      
        cat /sys/kernel/debug/kvm/largepages
        271
      
      And the benchmarks perform the same as with anonymous THPs.
      
      [yang.shi@linux.alibaba.com: v4]
        Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
      Fixes: dd78fedd ("rmap: support file thp")
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
      Tested-by: Gang Deng <gavin.dg@linux.alibaba.com>
      Suggested-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      169226f7
  14. 06 November 2019 (1 commit)
    • mm: Add write-protect and clean utilities for address space ranges · c5acad84
      Committed by Thomas Hellstrom
      Add two utilities to 1) write-protect and 2) clean all ptes pointing into
      a range of an address space.
      The utilities are intended to aid in tracking dirty pages (either
      driver-allocated system memory or PCI device memory).
      The write-protect utility should be used in conjunction with
      page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
      accesses. Typically one would want to use this on sparse accesses into
      large memory regions. The clean utility should be used to utilize
      hardware dirtying functionality and avoid the overhead of page-faults,
      typically on large accesses into small memory regions.
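      
      A sketch of intended driver usage (hedged: the two function names are
      those added by this patch; the wrapper, its arguments, and the bitmap
      handling are illustrative, and the two calls are alternatives, not a
      sequence):
      
        /* Dirty-track pages [first, first + nr) of a shared mapping. */
        static void track_dirty(struct address_space *mapping, pgoff_t first,
                                pgoff_t nr, unsigned long *bitmap)
        {
                pgoff_t start = 0, end = 0;
        
                /* Alternative 1: trigger write faults on the next access
                 * via page_mkwrite()/pfn_mkwrite(). */
                wp_shared_mapping_range(mapping, first, nr);
        
                /* Alternative 2: rely on HW dirty bits instead of faults;
                 * dirty pages in the range are recorded in 'bitmap'. */
                clean_record_shared_mapping_range(mapping, first, nr,
                                                  first /* bitmap_pgoff */,
                                                  bitmap, &start, &end);
        }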
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      c5acad84
  15. 27 September 2019 (1 commit)
    • mm: treewide: clarify pgtable_page_{ctor,dtor}() naming · b4ed71f5
      Committed by Mark Rutland
      The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
      people, and until recently arm64 used these erroneously/pointlessly for
      other levels of page table.
      
      To make it incredibly clear that these only apply to the PTE level, and to
      align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
      to pgtable_pte_page_{ctor,dtor}().
      
      These changes were generated with the following shell script:
      
      ----
      git grep -lw 'pgtable_page_.tor' | while read FILE; do
          sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
          sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
      done
      ----
      
      ... with the documentation re-flowed to remain under 80 columns, and
      whitespace fixed up in macros to keep backslashes aligned.
      
      There should be no functional change as a result of this patch.
      
      Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b4ed71f5
  16. 25 September 2019 (8 commits)
  17. 07 September 2019 (1 commit)
  18. 19 July 2019 (2 commits)
  19. 17 July 2019 (4 commits)
  20. 16 July 2019 (1 commit)