1. 01 May 2021 (7 commits)
  2. 09 Apr 2021 (1 commit)
  3. 08 Apr 2021 (1 commit)
  4. 26 Mar 2021 (1 commit)
    • A
      kasan: fix per-page tags for non-page_alloc pages · cf10bd4c
      Authored by Andrey Konovalov
      To allow performing tag checks on page_alloc addresses obtained via
      page_address(), tag-based KASAN modes store tags for page_alloc
      allocations in page->flags.
      
      Currently, the default tag value stored in page->flags is 0x00.
      Therefore, page_address() returns a 0x00ffff...  address for pages that
      were not allocated via page_alloc.
      
      This might cause problems.  A particular case we encountered is a
      conflict with KFENCE.  If a KFENCE-allocated slab object is being freed
      via kfree(page_address(page) + offset), the address passed to kfree()
      will get tagged with 0x00 (as slab pages keep the default per-page
      tags).  This leads to is_kfence_address() check failing, and a KFENCE
      object ending up in normal slab freelist, which causes memory
      corruptions.
      
      This patch changes the way KASAN stores tags in page flags: they are
      now stored xor'ed with 0xff.  This way, KASAN doesn't need to
      initialize per-page flags for every created page, which might be slow.
      
      With this change, page_address() returns natively-tagged (with 0xff)
      pointers for pages that didn't have tags set explicitly.
      
      This patch fixes the encountered conflict with KFENCE and prevents more
      similar issues that can occur in the future.
      
      Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
      Fixes: 2813b9c0 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf10bd4c
  5. 14 Mar 2021 (1 commit)
    • P
      mm: introduce page_needs_cow_for_dma() for deciding whether cow · 97a7e473
      Authored by Peter Xu
      We've got quite a few places (pte, pmd, pud) that explicitly checked
      against whether we should break the cow right now during fork().  It's
      easier to provide a helper, especially before we work the same thing on
      hugetlbfs.
      
      Since we'll reference is_cow_mapping() in mm.h, move it there too.
      It actually suits mm.h better, since internal.h is mm/-only while
      mm.h is exported to the whole kernel.  We can then expect a follow-up
      patch to use is_cow_mapping() across the kernel wherever possible,
      since the check is currently open-coded against the raw VM_* flags
      in quite a few places.
      
      Link: https://lkml.kernel.org/r/20210217233547.93892-4-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Gal Pressman <galpress@amazon.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kirill Shutemov <kirill@shutemov.name>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Roland Scheidegger <sroland@vmware.com>
      Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
      Cc: Wei Zhang <wzam@amazon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      97a7e473
  6. 09 Mar 2021 (1 commit)
    • P
      mm: Don't build mem_dump_obj() on CONFIG_PRINTK=n kernels · 5bb1bb35
      Authored by Paul E. McKenney
      The mem_dump_obj() functionality adds a few hundred bytes, which is a
      small price to pay, except on kernels built with CONFIG_PRINTK=n,
      where its messages would be suppressed anyway.  This commit therefore
      makes mem_dump_obj() an empty static inline function on kernels built
      with CONFIG_PRINTK=n and excludes all of its support functions as well.
      This avoids kernel bloat on systems that cannot use mem_dump_obj().
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <linux-mm@kvack.org>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      5bb1bb35
  7. 25 Feb 2021 (5 commits)
  8. 09 Feb 2021 (1 commit)
    • P
      mm: provide a saner PTE walking API for modules · 9fd6dad1
      Authored by Paolo Bonzini
      Currently, the follow_pfn function is exported for modules but
      follow_pte is not.  However, follow_pfn is very easy to misuse,
      because it does not provide protections (so most of its callers
      assume the page is writable!) and because it returns after having
      already unlocked the page table lock.
      
      Provide instead a simplified version of follow_pte that does
      not have the pmdpp and range arguments.  The older version
      survives as follow_invalidate_pte() for use by fs/dax.c.
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9fd6dad1
  9. 05 Feb 2021 (1 commit)
  10. 23 Jan 2021 (1 commit)
    • P
      mm: Add mem_dump_obj() to print source of memory block · 8e7f37f2
      Authored by Paul E. McKenney
      There are kernel facilities such as per-CPU reference counts that give
      error messages in generic handlers or callbacks, whose messages are
      unenlightening.  In the case of per-CPU reference-count underflow, this
      is not a problem when creating a new use of this facility because in that
      case the bug is almost certainly in the code implementing that new use.
      However, trouble arises when deploying across many systems, which might
      exercise corner cases that were not seen during development and testing.
      Here, it would be really nice to get some kind of hint as to which of
      several uses the underflow was caused by.
      
      This commit therefore exposes a mem_dump_obj() function that takes
      a pointer to memory (which must still be allocated if it has been
      dynamically allocated) and prints available information on where that
      memory came from.  This pointer can reference the middle of the block as
      well as the beginning of the block, as needed by things like RCU callback
      functions and timer handlers that might not know where the beginning of
      the memory block is.  These functions and handlers can use mem_dump_obj()
      to print out better hints as to where the problem might lie.
      
      The information printed can depend on kernel configuration.  For example,
      the allocation return address can be printed only for slab and slub,
      and even then only when the necessary debugging has been enabled.  For
      slab, build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample
      space to the next power of two or pass the SLAB_STORE_USER flag when creating the
      kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
      boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
      if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
      to enable printing of the allocation-time stack trace.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
      [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
      [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
      [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
      [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
      [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      8e7f37f2
  11. 21 Jan 2021 (3 commits)
  12. 20 Jan 2021 (2 commits)
    • W
      mm: Allow architectures to request 'old' entries when prefaulting · 46bdb427
      Authored by Will Deacon
      Commit 5c0a85fa ("mm: make faultaround produce old ptes") changed
      the "faultaround" behaviour to initialise prefaulted PTEs as 'old',
      since this avoids vmscan wrongly assuming that they are hot, despite
      having never been explicitly accessed by userspace. The change has been
      shown to benefit numerous arm64 micro-architectures (with hardware
      access flag) running Android, where both application launch latency and
      direct reclaim time are significantly reduced (by 10%+ and ~80%
      respectively).
      
      Unfortunately, commit 315d09bf ("Revert "mm: make faultaround
      produce old ptes"") reverted the change due to it being identified as
      the cause of a ~6% regression in unixbench on x86. Experiments on a
      variety of recent arm64 micro-architectures indicate that unixbench is
      not affected by the original commit, which appears to yield a 0-1%
      performance improvement.
      
      Since one size does not fit all for the initial state of prefaulted
      PTEs, introduce arch_wants_old_prefaulted_pte(), which allows an
      architecture to opt-in to 'old' prefaulted PTEs at runtime based on
      whatever criteria it may have.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Will Deacon <will@kernel.org>
      46bdb427
    • K
      mm: Cleanup faultaround and finish_fault() codepaths · f9ce0be7
      Authored by Kirill A. Shutemov
      alloc_set_pte() has two users with different requirements: in the
      faultaround code, it is called from an atomic context and the PTE page
      table has to be preallocated.  finish_fault() can sleep and allocate
      the page table as needed.
      
      PTL locking rules are also strange, hard to follow and overkill for
      finish_fault().
      
      Let's untangle the mess. alloc_set_pte() has gone now. All locking is
      explicit.
      
      The price is some code duplication to handle huge pages in faultaround
      path, but it should be fine, having overall improvement in readability.
      
      Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      [will: s/from from/from/ in comment; spotted by willy]
      Signed-off-by: Will Deacon <will@kernel.org>
      f9ce0be7
  13. 12 Jan 2021 (3 commits)
    • D
      mm: Close race in generic_access_phys · 96667f8a
      Authored by Daniel Vetter
      Way back it was a reasonable assumption that iomem mappings never
      change the pfn range they point at. But this has changed:
      
      - gpu drivers dynamically manage their memory nowadays, invalidating
        ptes with unmap_mapping_range when buffers get moved
      
      - contiguous dma allocations have moved from dedicated carveouts to
        cma regions. This means if we miss the unmap the pfn might contain
        pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
      
      - even /dev/mem now invalidates mappings when the kernel requests that
        iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac66
        ("/dev/mem: Revoke mappings when a driver claims the region")
      
      Accessing pfns obtained from ptes without holding all the locks is
      therefore no longer a good idea. Fix this.
      
      Since ioremap might need to manipulate pagetables too we need to drop
      the pt lock and have a retry loop if we raced.
      
      While at it, also add kerneldoc and improve the comment for the
      vm_ops->access function. It's for accessing, not for moving the
      memory from iomem to system memory, as the old comment seemed to
      suggest.
      
      References: 28b2ee20 ("access_process_vm device memory infrastructure")
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
      Cc: Dave Airlie <airlied@linux.ie>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-samsung-soc@vger.kernel.org
      Cc: linux-media@vger.kernel.org
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-8-daniel.vetter@ffwll.ch
      96667f8a
    • D
      media: videobuf2: Move frame_vector into media subsystem · eb83b8e3
      Authored by Daniel Vetter
      It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
      symbol from all over the tree (well, just one place; somehow the omap
      media driver still had it in its Kconfig despite not using it).
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Acked-by: Tomasz Figa <tfiga@chromium.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Tomasz Figa <tfiga@chromium.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-samsung-soc@vger.kernel.org
      Cc: linux-media@vger.kernel.org
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-7-daniel.vetter@ffwll.ch
      eb83b8e3
    • D
      mm/frame-vector: Use FOLL_LONGTERM · 04769cb1
      Authored by Daniel Vetter
      This is used by media/videobuf2 for persistent dma mappings, not just
      for a single dma operation that is then freed again, so it needs
      FOLL_LONGTERM.
      
      Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
      locking issues. Rework the code to pull the pup path out from the
      mmap_sem critical section as suggested by Jason.
      
      By relying entirely on the vma checks in pin_user_pages and follow_pfn
      (for vm_flags and vma_is_fsdax) we can also streamline the code a lot.
      
      Note that pin_user_pages_fast is a safe replacement despite the
      seeming lack of checking for vma->vm_flags & (VM_IO | VM_PFNMAP). Such
      ptes are marked with pte_mkspecial (which pup_fast rejects in the
      fastpath), and only architectures supporting that support the
      pin_user_pages_fast fastpath.
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Tomasz Figa <tfiga@chromium.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-samsung-soc@vger.kernel.org
      Cc: linux-media@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-6-daniel.vetter@ffwll.ch
      04769cb1
  14. 30 Dec 2020 (2 commits)
    • B
      mm: memmap defer init doesn't work as expected · dc2da7b4
      Authored by Baoquan He
      VMware observed a performance regression during memmap init on their
      platform, and bisected to commit 73a6e474 ("mm: memmap_init:
      iterate over memblock regions rather that check each PFN") causing it.
      
      Before the commit:
      
        [0.033176] Normal zone: 1445888 pages used for memmap
        [0.033176] Normal zone: 89391104 pages, LIFO batch:63
        [0.035851] ACPI: PM-Timer IO Port: 0x448
      
      With the commit:
      
        [0.026874] Normal zone: 1445888 pages used for memmap
        [0.026875] Normal zone: 89391104 pages, LIFO batch:63
        [2.028450] ACPI: PM-Timer IO Port: 0x448
      
      The root cause is the current memmap defer init doesn't work as expected.
      
      Before, memmap_init_zone() was used to do memmap init of one whole zone,
      to initialize all low zones of one numa node, but defer memmap init of
      the last zone in that numa node.  However, since commit 73a6e474,
      memmap_init() has been adapted to iterate over memblock regions
      inside one zone, then call memmap_init_zone() to do memmap init for
      each region.
      
      E.g., on VMware's system, the memory layout is as below; there are two
      memory regions in node 2.  The current code mistakenly initializes the
      whole 1st region [mem 0xab00000000-0xfcffffffff] eagerly, and only
      applies deferred init to the 2nd region [mem
      0x10000000000-0x1033fffffff], where a single memory section is
      initialized.  In fact, we expect only one memory section's memmap to be
      initialized eagerly in the whole deferred zone.  That's why the extra
      time is spent.
      
      [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
      [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
      [    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
      [    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
      [    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
      [    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
      
      Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
      down the real zone end pfn so that defer_init() can use it to judge
      whether defer need be taken in zone wide.
      
      Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
      Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
      Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dc2da7b4
    • S
      mm: add prototype for __add_to_page_cache_locked() · 6d87d0ec
      Authored by Souptick Joarder
      Otherwise it causes a gcc warning:
      
        mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]
      
      A previous attempt to make this function static led to compilation
      errors when CONFIG_DEBUG_INFO_BTF is enabled because
      __add_to_page_cache_locked() is referred to by BPF code.
      
      Adding a prototype will silence the warning.
      
      Link: https://lkml.kernel.org/r/1608693702-4665-1-git-send-email-jrdr.linux@gmail.com
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d87d0ec
  15. 23 Dec 2020 (2 commits)
  16. 16 Dec 2020 (8 commits)
    • C
      mm: simplify follow_pte{,pmd} · ff5c19ed
      Authored by Christoph Hellwig
      Merge __follow_pte_pmd, follow_pte_pmd and follow_pte into a single
      follow_pte function and just pass two additional NULL arguments for the
      two previous follow_pte callers.
      
      [sfr@canb.auug.org.au: merge fix for "s390/pci: remove races against pte updates"]
        Link: https://lkml.kernel.org/r/20201111221254.7f6a3658@canb.auug.org.au
      
      Link: https://lkml.kernel.org/r/20201029101432.47011-3-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff5c19ed
    • V
      kernel/power: allow hibernation with page_poison sanity checking · 03b6c9a3
      Authored by Vlastimil Babka
      Page poisoning used to be incompatible with hibernation, as the state of
      poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION
      forces CONFIG_PAGE_POISONING_NO_SANITY.  For the same reason, the
      poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable
      hibernation.  The latter restriction was removed by commit 1ad1410f
      ("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and
      similarly for init_on_free by commit 18451f9f ("PM: hibernate: fix
      crashes with init_on_free=1") by making sure free pages are cleared after
      resume.
      
      We can use the same mechanism to instead poison free pages with
      PAGE_POISON after resume.  This covers both zero and 0xAA patterns.  Thus
      we can remove the Kconfig restriction that disables page poison sanity
      checking when hibernation is enabled.
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[hibernation]
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <labbott@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03b6c9a3
    • V
      mm, page_poison: use static key more efficiently · 8db26a3d
      Authored by Vlastimil Babka
      Commit 11c9c7ed ("mm/page_poison.c: replace bool variable with static
      key") changed page_poisoning_enabled() to a static key check.  However,
      the function is not inlined, so each check still involves a function call
      with overhead not eliminated when page poisoning is disabled.
      
      Analogously to how debug_pagealloc is handled, this patch converts
      page_poisoning_enabled() back to a boolean check, and introduces
      page_poisoning_enabled_static() for fast paths.  Both functions are
      inlined.
      
      The function kernel_poison_pages() is also called unconditionally and
      does the static key check inside.  Remove the check from there and put
      it in the callers.  Also split the function into kernel_poison_pages()
      and kernel_unpoison_pages() instead of using the confusing bool
      parameter.
      
      Also optimize the check that enables page poisoning instead of
      debug_pagealloc for architectures without proper debug_pagealloc support.
      Move the check to init_mem_debugging_and_hardening() to enable a single
      static key instead of having two static branches in
      page_poisoning_enabled_static().
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <labbott@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8db26a3d
    • V
      mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters · 04013513
      Authored by Vlastimil Babka
      Patch series "cleanup page poisoning", v3.
      
      I have identified a number of issues and opportunities for cleanup with
      CONFIG_PAGE_POISON and friends:
      
       - interaction with init_on_alloc and init_on_free parameters depends on
         the order of parameters (Patch 1)
      
       - the boot time enabling uses a static key, but inefficiently (Patch 2)
      
       - sanity checking is incompatible with hibernation (Patch 3)
      
       - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
         init_on_free (Patch 4)
      
       - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we
         have init_on_free (Patch 5)
      
      This patch (of 5):
      
      Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
      produces a warning in dmesg that page_poison takes precedence.  However,
      as these warnings are printed in early_param handlers for
      init_on_alloc/free, they are not printed if page_poison is enabled later
      on the command line (handlers are called in the order of their
      parameters), or when init_on_alloc/free is always enabled by the
      respective config option - before the page_poison early param handler is
      called, it is not considered to be enabled.  This is inconsistent.
      
      We can remove the dependency on order by making the init_on_* parameters
      only set a boolean variable, and postponing the evaluation after all early
      params have been processed.  Introduce a new
      init_mem_debugging_and_hardening() function for that, and move the related
      debug_pagealloc processing there as well.
      
      As a result, init_mem_debugging_and_hardening() always knows accurately
      whether init_on_* and/or page_poison options were enabled.  Thus we can
      also optimize want_init_on_alloc() and want_init_on_free().  We don't
      need to check page_poisoning_enabled() there; we can instead simply not
      enable the init_on_* static keys at all if page poisoning is enabled.
      This results in simpler and more effective code.
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz
      Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Laura Abbott <labbott@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04013513
    • M
      arch, mm: make kernel_page_present() always available · 32a0de88
      Authored by Mike Rapoport
      For architectures that enable ARCH_HAS_SET_MEMORY having the ability to
      verify that a page is mapped in the kernel direct map can be useful
      regardless of hibernation.
      
      Add a RISC-V implementation of kernel_page_present(), update its
      forward declarations and stubs to be a part of the set_memory API, and
      remove the ugly ifdefery in include/linux/mm.h around the current
      declarations of kernel_page_present().
      
      Link: https://lkml.kernel.org/r/20201109192128.960-5-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32a0de88
    • M
      arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC · 5d6ad668
      Authored by Mike Rapoport
      The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must
      never fail.  With this assumption it wouldn't be safe to allow general
      usage of this function.
      
      Moreover, some architectures that implement __kernel_map_pages() have this
      function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
      pages when page allocation debugging is disabled at runtime.
      
      As all the users of __kernel_map_pages() were converted to use
      debug_pagealloc_map_pages() it is safe to make it available only when
      DEBUG_PAGEALLOC is set.
      
Link: https://lkml.kernel.org/r/20201109192128.960-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d6ad668
•
      PM: hibernate: make direct map manipulations more explicit · 2abf962a
Committed by Mike Rapoport
When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled, a page may
not be present in the direct map and has to be explicitly mapped before
it can be copied.
      
      Introduce hibernate_map_page() and hibernation_unmap_page() that will
      explicitly use set_direct_map_{default,invalid}_noflush() for
      ARCH_HAS_SET_DIRECT_MAP case and debug_pagealloc_{map,unmap}_pages() for
      DEBUG_PAGEALLOC case.
      
The remapping of the pages in safe_copy_page() presumes that it only
changes protection bits in an existing PTE, so it is safe to ignore the
return value of set_direct_map_{default,invalid}_noflush().
      
      Still, add a pr_warn() so that future changes in set_memory APIs will not
      silently break hibernation.
      
Link: https://lkml.kernel.org/r/20201109192128.960-3-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2abf962a
•
      mm: introduce debug_pagealloc_{map,unmap}_pages() helpers · 77bc7fd6
Committed by Mike Rapoport
      Patch series "arch, mm: improve robustness of direct map manipulation", v7.
      
      During recent discussion about KVM protected memory, David raised a
      concern about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC
      scope [1].
      
      Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
      possible that __kernel_map_pages() would fail, but since this function is
      void, the failure will go unnoticed.
      
      Moreover, there's lack of consistency of __kernel_map_pages() semantics
      across architectures as some guard this function with #ifdef
      DEBUG_PAGEALLOC, some refuse to update the direct map if page allocation
      debugging is disabled at run time and some allow modifying the direct map
      regardless of DEBUG_PAGEALLOC settings.
      
This set straightens this out by restoring the dependency of
__kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
accordingly.
      
      Since currently the only user of __kernel_map_pages() outside
      DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses
      there more explicit.
      
      [1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a40348269d@redhat.com
      
      This patch (of 4):
      
When CONFIG_DEBUG_PAGEALLOC is enabled, pages are unmapped from the
kernel direct mapping after free_pages().  The pages then need to be
mapped back before they can be used.  These mapping operations use
__kernel_map_pages() guarded with debug_pagealloc_enabled().
      
      The only place that calls __kernel_map_pages() without checking whether
      DEBUG_PAGEALLOC is enabled is the hibernation code that presumes
      availability of this function when ARCH_HAS_SET_DIRECT_MAP is set.  Still,
      on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not
      enabled but set_direct_map_invalid_noflush() may render some pages not
      present in the direct map and hibernation code won't be able to save such
      pages.
      
      To make page allocation debugging and hibernation interaction more robust,
      the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be
      made more explicit.
      
      Start with combining the guard condition and the call to
      __kernel_map_pages() into debug_pagealloc_map_pages() and
      debug_pagealloc_unmap_pages() functions to emphasize that
      __kernel_map_pages() should not be called without DEBUG_PAGEALLOC and use
      these new functions to map/unmap pages when page allocation debugging is
      enabled.
      
      Link: https://lkml.kernel.org/r/20201109192128.960-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20201109192128.960-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      77bc7fd6