1. 14 3月, 2021 8 次提交
    • P
      mm: introduce page_needs_cow_for_dma() for deciding whether cow · 97a7e473
      Peter Xu 提交于
      We've got quite a few places (pte, pmd, pud) that explicitly checked
      against whether we should break the cow right now during fork().  It's
      easier to provide a helper, especially before we work the same thing on
      hugetlbfs.
      
      Since we'll reference is_cow_mapping() in mm.h, move it there too.
      Actually it suites mm.h more since internal.h is mm/ only, but mm.h is
      exported to the whole kernel.  With that we should expect another patch to
      use is_cow_mapping() whenever we can across the kernel since we do use it
      quite a lot but it's always done with raw code against VM_* flags.
      
      Link: https://lkml.kernel.org/r/20210217233547.93892-4-peterx@redhat.comSigned-off-by: NPeter Xu <peterx@redhat.com>
      Reviewed-by: NJason Gunthorpe <jgg@ziepe.ca>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Gal Pressman <galpress@amazon.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kirill Shutemov <kirill@shutemov.name>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Roland Scheidegger <sroland@vmware.com>
      Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
      Cc: Wei Zhang <wzam@amazon.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97a7e473
    • P
      hugetlb: break earlier in add_reservation_in_range() when we can · ca7e0457
      Peter Xu 提交于
      All the regions maintained in hugetlb reserved map is inclusive on "from"
      but exclusive on "to".  We can break earlier even if rg->from==t because
      it already means no possible intersection.
      
      This does not need a Fixes in all cases because when it happens
      (rg->from==t) we'll not break out of the loop while we should, however the
      next thing we'd do is still add the last file_region we'd need and quit
      the loop in the next round.  So this change is not a bugfix (since the old
      code should still run okay iiuc), but we'd better still touch it up to
      make it logically sane.
      
      Link: https://lkml.kernel.org/r/20210217233547.93892-3-peterx@redhat.comSigned-off-by: NPeter Xu <peterx@redhat.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Gal Pressman <galpress@amazon.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill Shutemov <kirill@shutemov.name>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Roland Scheidegger <sroland@vmware.com>
      Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
      Cc: Wei Zhang <wzam@amazon.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ca7e0457
    • P
      hugetlb: dedup the code to add a new file_region · 2103cf9c
      Peter Xu 提交于
      Patch series "mm/hugetlb: Early cow on fork, and a few cleanups", v5.
      
      As reported by Gal [1], we still miss the code clip to handle early cow
      for hugetlb case, which is true.  Again, it still feels odd to fork()
      after using a few huge pages, especially if they're privately mapped to
      me..  However I do agree with Gal and Jason in that we should still have
      that since that'll complete the early cow on fork effort at least, and
      it'll still fix issues where buffers are not well under control and not
      easy to apply MADV_DONTFORK.
      
      The first two patches (1-2) are some cleanups I noticed when reading into
      the hugetlb reserve map code.  I think it's good to have but they're not
      necessary for fixing the fork issue.
      
      The last two patches (3-4) are the real fix.
      
      I tested this with a fork() after some vfio-pci assignment, so I'm pretty
      sure the page copy path could trigger well (page will be accounted right
      after the fork()), but I didn't do data check since the card I assigned is
      some random nic.
      
        https://github.com/xzpeter/linux/tree/fork-cow-pin-huge
      
      [1] https://lore.kernel.org/lkml/27564187-4a08-f187-5a84-3df50009f6ca@amazon.com/
      
      Introduce hugetlb_resv_map_add() helper to add a new file_region rather
      than duplication the similar code twice in add_reservation_in_range().
      
      Link: https://lkml.kernel.org/r/20210217233547.93892-1-peterx@redhat.com
      Link: https://lkml.kernel.org/r/20210217233547.93892-2-peterx@redhat.comSigned-off-by: NPeter Xu <peterx@redhat.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com>
      Cc: Gal Pressman <galpress@amazon.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Wei Zhang <wzam@amazon.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Kirill Shutemov <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Roland Scheidegger <sroland@vmware.com>
      Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2103cf9c
    • F
      mm/fork: clear PASID for new mm · 82e69a12
      Fenghua Yu 提交于
      When a new mm is created, its PASID should be cleared, i.e.  the PASID is
      initialized to its init state 0 on both ARM and X86.
      
      This patch was part of the series introducing mm->pasid, but got lost
      along the way [1].  It still makes sense to have it, because each address
      space has a different PASID.  And the IOMMU code in
      iommu_sva_alloc_pasid() expects the pasid field of a new mm struct to be
      cleared.
      
      [1] https://lore.kernel.org/linux-iommu/YDgh53AcQHT+T3L0@otcwcpicx3.sc.intel.com/
      
      Link: https://lkml.kernel.org/r/20210302103837.2562625-1-jean-philippe@linaro.orgSigned-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NJean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82e69a12
    • M
      mm/page_alloc.c: refactor initialization of struct page for holes in memory layout · 0740a50b
      Mike Rapoport 提交于
      There could be struct pages that are not backed by actual physical memory.
      This can happen when the actual memory bank is not a multiple of
      SECTION_SIZE or when an architecture does not register memory holes
      reserved by the firmware as memblock.memory.
      
      Such pages are currently initialized using init_unavailable_mem() function
      that iterates through PFNs in holes in memblock.memory and if there is a
      struct page corresponding to a PFN, the fields of this page are set to
      default values and it is marked as Reserved.
      
      init_unavailable_mem() does not take into account zone and node the page
      belongs to and sets both zone and node links in struct page to zero.
      
      Before commit 73a6e474 ("mm: memmap_init: iterate over memblock
      regions rather that check each PFN") the holes inside a zone were
      re-initialized during memmap_init() and got their zone/node links right.
      However, after that commit nothing updates the struct pages representing
      such holes.
      
      On a system that has firmware reserved holes in a zone above ZONE_DMA, for
      instance in a configuration below:
      
      	# grep -A1 E820 /proc/iomem
      	7a17b000-7a216fff : Unknown E820 type
      	7a217000-7bffffff : System RAM
      
      unset zone link in struct page will trigger
      
      	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
      
      in set_pfnblock_flags_mask() when called with a struct page from a range
      other than E820_TYPE_RAM because there are pages in the range of
      ZONE_DMA32 but the unset zone link in struct page makes them appear as a
      part of ZONE_DMA.
      
      Interleave initialization of the unavailable pages with the normal
      initialization of memory map, so that zone and node information will be
      properly set on struct pages that are not backed by the actual memory.
      
      With this change the pages for holes inside a zone will get proper
      zone/node links and the pages that are not spanned by any node will get
      links to the adjacent zone/node.  The holes between nodes will be
      prepended to the zone/node above the hole and the trailing pages in the
      last section that will be appended to the zone/node below.
      
      [akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64]
      
      Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org
      Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Reported-by: NQian Cai <cai@lca.pw>
      Reported-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Łukasz Majczak <lma@semihalf.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Sarvela, Tomi P" <tomi.p.sarvela@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0740a50b
    • M
      init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM · ea29b20a
      Masahiro Yamada 提交于
      I read the commit log of the following two:
      
      - bc083a64 ("init/Kconfig: make COMPILE_TEST depend on !UML")
      - 334ef6ed ("init/Kconfig: make COMPILE_TEST depend on !S390")
      
      Both are talking about HAS_IOMEM dependency missing in many drivers.
      
      So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me.
      
      This does not change the behavior of UML. UML still cannot enable
      COMPILE_TEST because it does not provide HAS_IOMEM.
      
      The current dependency for S390 is too strong. Under the condition of
      CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST.
      
      I also removed the meaningless 'default n'.
      
      Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.orgSigned-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Arnd Bergmann <arnd@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KP Singh <kpsingh@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Quentin Perret <qperret@google.com>
      Cc: Valentin Schneider <valentin.schneider@arm.com>
      Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea29b20a
    • A
      stop_machine: mark helpers __always_inline · cbf78d85
      Arnd Bergmann 提交于
      With clang-13, some functions only get partially inlined, with a
      specialized version referring to a global variable.  This triggers a
      harmless build-time check for the intel-rng driver:
      
      WARNING: modpost: drivers/char/hw_random/intel-rng.o(.text+0xe): Section mismatch in reference from the function stop_machine() to the function .init.text:intel_rng_hw_init()
      The function stop_machine() references
      the function __init intel_rng_hw_init().
      This is often because stop_machine lacks a __init
      annotation or the annotation of intel_rng_hw_init is wrong.
      
      In this instance, an easy workaround is to force the stop_machine()
      function to be inline, along with related interfaces that did not show the
      same behavior at the moment, but theoretically could.
      
      The combination of the two patches listed below triggers the behavior in
      clang-13, but individually these commits are correct.
      
      Link: https://lkml.kernel.org/r/20210225130153.1956990-1-arnd@kernel.org
      Fixes: fe5595c0 ("stop_machine: Provide stop_machine_cpuslocked()")
      Fixes: ee527cd3 ("Use stop_machine_run in the Intel RNG driver")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbf78d85
    • A
      memblock: fix section mismatch warning · 34dc2efb
      Arnd Bergmann 提交于
      The inlining logic in clang-13 is rewritten to often not inline some
      functions that were inlined by all earlier compilers.
      
      In case of the memblock interfaces, this exposed a harmless bug of a
      missing __init annotation:
      
      WARNING: modpost: vmlinux.o(.text+0x507c0a): Section mismatch in reference from the function memblock_bottom_up() to the variable .meminit.data:memblock
      The function memblock_bottom_up() references
      the variable __meminitdata memblock.
      This is often because memblock_bottom_up lacks a __meminitdata
      annotation or the annotation of memblock is wrong.
      
      Interestingly, these annotations were present originally, but got removed
      with the explanation that the __init annotation prevents the function from
      getting inlined.  I checked this again and found that while this is the
      case with clang, gcc (version 7 through 10, did not test others) does
      inline the functions regardless.
      
      As the previous change was apparently intended to help the clang builds,
      reverting it to help the newer clang versions seems appropriate as well.
      gcc builds don't seem to care either way.
      
      Link: https://lkml.kernel.org/r/20210225133808.2188581-1-arnd@kernel.org
      Fixes: 5bdba520 ("mm: memblock: drop __init from memblock functions to make it inline")
      Reference: 2cfb3665 ("include/linux/memblock.h: add __init to memblock_set_bottom_up()")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Faiyaz Mohammed <faiyazm@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Aslan Bakirov <aslan@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34dc2efb
  2. 12 3月, 2021 6 次提交
  3. 11 3月, 2021 26 次提交