1. 02 11月, 2021 2 次提交
    • Z
      mips: fix HUGETLB function without THP enabled · f69fa4c8
      Zhaolong Zhang 提交于
      ltp test futex_wake04 without THP enabled leads to below bt:
        [<ffffffff80a03728>] BUG+0x0/0x8
        [<ffffffff80a0624c>] internal_get_user_pages_fast+0x81c/0x820
        [<ffffffff8093ac18>] get_futex_key+0xa0/0x480
        [<ffffffff8093b074>] futex_wait_setup+0x7c/0x1a8
        [<ffffffff8093b2c0>] futex_wait+0x120/0x228
        [<ffffffff8093dbe8>] do_futex+0x140/0xbd8
        [<ffffffff8093e78c>] sys_futex+0x10c/0x1c0
        [<ffffffff808703d0>] syscall_common+0x34/0x58
      
      Move pmd_write() and pmd_page() from TRANSPARENT_HUGEPAGE scope to
      MIPS_HUGE_TLB_SUPPORT scope, because both THP and HUGETLB will need
      them.
      Signed-off-by: NZhaolong Zhang <zhangzl2013@126.com>
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      f69fa4c8
    • G
      mips: cm: Convert to bitfield API to fix out-of-bounds access · 18b8f5b6
      Geert Uytterhoeven 提交于
      mips_cm_error_report() extracts the cause and other cause from the error
      register using shifts.  This works fine for the former, as it is stored
      in the top bits, and the shift will thus remove all non-related bits.
      However, the latter is stored in the bottom bits, hence thus needs masking
      to get rid of non-related bits.  Without such masking, using it as an
      index into the cm2_causes[] array will lead to an out-of-bounds access,
      probably causing a crash.
      
      Fix this by using FIELD_GET() instead.  Bite the bullet and convert all
      MIPS CM handling to the bitfield API, to improve readability and safety.
      
      Fixes: 3885c2b4 ("MIPS: CM: Add support for reporting CM cache errors")
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: NJiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      18b8f5b6
  2. 24 10月, 2021 3 次提交
    • T
      MIPS: Remove NETLOGIC support · 95b8a5e0
      Thomas Bogendoerfer 提交于
      No (active) developer owns this hardware, so let's remove Linux support.
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      95b8a5e0
    • M
      MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL · a923a267
      Maciej W. Rozycki 提交于
      Fix assembly errors like:
      
      {standard input}: Assembler messages:
      {standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
      {standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
      {standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32'
      {standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
      make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1
      
      with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations
      using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT.
      
      This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture
      to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a
      block of code containing a DINS MIPS64r2 instruction conditionalized on
      MIPS_ISA_REV >= 2 within the scope of the downgrade.
      
      The assembly architecture override code pattern has been put there for
      LL/SC instructions, so that code compiles for configurations that select
      a processor to build for that does not support these instructions while
      still providing run-time support for processors that do, dynamically
      switched by non-constant `cpu_has_llsc'.  It went in with linux-mips.org
      commit aac8aa77 ("Enable a suitable ISA for the assembler around
      ll/sc so that code builds even for processors that don't support the
      instructions. Plus minor formatting fixes.") back in 2005.
      
      Fix the problem by wrapping these instructions along with the adjacent
      SYNC instructions only, following the practice established with commit
      cfd54de3 ("MIPS: Avoid move psuedo-instruction whilst using
      MIPS_ISA_LEVEL") and commit 378ed6f0 ("MIPS: Avoid using .set mips0
      to restore ISA").  Strictly speaking the SYNC instructions do not have
      to be wrapped as they are only used as a Loongson3 erratum workaround,
      so they will be enabled in the assembler by default, but do this so as
      to keep code consistent with other places.
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NMaciej W. Rozycki <macro@orcam.me.uk>
      Fixes: c7e2d71d ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      a923a267
    • T
      MIPS: octeon: Remove unused functions · d6c7c374
      Thomas Bogendoerfer 提交于
      cvmx_helper_initialize_packet_io_local() is unused and after removing
      it cvmx_pko_initialize_local() is also unused.
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      d6c7c374
  3. 18 10月, 2021 1 次提交
  4. 15 10月, 2021 1 次提交
  5. 09 10月, 2021 1 次提交
  6. 07 10月, 2021 2 次提交
  7. 05 10月, 2021 3 次提交
  8. 02 10月, 2021 1 次提交
  9. 01 10月, 2021 1 次提交
  10. 23 9月, 2021 1 次提交
  11. 21 9月, 2021 1 次提交
  12. 15 9月, 2021 1 次提交
  13. 09 9月, 2021 1 次提交
    • A
      arch: remove compat_alloc_user_space · a7a08b27
      Arnd Bergmann 提交于
      All users of compat_alloc_user_space() and copy_in_user() have been
      removed from the kernel, only a few functions in sparc remain that can be
      changed to calling arch_copy_in_user() instead.
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.orgSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7a08b27
  14. 04 9月, 2021 1 次提交
    • C
      mm: remove flush_kernel_dcache_page · f358afc5
      Christoph Hellwig 提交于
      flush_kernel_dcache_page is a rather confusing interface that implements a
      subset of flush_dcache_page by not being able to properly handle page
      cache mapped pages.
      
      The only callers left are in the exec code as all other previous callers
      were incorrect as they could have dealt with page cache pages.  Replace
      the calls to flush_kernel_dcache_page with calls to flush_dcache_page,
      which for all architectures does either exactly the same thing, can
      contains one or more of the following:
      
       1) an optimization to defer the cache flush for page cache pages not
          mapped into userspace
       2) additional flushing for mapped page cache pages if cache aliases
          are possible
      
      Link: https://lkml.kernel.org/r/20210712060928.4161649-7-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Cercueil <paul@crapouillou.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Yoshinori Sato <ysato@users.osdn.me>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f358afc5
  15. 05 8月, 2021 2 次提交
  16. 04 8月, 2021 1 次提交
    • P
      sock: allow reading and changing sk_userlocks with setsockopt · 04190bf8
      Pavel Tikhomirov 提交于
      SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
      buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
      tcp_sndbuf_expand()). If we've just created a new socket this adjustment
      is enabled on it, but if one changes the socket buffer size by
      setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
      
      CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
      restore as it first needs to increase buffer sizes for packet queues
      restore and second it needs to restore back original buffer sizes. So
      after CRIU restore all sockets become non-auto-adjustable, which can
      decrease network performance of restored applications significantly.
      
      CRIU need to be able to restore sockets with enabled/disabled adjustment
      to the same state it was before dump, so let's add special setsockopt
      for it.
      
      Let's also export SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to uAPI so
      that using these interface one can reenable automatic socket buffer
      adjustment on their sockets.
      Signed-off-by: NPavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04190bf8
  17. 03 8月, 2021 1 次提交
  18. 24 7月, 2021 1 次提交
  19. 23 7月, 2021 2 次提交
  20. 19 7月, 2021 1 次提交
  21. 14 7月, 2021 1 次提交
  22. 09 7月, 2021 4 次提交
  23. 02 7月, 2021 2 次提交
    • A
      mm/thp: define default pmd_pgtable() · 1c2f7d14
      Anshuman Khandual 提交于
      Currently most platforms define pmd_pgtable() as pmd_page() duplicating
      the same code all over.  Instead just define a default value i.e
      pmd_page() for pmd_pgtable() and let platforms override when required via
      <asm/pgtable.h>.  All the existing platform that override pmd_pgtable()
      have been moved into their respective <asm/pgtable.h> header in order to
      precede before the new generic definition.  This makes it much cleaner
      with reduced code.
      
      Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c2f7d14
    • A
      mm: define default value for FIRST_USER_ADDRESS · fac7757e
      Anshuman Khandual 提交于
      Currently most platforms define FIRST_USER_ADDRESS as 0UL duplication the
      same code all over.  Instead just define a generic default value (i.e 0UL)
      for FIRST_USER_ADDRESS and let the platforms override when required.  This
      makes it much cleaner with reduced code.
      
      The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h>
      when the given platform overrides its value via <asm/pgtable.h>.
      
      Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Guo Ren <guoren@kernel.org>			[csky]
      Acked-by: Stafford Horne <shorne@gmail.com>		[openrisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[RISC-V]
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fac7757e
  24. 01 7月, 2021 1 次提交
    • D
      mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables · 4ca9b385
      David Hildenbrand 提交于
      I. Background: Sparse Memory Mappings
      
      When we manage sparse memory mappings dynamically in user space - also
      sometimes involving MAP_NORESERVE - we want to dynamically populate/
      discard memory inside such a sparse memory region.  Example users are
      hypervisors (especially implementing memory ballooning or similar
      technologies like virtio-mem) and memory allocators.  In addition, we want
      to fail in a nice way (instead of generating SIGBUS) if populating does
      not succeed because we are out of backend memory (which can happen easily
      with file-based mappings, especially tmpfs and hugetlbfs).
      
      While MADV_DONTNEED, MADV_REMOVE and FALLOC_FL_PUNCH_HOLE allow for
      reliably discarding memory for most mapping types, there is no generic
      approach to populate page tables and preallocate memory.
      
      Although mmap() supports MAP_POPULATE, it is not applicable to the concept
      of sparse memory mappings, where we want to populate/discard dynamically
      and avoid expensive/problematic remappings.  In addition, we never
      actually report errors during the final populate phase - it is best-effort
      only.
      
      fallocate() can be used to preallocate file-based memory and fail in a
      safe way.  However, it cannot really be used for any private mappings on
      anonymous files via memfd due to COW semantics.  In addition, fallocate()
      does not actually populate page tables, so we still always get pagefaults
      on first access - which is sometimes undesired (i.e., real-time workloads)
      and requires real prefaulting of page tables, not just a preallocation of
      backend storage.  There might be interesting use cases for sparse memory
      regions along with mlockall(MCL_ONFAULT) which fallocate() cannot satisfy
      as it does not prefault page tables.
      
      II. On preallcoation/prefaulting from user space
      
      Because we don't have a proper interface, what applications (like QEMU and
      databases) end up doing is touching (i.e., reading+writing one byte to not
      overwrite existing data) all individual pages.
      
      However, that approach
      1) Can result in wear on storage backing, because we end up reading/writing
         each page; this is especially a problem for dax/pmem.
      2) Can result in mmap_sem contention when prefaulting via multiple
         threads.
      3) Requires expensive signal handling, especially to catch SIGBUS in case
         of hugetlbfs/shmem/file-backed memory. For example, this is
         problematic in hypervisors like QEMU where SIGBUS handlers might already
         be used by other subsystems concurrently to e.g, handle hardware errors.
         "Simply" doing preallocation concurrently from other thread is not that
         easy.
      
      III. On MADV_WILLNEED
      
      Extending MADV_WILLNEED is not an option because
      1. It would change the semantics: "Expect access in the near future." and
         "might be a good idea to read some pages" vs. "Definitely populate/
         preallocate all memory and definitely fail on errors.".
      2. Existing users (like virtio-balloon in QEMU when deflating the balloon)
         don't want populate/prealloc semantics. They treat this rather as a hint
         to give a little performance boost without too much overhead - and don't
         expect that a lot of memory might get consumed or a lot of time
         might be spent.
      
      IV. MADV_POPULATE_READ and MADV_POPULATE_WRITE
      
      Let's introduce MADV_POPULATE_READ and MADV_POPULATE_WRITE, inspired by
      MAP_POPULATE, with the following semantics:
      1. MADV_POPULATE_READ can be used to prefault page tables just like
         manually reading each individual page. This will not break any COW
         mappings. The shared zero page might get mapped and no backend storage
         might get preallocated -- allocation might be deferred to
         write-fault time. Especially shared file mappings require an explicit
         fallocate() upfront to actually preallocate backend memory (blocks in
         the file system) in case the file might have holes.
      2. If MADV_POPULATE_READ succeeds, all page tables have been populated
         (prefaulted) readable once.
      3. MADV_POPULATE_WRITE can be used to preallocate backend memory and
         prefault page tables just like manually writing (or
         reading+writing) each individual page. This will break any COW
         mappings -- e.g., the shared zeropage is never populated.
      4. If MADV_POPULATE_WRITE succeeds, all page tables have been populated
         (prefaulted) writable once.
      5. MADV_POPULATE_READ and MADV_POPULATE_WRITE cannot be applied to special
         mappings marked with VM_PFNMAP and VM_IO. Also, proper access
         permissions (e.g., PROT_READ, PROT_WRITE) are required. If any such
         mapping is encountered, madvise() fails with -EINVAL.
      6. If MADV_POPULATE_READ or MADV_POPULATE_WRITE fails, some page tables
         might have been populated.
      7. MADV_POPULATE_READ and MADV_POPULATE_WRITE will return -EHWPOISON
         when encountering a HW poisoned page in the range.
      8. Similar to MAP_POPULATE, MADV_POPULATE_READ and MADV_POPULATE_WRITE
         cannot protect from the OOM (Out Of Memory) handler killing the
         process.
      
      While the use case for MADV_POPULATE_WRITE is fairly obvious (i.e.,
      preallocate memory and prefault page tables for VMs), one issue is that
      whenever we prefault pages writable, the pages have to be marked dirty,
      because the CPU could dirty them any time.  while not a real problem for
      hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each
      page will be marked dirty and has to be written back later when evicting.
      
      MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole
      mapping from backend storage without marking it dirty, such that eviction
      won't have to write it back.  As discussed above, shared file mappings
      might require an explciit fallocate() upfront to achieve
      preallcoation+prepopulation.
      
      Although sparse memory mappings are the primary use case, this will also
      be useful for other preallocate/prefault use cases where MAP_POPULATE is
      not desired or the semantics of MAP_POPULATE are not sufficient: as one
      example, QEMU users can trigger preallocation/prefaulting of guest RAM
      after the mapping was created -- and don't want errors to be silently
      suppressed.
      
      Looking at the history, MADV_POPULATE was already proposed in 2013 [1],
      however, the main motivation back than was performance improvements --
      which should also still be the case.
      
      V. Single-threaded performance comparison
      
      I did a short experiment, prefaulting page tables on completely *empty
      mappings/files* and repeated the experiment 10 times.  The results
      correspond to the shortest execution time.  In general, the performance
      benefit for huge pages is negligible with small mappings.
      
      V.1: Private mappings
      
      POPULATE_READ and POPULATE_WRITE is fastest.  Note that
      Reading/POPULATE_READ will populate the shared zeropage where applicable
      -- which result in short population times.
      
      The fastest way to allocate backend storage (here: swap or huge pages) and
      prefault page tables is POPULATE_WRITE.
      
      V.2: Shared mappings
      
      fallocate() is fastest, however, doesn't prefault page tables.
      POPULATE_WRITE is faster than simple writes and read/writes.
      POPULATE_READ is faster than simple reads.
      
      Without a fd, the fastest way to allocate backend storage and prefault
      page tables is POPULATE_WRITE.  With an fd, the fastest way is usually
      FALLOCATE+POPULATE_READ or FALLOCATE+POPULATE_WRITE respectively; one
      exception are actual files: FALLOCATE+Read is slightly faster than
      FALLOCATE+POPULATE_READ.
      
      The fastest way to allocate backend storage prefault page tables is
      FALLOCATE+POPULATE_WRITE -- except when dealing with actual files; then,
      FALLOCATE+POPULATE_READ is fastest and won't directly mark all pages as
      dirty.
      
      v.3: Detailed results
      
      ==================================================
      2 MiB MAP_PRIVATE:
      **************************************************
      Anon 4 KiB     : Read                     :     0.119 ms
      Anon 4 KiB     : Write                    :     0.222 ms
      Anon 4 KiB     : Read/Write               :     0.380 ms
      Anon 4 KiB     : POPULATE_READ            :     0.060 ms
      Anon 4 KiB     : POPULATE_WRITE           :     0.158 ms
      Memfd 4 KiB    : Read                     :     0.034 ms
      Memfd 4 KiB    : Write                    :     0.310 ms
      Memfd 4 KiB    : Read/Write               :     0.362 ms
      Memfd 4 KiB    : POPULATE_READ            :     0.039 ms
      Memfd 4 KiB    : POPULATE_WRITE           :     0.229 ms
      Memfd 2 MiB    : Read                     :     0.030 ms
      Memfd 2 MiB    : Write                    :     0.030 ms
      Memfd 2 MiB    : Read/Write               :     0.030 ms
      Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
      Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
      tmpfs          : Read                     :     0.033 ms
      tmpfs          : Write                    :     0.313 ms
      tmpfs          : Read/Write               :     0.406 ms
      tmpfs          : POPULATE_READ            :     0.039 ms
      tmpfs          : POPULATE_WRITE           :     0.285 ms
      file           : Read                     :     0.033 ms
      file           : Write                    :     0.351 ms
      file           : Read/Write               :     0.408 ms
      file           : POPULATE_READ            :     0.039 ms
      file           : POPULATE_WRITE           :     0.290 ms
      hugetlbfs      : Read                     :     0.030 ms
      hugetlbfs      : Write                    :     0.030 ms
      hugetlbfs      : Read/Write               :     0.030 ms
      hugetlbfs      : POPULATE_READ            :     0.030 ms
      hugetlbfs      : POPULATE_WRITE           :     0.030 ms
      **************************************************
      4096 MiB MAP_PRIVATE:
      **************************************************
      Anon 4 KiB     : Read                     :   237.940 ms
      Anon 4 KiB     : Write                    :   708.409 ms
      Anon 4 KiB     : Read/Write               :  1054.041 ms
      Anon 4 KiB     : POPULATE_READ            :   124.310 ms
      Anon 4 KiB     : POPULATE_WRITE           :   572.582 ms
      Memfd 4 KiB    : Read                     :   136.928 ms
      Memfd 4 KiB    : Write                    :   963.898 ms
      Memfd 4 KiB    : Read/Write               :  1106.561 ms
      Memfd 4 KiB    : POPULATE_READ            :    78.450 ms
      Memfd 4 KiB    : POPULATE_WRITE           :   805.881 ms
      Memfd 2 MiB    : Read                     :   357.116 ms
      Memfd 2 MiB    : Write                    :   357.210 ms
      Memfd 2 MiB    : Read/Write               :   357.606 ms
      Memfd 2 MiB    : POPULATE_READ            :   356.094 ms
      Memfd 2 MiB    : POPULATE_WRITE           :   356.937 ms
      tmpfs          : Read                     :   137.536 ms
      tmpfs          : Write                    :   954.362 ms
      tmpfs          : Read/Write               :  1105.954 ms
      tmpfs          : POPULATE_READ            :    80.289 ms
      tmpfs          : POPULATE_WRITE           :   822.826 ms
      file           : Read                     :   137.874 ms
      file           : Write                    :   987.025 ms
      file           : Read/Write               :  1107.439 ms
      file           : POPULATE_READ            :    80.413 ms
      file           : POPULATE_WRITE           :   857.622 ms
      hugetlbfs      : Read                     :   355.607 ms
      hugetlbfs      : Write                    :   355.729 ms
      hugetlbfs      : Read/Write               :   356.127 ms
      hugetlbfs      : POPULATE_READ            :   354.585 ms
      hugetlbfs      : POPULATE_WRITE           :   355.138 ms
      **************************************************
      2 MiB MAP_SHARED:
      **************************************************
      Anon 4 KiB     : Read                     :     0.394 ms
      Anon 4 KiB     : Write                    :     0.348 ms
      Anon 4 KiB     : Read/Write               :     0.400 ms
      Anon 4 KiB     : POPULATE_READ            :     0.326 ms
      Anon 4 KiB     : POPULATE_WRITE           :     0.273 ms
      Anon 2 MiB     : Read                     :     0.030 ms
      Anon 2 MiB     : Write                    :     0.030 ms
      Anon 2 MiB     : Read/Write               :     0.030 ms
      Anon 2 MiB     : POPULATE_READ            :     0.030 ms
      Anon 2 MiB     : POPULATE_WRITE           :     0.030 ms
      Memfd 4 KiB    : Read                     :     0.412 ms
      Memfd 4 KiB    : Write                    :     0.372 ms
      Memfd 4 KiB    : Read/Write               :     0.419 ms
      Memfd 4 KiB    : POPULATE_READ            :     0.343 ms
      Memfd 4 KiB    : POPULATE_WRITE           :     0.288 ms
      Memfd 4 KiB    : FALLOCATE                :     0.137 ms
      Memfd 4 KiB    : FALLOCATE+Read           :     0.446 ms
      Memfd 4 KiB    : FALLOCATE+Write          :     0.330 ms
      Memfd 4 KiB    : FALLOCATE+Read/Write     :     0.454 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :     0.379 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :     0.268 ms
      Memfd 2 MiB    : Read                     :     0.030 ms
      Memfd 2 MiB    : Write                    :     0.030 ms
      Memfd 2 MiB    : Read/Write               :     0.030 ms
      Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
      Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
      Memfd 2 MiB    : FALLOCATE                :     0.030 ms
      Memfd 2 MiB    : FALLOCATE+Read           :     0.031 ms
      Memfd 2 MiB    : FALLOCATE+Write          :     0.031 ms
      Memfd 2 MiB    : FALLOCATE+Read/Write     :     0.031 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :     0.030 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :     0.030 ms
      tmpfs          : Read                     :     0.416 ms
      tmpfs          : Write                    :     0.369 ms
      tmpfs          : Read/Write               :     0.425 ms
      tmpfs          : POPULATE_READ            :     0.346 ms
      tmpfs          : POPULATE_WRITE           :     0.295 ms
      tmpfs          : FALLOCATE                :     0.139 ms
      tmpfs          : FALLOCATE+Read           :     0.447 ms
      tmpfs          : FALLOCATE+Write          :     0.333 ms
      tmpfs          : FALLOCATE+Read/Write     :     0.454 ms
      tmpfs          : FALLOCATE+POPULATE_READ  :     0.380 ms
      tmpfs          : FALLOCATE+POPULATE_WRITE :     0.272 ms
      file           : Read                     :     0.191 ms
      file           : Write                    :     0.511 ms
      file           : Read/Write               :     0.524 ms
      file           : POPULATE_READ            :     0.196 ms
      file           : POPULATE_WRITE           :     0.434 ms
      file           : FALLOCATE                :     0.004 ms
      file           : FALLOCATE+Read           :     0.197 ms
      file           : FALLOCATE+Write          :     0.554 ms
      file           : FALLOCATE+Read/Write     :     0.480 ms
      file           : FALLOCATE+POPULATE_READ  :     0.201 ms
      file           : FALLOCATE+POPULATE_WRITE :     0.381 ms
      hugetlbfs      : Read                     :     0.030 ms
      hugetlbfs      : Write                    :     0.030 ms
      hugetlbfs      : Read/Write               :     0.030 ms
      hugetlbfs      : POPULATE_READ            :     0.030 ms
      hugetlbfs      : POPULATE_WRITE           :     0.030 ms
      hugetlbfs      : FALLOCATE                :     0.030 ms
      hugetlbfs      : FALLOCATE+Read           :     0.031 ms
      hugetlbfs      : FALLOCATE+Write          :     0.031 ms
      hugetlbfs      : FALLOCATE+Read/Write     :     0.030 ms
      hugetlbfs      : FALLOCATE+POPULATE_READ  :     0.030 ms
      hugetlbfs      : FALLOCATE+POPULATE_WRITE :     0.030 ms
      **************************************************
      4096 MiB MAP_SHARED:
      **************************************************
      Anon 4 KiB     : Read                     :  1053.090 ms
      Anon 4 KiB     : Write                    :   913.642 ms
      Anon 4 KiB     : Read/Write               :  1060.350 ms
      Anon 4 KiB     : POPULATE_READ            :   893.691 ms
      Anon 4 KiB     : POPULATE_WRITE           :   782.885 ms
      Anon 2 MiB     : Read                     :   358.553 ms
      Anon 2 MiB     : Write                    :   358.419 ms
      Anon 2 MiB     : Read/Write               :   357.992 ms
      Anon 2 MiB     : POPULATE_READ            :   357.533 ms
      Anon 2 MiB     : POPULATE_WRITE           :   357.808 ms
      Memfd 4 KiB    : Read                     :  1078.144 ms
      Memfd 4 KiB    : Write                    :   942.036 ms
      Memfd 4 KiB    : Read/Write               :  1100.391 ms
      Memfd 4 KiB    : POPULATE_READ            :   925.829 ms
      Memfd 4 KiB    : POPULATE_WRITE           :   804.394 ms
      Memfd 4 KiB    : FALLOCATE                :   304.632 ms
      Memfd 4 KiB    : FALLOCATE+Read           :  1163.359 ms
      Memfd 4 KiB    : FALLOCATE+Write          :   933.186 ms
      Memfd 4 KiB    : FALLOCATE+Read/Write     :  1187.304 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :  1013.660 ms
      Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :   794.560 ms
      Memfd 2 MiB    : Read                     :   358.131 ms
      Memfd 2 MiB    : Write                    :   358.099 ms
      Memfd 2 MiB    : Read/Write               :   358.250 ms
      Memfd 2 MiB    : POPULATE_READ            :   357.563 ms
      Memfd 2 MiB    : POPULATE_WRITE           :   357.334 ms
      Memfd 2 MiB    : FALLOCATE                :   356.735 ms
      Memfd 2 MiB    : FALLOCATE+Read           :   358.152 ms
      Memfd 2 MiB    : FALLOCATE+Write          :   358.331 ms
      Memfd 2 MiB    : FALLOCATE+Read/Write     :   358.018 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :   357.286 ms
      Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :   357.523 ms
      tmpfs          : Read                     :  1087.265 ms
      tmpfs          : Write                    :   950.840 ms
      tmpfs          : Read/Write               :  1107.567 ms
      tmpfs          : POPULATE_READ            :   922.605 ms
      tmpfs          : POPULATE_WRITE           :   810.094 ms
      tmpfs          : FALLOCATE                :   306.320 ms
      tmpfs          : FALLOCATE+Read           :  1169.796 ms
      tmpfs          : FALLOCATE+Write          :   933.730 ms
      tmpfs          : FALLOCATE+Read/Write     :  1191.610 ms
      tmpfs          : FALLOCATE+POPULATE_READ  :  1020.474 ms
      tmpfs          : FALLOCATE+POPULATE_WRITE :   798.945 ms
      file           : Read                     :   654.101 ms
      file           : Write                    :  1259.142 ms
      file           : Read/Write               :  1289.509 ms
      file           : POPULATE_READ            :   661.642 ms
      file           : POPULATE_WRITE           :  1106.816 ms
      file           : FALLOCATE                :     1.864 ms
      file           : FALLOCATE+Read           :   656.328 ms
      file           : FALLOCATE+Write          :  1153.300 ms
      file           : FALLOCATE+Read/Write     :  1180.613 ms
      file           : FALLOCATE+POPULATE_READ  :   668.347 ms
      file           : FALLOCATE+POPULATE_WRITE :   996.143 ms
      hugetlbfs      : Read                     :   357.245 ms
      hugetlbfs      : Write                    :   357.413 ms
      hugetlbfs      : Read/Write               :   357.120 ms
      hugetlbfs      : POPULATE_READ            :   356.321 ms
      hugetlbfs      : POPULATE_WRITE           :   356.693 ms
      hugetlbfs      : FALLOCATE                :   355.927 ms
      hugetlbfs      : FALLOCATE+Read           :   357.074 ms
      hugetlbfs      : FALLOCATE+Write          :   357.120 ms
      hugetlbfs      : FALLOCATE+Read/Write     :   356.983 ms
      hugetlbfs      : FALLOCATE+POPULATE_READ  :   356.413 ms
      hugetlbfs      : FALLOCATE+POPULATE_WRITE :   356.266 ms
      **************************************************
      
      [1] https://lkml.org/lkml/2013/6/27/698
      
      [akpm@linux-foundation.org: coding style fixes]
      
      Link: https://lkml.kernel.org/r/20210419135443.12822-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ca9b385
  25. 30 6月, 2021 3 次提交
  26. 29 6月, 2021 1 次提交