1. 23 4月, 2022 11 次提交
  2. 16 4月, 2022 1 次提交
    • O
      mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore · c12cd77c
      Omar Sandoval 提交于
      Commit 3ee48b6a ("mm, x86: Saving vmcore with non-lazy freeing of
      vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
      lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
      purge the vmap areas instead of doing it lazily.
      
      Commit 690467c8 ("mm/vmalloc: Move draining areas out of caller
      context") moved the purging from the vunmap() caller to a worker thread.
      Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
      (possibly forever).  For example, consider the following scenario:
      
       1. Thread reads from /proc/vmcore. This eventually calls
          __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
          vmap_lazy_nr to lazy_max_pages() + 1.
      
       2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
          pages (one page plus the guard page) to the purge list and
          vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
          drain_vmap_work is scheduled.
      
       3. Thread returns from the kernel and is scheduled out.
      
       4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
          frees the 2 pages on the purge list. vmap_lazy_nr is now
          lazy_max_pages() + 1.
      
       5. This is still over the threshold, so it tries to purge areas again,
          but doesn't find anything.
      
       6. Repeat 5.
      
      If the system is running with only one CPU (which is typicial for kdump)
      and preemption is disabled, then this will never make forward progress:
      there aren't any more pages to purge, so it hangs.  If there is more
      than one CPU or preemption is enabled, then the worker thread will spin
      forever in the background.  (Note that if there were already pages to be
      purged at the time that set_iounmap_nonlazy() was called, this bug is
      avoided.)
      
      This can be reproduced with anything that reads from /proc/vmcore
      multiple times.  E.g., vmcore-dmesg /proc/vmcore.
      
      It turns out that improvements to vmap() over the years have obsoleted
      the need for this "optimization".  I benchmarked `dd if=/proc/vmcore
      of=/dev/null` with 4k and 1M read sizes on a system with a 32GB vmcore.
      The test was run on 5.17, 5.18-rc1 with a fix that avoided the hang, and
      5.18-rc1 with set_iounmap_nonlazy() removed entirely:
      
          |5.17  |5.18+fix|5.18+removal
        4k|40.86s|  40.09s|      26.73s
        1M|24.47s|  23.98s|      21.84s
      
      The removal was the fastest (by a wide margin with 4k reads).  This
      patch removes set_iounmap_nonlazy().
      
      Link: https://lkml.kernel.org/r/52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com
      Fixes: 690467c8  ("mm/vmalloc: Move draining areas out of caller context")
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Acked-by: NChris Down <chris@chrisdown.name>
      Reviewed-by: NUladzislau Rezki (Sony) <urezki@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NBaoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c12cd77c
  3. 15 4月, 2022 4 次提交
  4. 13 4月, 2022 1 次提交
    • M
      stat: fix inconsistency between struct stat and struct compat_stat · 932aba1e
      Mikulas Patocka 提交于
      struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
      st_dev and st_rdev; struct compat_stat (defined in
      arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
      a 16-bit padding.
      
      This patch fixes struct compat_stat to match struct stat.
      
      [ Historical note: the old x86 'struct stat' did have that 16-bit field
        that the compat layer had kept around, but it was changes back in 2003
        by "struct stat - support larger dev_t":
      
          https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d
      
        and back in those days, the x86_64 port was still new, and separate
        from the i386 code, and had already picked up the old version with a
        16-bit st_dev field ]
      
      Note that we can't change compat_dev_t because it is used by
      compat_loop_info.
      
      Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
      old_valid_dev to test if the value fits into them.  This fixes
      -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
      number 259.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      932aba1e
  5. 12 4月, 2022 4 次提交
  6. 11 4月, 2022 3 次提交
  7. 09 4月, 2022 2 次提交
    • H
      RISC-V: KVM: include missing hwcap.h into vcpu_fp · 4054eee9
      Heiko Stuebner 提交于
      vcpu_fp uses the riscv_isa_extension mechanism which gets
      defined in hwcap.h but doesn't include that head file.
      
      While it seems to work in most cases, in certain conditions
      this can lead to build failures like
      
      ../arch/riscv/kvm/vcpu_fp.c: In function ‘kvm_riscv_vcpu_fp_reset’:
      ../arch/riscv/kvm/vcpu_fp.c:22:13: error: implicit declaration of function ‘riscv_isa_extension_available’ [-Werror=implicit-function-declaration]
         22 |         if (riscv_isa_extension_available(&isa, f) ||
            |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ../arch/riscv/kvm/vcpu_fp.c:22:49: error: ‘f’ undeclared (first use in this function)
         22 |         if (riscv_isa_extension_available(&isa, f) ||
      
      Fix this by simply including the necessary header.
      
      Fixes: 0a86512d ("RISC-V: KVM: Factor-out FP virtualization into separate
      sources")
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NAnup Patel <anup@brainfault.org>
      4054eee9
    • A
      RISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put() · 8c3ce496
      Anup Patel 提交于
      We might have RISC-V systems (such as QEMU) where VMID is not part
      of the TLB entry tag so these systems will have to flush all TLB
      entries upon any change in hgatp.VMID.
      
      Currently, we zero-out hgatp CSR in kvm_arch_vcpu_put() and we
      re-program hgatp CSR in kvm_arch_vcpu_load(). For above described
      systems, this will flush all TLB entries whenever VCPU exits to
      user-space hence reducing performance.
      
      This patch fixes above described performance issue by not clearing
      hgatp CSR in kvm_arch_vcpu_put().
      
      Fixes: 34bde9d8 ("RISC-V: KVM: Implement VCPU world-switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: NAnup Patel <apatel@ventanamicro.com>
      Signed-off-by: NAnup Patel <anup@brainfault.org>
      8c3ce496
  8. 08 4月, 2022 2 次提交
  9. 07 4月, 2022 12 次提交