1. 21 Jan, 2021 2 commits
  2. 20 Jan, 2021 3 commits
    • arm64: mm: Implement arch_wants_old_prefaulted_pte() · 0388f9c7
      Will Deacon committed
      On CPUs with hardware AF/DBM, initialising prefaulted PTEs as 'old'
      improves vmscan behaviour and does not appear to introduce any overhead
      elsewhere.
      
      Implement arch_wants_old_prefaulted_pte() to return 'true' if we detect
      hardware access flag support at runtime. This can be extended in future
      based on MIDR matching if necessary.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
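The runtime check described in this commit can be pictured with a small userspace model. This is only a sketch, not the kernel's code: the function names are hypothetical, and it assumes that hardware access flag support is advertised in the 4-bit HAFDBS field at bits [3:0] of ID_AA64MMFR1_EL1, with any nonzero value meaning the CPU updates the access flag itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HAFDBS occupies bits [3:0] of ID_AA64MMFR1_EL1. */
#define MODEL_HAFDBS_SHIFT 0
#define MODEL_HAFDBS_MASK  0xfULL

/* Hypothetical helper: nonzero HAFDBS means hardware manages the access flag. */
static bool model_cpu_has_hw_af(uint64_t id_aa64mmfr1)
{
	return ((id_aa64mmfr1 >> MODEL_HAFDBS_SHIFT) & MODEL_HAFDBS_MASK) != 0;
}

/*
 * Model of the hook: request 'old' prefaulted PTEs only when the CPU can
 * mark them young again without taking a fault.
 */
static bool model_arch_wants_old_prefaulted_pte(uint64_t id_aa64mmfr1)
{
	return model_cpu_has_hw_af(id_aa64mmfr1);
}
```

The register value is passed in as a parameter so the model runs anywhere; a real implementation would read the ID register on the CPU and could add MIDR-based exceptions, as the commit message suggests.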
    • mm: Allow architectures to request 'old' entries when prefaulting · 46bdb427
      Will Deacon committed
      Commit 5c0a85fa ("mm: make faultaround produce old ptes") changed
      the "faultaround" behaviour to initialise prefaulted PTEs as 'old',
      since this avoids vmscan wrongly assuming that they are hot, despite
      having never been explicitly accessed by userspace. The change has been
      shown to benefit numerous arm64 micro-architectures (with hardware
      access flag) running Android, where both application launch latency and
      direct reclaim time are significantly reduced (by 10%+ and ~80%
      respectively).
      
      Unfortunately, commit 315d09bf ("Revert "mm: make faultaround
      produce old ptes"") reverted the change due to it being identified as
      the cause of a ~6% regression in unixbench on x86. Experiments on a
      variety of recent arm64 micro-architectures indicate that unixbench is
      not affected by the original commit, which appears to yield a 0-1%
      performance improvement.
      
      Since one size does not fit all for the initial state of prefaulted
      PTEs, introduce arch_wants_old_prefaulted_pte(), which allows an
      architecture to opt-in to 'old' prefaulted PTEs at runtime based on
      whatever criteria it may have.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Will Deacon <will@kernel.org>
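The opt-in described in this commit comes down to one extra condition when a PTE is installed. The toy model below illustrates that decision under stated assumptions: the PTE is reduced to a single 'young' bit, the architecture's answer is passed in as a plain boolean, and all names are hypothetical rather than taken from the kernel.

```c
#include <stdbool.h>

/* Toy PTE: only the 'young' (accessed) bit is modelled. */
struct model_pte {
	bool young;
};

/*
 * Build a PTE for a page being mapped.
 *  - prefault:       true when the page is mapped by faultaround rather
 *                    than by the access that actually faulted.
 *  - arch_wants_old: the architecture's runtime answer to the hook.
 */
static struct model_pte model_make_pte(bool prefault, bool arch_wants_old)
{
	struct model_pte pte = { .young = true };

	/* Only speculative mappings may start out 'old'. */
	if (prefault && arch_wants_old)
		pte.young = false;

	return pte;
}
```

In this model an architecture that does not opt in behaves as it did after the revert: every PTE starts young. The page that actually faulted is always young, whatever the architecture prefers.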
    • mm: Cleanup faultaround and finish_fault() codepaths · f9ce0be7
      Kirill A. Shutemov committed
      alloc_set_pte() has two users with different requirements: in the
      faultaround code, it is called from an atomic context and the PTE page
      table has to be preallocated. finish_fault() can sleep and allocate the
      page table as needed.
      
      PTL locking rules are also strange, hard to follow and overkill for
      finish_fault().
      
      Let's untangle the mess. alloc_set_pte() has gone now. All locking is
      explicit.
      
      The price is some code duplication to handle huge pages in faultaround
      path, but it should be fine, having overall improvement in readability.
      
      Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      [will: s/from from/from/ in comment; spotted by willy]
      Signed-off-by: Will Deacon <will@kernel.org>
  3. 18 Jan, 2021 5 commits
    • Linux 5.11-rc4 · 19c329f6
      Linus Torvalds committed
    • Merge tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · e2da7836
      Linus Torvalds committed
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix 'CPU too large' error in Intel PT
      
       - Correct event attribute sizes in 'perf inject'
      
       - Sync build_bug.h and kvm.h kernel copies
      
       - Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example
      
       - libbpf tests fixes
      
       - Fix shadow stat 'perf test' for non-bash shells
      
       - Take cgroups into account for shadow stats in 'perf stat'
      
      * tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf inject: Correct event attribute sizes
        perf intel-pt: Fix 'CPU too large' error
        perf stat: Take cgroups into account for shadow stats
        perf stat: Introduce struct runtime_stat_data
        libperf tests: Fail when failing to get a tracepoint id
        libperf tests: If a test fails return non-zero
        libperf tests: Avoid uninitialized variable warning
        perf test: Fix shadow stat test for non-bash shells
        tools headers: Syncronize linux/build_bug.h with the kernel sources
        tools headers UAPI: Sync kvm.h headers with the kernel sources
        perf bpf examples: Fix bpf.h header include directive in 5sec.c example
    • Merge tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a1339d63
      Linus Torvalds committed
      Pull powerpc fixes from Michael Ellerman:
       "One fix for a lack of alignment in our linker script, that can lead to
        crashes depending on configuration etc.
      
        One fix for the 32-bit VDSO after the C VDSO conversion.
      
        Thanks to Andreas Schwab, Ariel Marcovitch, and Christophe Leroy"
      
      * tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/vdso: Fix clock_gettime_fallback for vdso32
        powerpc: Fix alignment bug within the init sections
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a527a2b3
      Linus Torvalds committed
      Pull misc vfs fixes from Al Viro:
       "Several assorted fixes.
      
        I still think that audit ->d_name race is better fixed this way for
        the benefit of backports, with any possibly fancier variants done on
        top of it"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        dump_common_audit_data(): fix racy accesses to ->d_name
        iov_iter: fix the uaccess area in copy_compat_iovec_from_user
        umount(2): move the flag validity checks first
    • mm: don't put pinned pages into the swap cache · feb889fb
      Linus Torvalds committed
      So technically there is nothing wrong with adding a pinned page to the
      swap cache, but the pinning obviously means that the page can't actually
      be free'd right now anyway, so it's a bit pointless.
      
      However, the real problem is not with it being a bit pointless: the real
      issue is that after we've added it to the swap cache, we'll try to unmap
      the page.  That will succeed, because the code in mm/rmap.c doesn't know
      or care about pinned pages.
      
      Even the unmapping isn't fatal per se, since the page will stay around
      in memory due to the pinning, and we do hold the connection to it using
      the swap cache.  But when we then touch it next and take a page fault,
      the logic in do_swap_page() will map it back into the process as a
      possibly read-only page, and we'll then break the page association on
      the next COW fault.
      
      Honestly, this issue could have been fixed in any of those other places:
      (a) we could refuse to unmap a pinned page (which makes conceptual
      sense), or (b) we could make sure to re-map a pinned page writably in
      do_swap_page(), or (c) we could just make do_wp_page() not COW the
      pinned page (which was what we historically did before that "mm:
      do_wp_page() simplification" commit).
      
      But while all of them are equally valid models for breaking this chain,
      not putting pinned pages into the swap cache in the first place is the
      simplest one by far.
      
      It's also the safest one: the reason why do_wp_page() was changed in the
      first place was that getting the "can I re-use this page" wrong is so
      fraught with errors.  If you do it wrong, you end up with an incorrectly
      shared page.
      
      As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
      do_swap_page() would be a serious bug since it is only a (very good)
      heuristic.  Re-using the page requires a hard black-and-white rule with
      no room for ambiguity.
      
      In contrast, saying "this page is very likely dma pinned, so let's not
      add it to the swap cache and try to unmap it" is an obviously safe thing
      to do, and if the heuristic might very rarely be a false positive, no
      harm is done.
      
      Fixes: 09854ba9 ("mm: do_wp_page() simplification")
      Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
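A small model shows why a "maybe pinned" heuristic is safe for this decision but not for page re-use. The sketch assumes that pinning a small page adds a large bias to its reference count and that the check is simply "refcount at or above the bias". The constant value and all names here are illustrative assumptions, not the kernel's definitions.

```c
#include <stdbool.h>

/* Assumed bias added to the refcount for each pin (illustrative value). */
#define MODEL_PIN_BIAS 1024

struct model_page {
	int refcount;
};

/* An ordinary reference adds one; a DMA pin adds the whole bias. */
static void model_get(struct model_page *p) { p->refcount += 1; }
static void model_pin(struct model_page *p) { p->refcount += MODEL_PIN_BIAS; }

/* Heuristic: a very high refcount is taken to mean "probably pinned". */
static bool model_maybe_dma_pinned(const struct model_page *p)
{
	return p->refcount >= MODEL_PIN_BIAS;
}

/* Reclaim-side decision: skip pages that look pinned. */
static bool model_may_add_to_swap_cache(const struct model_page *p)
{
	return !model_maybe_dma_pinned(p);
}
```

A page with 1024 ordinary references trips the check without being pinned. In this model the only consequence is that the page stays out of the swap cache a little longer, which matches the commit's argument that a rare false positive does no harm here.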
  4. 17 Jan, 2021 7 commits
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 0da0a8a0
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "Nine minor fixes, seven in drivers and two in the core SCSI disk
        driver (sd) which should be harmless involving removing an unused
        variable and quietening a spurious warning"
      Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Remove obsolete variable in sd_remove()
        scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
        scsi: scsi_debug: Fix memleak in scsi_debug_init()
        scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility"
        scsi: qedi: Correct max length of CHAP secret
        scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
        scsi: ufs: Relocate flush of exceptional event
        scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
        scsi: ufs: Fix possible power drain during system suspend
    • dump_common_audit_data(): fix racy accesses to ->d_name · d36a1dd9
      Al Viro committed
      We are not guaranteed the locking environment that would prevent
      dentry getting renamed right under us.  And it's possible for
      old long name to be freed after rename, leading to UAF here.
      
      Cc: stable@kernel.org # v2.6.2+
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Merge tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block · 54c6247d
      Linus Torvalds committed
      Pull block fixes from Jens Axboe:
       "Just an nvme pull request via Christoph:
      
         - don't initialize hwmon for discover controllers (Sagi Grimberg)
      
         - fix iov_iter handling in nvme-tcp (Sagi Grimberg)
      
         - fix a preempt warning in nvme-tcp (Sagi Grimberg)
      
         - fix a possible NULL pointer dereference in nvme (Israel Rukshin)"
      
      * tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
        nvme: don't intialize hwmon for discovery controllers
        nvme-tcp: fix possible data corruption with bio merges
        nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
        nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
    • Merge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block · 11c0239a
      Linus Torvalds committed
      Pull io_uring fixes from Jens Axboe:
       "We still have a pending fix for a cancelation issue, but it's still
        being investigated. In the meantime:
      
         - Dead mm handling fix (Pavel)
      
         - SQPOLL setup error handling (Pavel)
      
         - Flush timeout sequence fix (Marcelo)
      
         - Missing finish_wait() for one exit case"
      
      * tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
        io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
        io_uring: flush timeouts that should already have expired
        io_uring: do sqo disable on install_fd error
        io_uring: fix null-deref in io_disable_sqo_submit
        io_uring: don't take files/mm for a dead task
        io_uring: drop mm and files after task_work_run
    • Merge tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · acda701b
      Linus Torvalds committed
      Pull RISC-V fixes from Palmer Dabbelt:
       "There are a few more fixes than a normal rc4, largely due to the
        bubble introduced by the holiday break:
      
         - return -ENOSYS for syscall number -1, which previously returned an
           uninitialized value.
      
         - ensure of_clk_init() has been called in time_init(), without which
           clock drivers may not be initialized.
      
         - fix sifive,uart0 driver to properly display the baud rate. A fix to
           initialize MPIE that allows interrupts to be processed during
           system calls.
      
         - avoid erroneously beginning to trace IRQs when interrupts are
           disabled, which at least triggers spurious lockdep failures.
      
         - workaround for a warning related to calling smp_processor_id()
           while preemptible. The warning itself is spurious on currently
           available systems.
      
         - properly include the generic time VDSO calls. A fix to our kasan
           address mapping. A fix to the HiFive Unleashed device tree, which
           allows the Ethernet PHY to be properly initialized by Linux (as
           opposed to relying on the bootloader).
      
         - defconfig update to include SiFive's GPIO driver, which is present
           on the HiFive Unleashed and necessary to initialize the PHY.
      
         - avoid allocating memory while initializing reserved memory.
      
         - avoid allocating the last 4K of memory, as pointers there alias
           with syscall errors.
      
        There are also two cleanups that should have no functional effect but
        do fix build warnings:
      
         - drop a duplicated definition of PAGE_KERNEL_EXEC.
      
         - properly declare the asm register SP shim.
      
         - cleanup the rv32 memory size Kconfig entry, to reflect the actual
           size of memory available"
      
      * tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: Fix maximum allowed phsyical memory for RV32
        RISC-V: Set current memblock limit
        RISC-V: Do not allocate memblock while iterating reserved memblocks
        riscv: stacktrace: Move register keyword to beginning of declaration
        riscv: defconfig: enable gpio support for HiFive Unleashed
        dts: phy: add GPIO number and active state used for phy reset
        dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
        riscv: Fix KASAN memory mapping.
        riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
        riscv: cacheinfo: Fix using smp_processor_id() in preemptible
        riscv: Trace irq on only interrupt is enabled
        riscv: Drop a duplicated PAGE_KERNEL_EXEC
        riscv: Enable interrupts during syscalls with M-Mode
        riscv: Fix sifive serial driver
        riscv: Fix kernel time_init()
        riscv: return -ENOSYS for syscall -1
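The note about the last 4K of memory aliasing with syscall errors refers to the common convention of encoding small negative error codes in pointer-sized values. The sketch below models that check in userspace. The helper name is hypothetical, and the limit of 4095 is an assumption about the largest encoded errno.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed largest errno that may be encoded in a pointer-sized value. */
#define MODEL_MAX_ERRNO 4095

/*
 * A pointer-sized value is treated as an error code when it lies in the
 * top MODEL_MAX_ERRNO values of the address space, i.e. when it equals
 * (uintptr_t)-errno for some errno in 1..MODEL_MAX_ERRNO.
 */
static bool model_is_err_value(uintptr_t x)
{
	return x >= (uintptr_t)-MODEL_MAX_ERRNO;
}
```

Under this convention, a valid object placed in the final page of the address space would have an address that tests as an error, so one way to avoid the ambiguity is to never hand out that page.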
    • mm: don't play games with pinned pages in clear_page_refs · 9348b73c
      Linus Torvalds committed
      Turning a pinned page read-only breaks the pinning after COW.  Don't do it.
      
      The whole "track page soft dirty" state doesn't work with pinned pages
      anyway, since the page might be dirtied by the pinning entity without
      ever being noticed in the page tables.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix clear_refs_write locking · 29a951df
      Linus Torvalds committed
      Turning page table entries read-only requires the mmap_sem held for
      writing.
      
      So stop doing the odd games with turning things from read locks to write
      locks and back.  Just get the write lock.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 16 Jan, 2021 23 commits