1. 01 September 2020, 2 commits
  2. 27 August 2020, 2 commits
    • x86/irq: Unbreak interrupt affinity setting · e027ffff
      Committed by Thomas Gleixner
      Several people reported that 5.8 broke the interrupt affinity setting
      mechanism.
      
      The consolidation of the entry code reused the regular exception entry code
      for device interrupts and changed the way the vector number is conveyed:
      from pt_regs::orig_ax to a function argument.
      
      The low level entry uses the hardware error code slot to push the vector
      number onto the stack which is retrieved from there into a function
      argument and the slot on stack is set to -1.
      
      The reason for setting it to -1 is that the error code slot is at the
      position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
      indicates that the entry came via a syscall. If it's not set to a negative
      value then a signal delivery on return to userspace would try to restart a
      syscall. But there are other places which rely on pt_regs::orig_ax being a
      valid indicator for syscall entry.
      
      But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
      interrupt affinity setting mechanism, which was overlooked when this change
      was made.
      
      Moving interrupts on x86 happens in several steps. A new vector on a
      different CPU is allocated and the relevant interrupt source is
      reprogrammed to that. But that's racy and there might be an interrupt
      already in flight to the old vector. So the old vector is preserved until
      the first interrupt arrives on the new vector and the new target CPU. Once
      that happens the old vector is cleaned up, but this cleanup still depends
      on the vector number being stored in pt_regs::orig_ax, which is now -1.
      
      That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
      always false. As a consequence the interrupt is moved once, but then it
      cannot be moved anymore because the cleanup of the old vector never
      happens.
      
      There would be several ways to convey the vector information to that place
      in the guts of the interrupt handling, but on deeper inspection it turned
      out that this check is pointless and a leftover from the old affinity model
      of X86 which supported multi-CPU affinities. Under this model it was
      possible that an interrupt had an old and a new vector on the same CPU, so
      the vector match was required.
      
      Under the new model the effective affinity of an interrupt is always a
      single CPU from the requested affinity mask. If the affinity mask changes
      then either the interrupt stays on the CPU and on the same vector when that
      CPU is still in the new affinity mask or it is moved to a different CPU, but
      it is never moved to a different vector on the same CPU.
      
      Ergo, the cleanup check for the matching vector number is not required and
      can be removed, which makes the dependency on pt_regs::orig_ax go away.
      
      The remaining check for new_cpu == smp_processor_id() is completely
      sufficient. If it matches then the interrupt was successfully migrated and
      the cleanup can proceed.
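      A sketch of what the simplified cleanup condition looks like (field and
      helper names here are illustrative, not the literal kernel code):

      	/* Sketch: complete a pending vector move when an IRQ arrives. */
      	if (likely(!cfg->move_in_progress))
      		return;
      	/*
      	 * The removed condition additionally required
      	 *	vector == pt_regs::orig_ax
      	 * which is now always false because orig_ax is set to -1.
      	 * Arriving on the new target CPU is sufficient on its own.
      	 */
      	if (smp_processor_id() == cfg->new_cpu)
      		send_cleanup_vector(cfg);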
      
      For paranoia's sake, add a warning to the vector assignment code to
      validate that the assumption of never moving to a different vector on
      the same CPU holds.
      
      Fixes: 633260fa ("x86/irq: Convey vector as argument and not in ptregs")
      Reported-by: Alex Bykov <alex.bykov@scylladb.com>
      Reported-by: Avi Kivity <avi@scylladb.com>
      Reported-by: Alexander Graf <graf@amazon.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Alexander Graf <graf@amazon.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
    • x86/hotplug: Silence APIC only after all interrupts are migrated · 52d6b926
      Committed by Ashok Raj
      There is a race when taking a CPU offline. Current code looks like this:
      
      native_cpu_disable()
      {
      	...
      	apic_soft_disable();
      	/*
      	 * Any existing set bits for pending interrupt to
      	 * this CPU are preserved and will be sent via IPI
      	 * to another CPU by fixup_irqs().
      	 */
      	cpu_disable_common();
      	{
      		....
      		/*
      		 * Race window happens here. Once local APIC has been
      		 * disabled any new interrupts from the device to
      		 * the old CPU are lost
      		 */
      		fixup_irqs(); // Too late to capture anything in IRR.
      		...
      	}
      }
      
      The fix is to disable the APIC *after* cpu_disable_common().
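      A sketch of the corrected ordering (same function as above, reordered):

      native_cpu_disable()
      {
      	...
      	cpu_disable_common();	/* fixup_irqs() runs while the local APIC
      				 * is still enabled, so pending IRR bits
      				 * are seen and forwarded to another CPU. */
      	apic_soft_disable();	/* Silence the APIC only afterwards. */
      }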
      
      Testing was done with a USB NIC that provided a source of frequent
      interrupts. A script migrated interrupts to a specific CPU and
      then took that CPU offline.
      
      Fixes: 60dcaad5 ("x86/hotplug: Silence APIC and NMI when CPU is dead")
      Reported-by: Evan Green <evgreen@chromium.org>
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Tested-by: Evan Green <evgreen@chromium.org>
      Reviewed-by: Evan Green <evgreen@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
      Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
  3. 26 August 2020, 3 commits
  4. 24 August 2020, 1 commit
  5. 22 August 2020, 1 commit
    • KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Committed by Will Deacon
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures know whether or not they are permitted to block.
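      The resulting arch interface looks like this (a sketch of the prototype
      after this change; per-architecture parameter naming may differ):

      	int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
      				unsigned long end, unsigned flags);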
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 21 August 2020, 1 commit
    • x86/entry/64: Do not use RDPID in paranoid entry to accommodate KVM · 6a3ea3e6
      Committed by Sean Christopherson
      KVM has an optimization to avoid expensive MSR reads/writes on
      VMENTER/VMEXIT. It caches the MSR values and restores them either when
      leaving the run loop, on preemption, or when going out to user space.
      
      The affected MSRs are not required for kernel context operations. This
      changed with the recently introduced mechanism to handle FSGSBASE in the
      paranoid entry code which has to retrieve the kernel GSBASE value by
      accessing per CPU memory. The mechanism needs to retrieve the CPU number
      and uses either LSL or RDPID if the processor supports it.
      
      Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and
      lazily restored MSRs, which means between the point where the guest value
      is written and the point of restore, MSR_TSC_AUX contains a random number.
      
      If an NMI or any other exception which uses the paranoid entry path happens
      in such a context, then RDPID returns the random guest MSR_TSC_AUX value.
      
      As a consequence this reads from the wrong memory location to retrieve the
      kernel GSBASE value. Kernel GS is used for all regular this_cpu_*()
      operations. If the GSBASE in the exception handler points to the per CPU
      memory of a different CPU then this has the obvious consequences of data
      corruption and crashes.
      
      As the paranoid entry path is the only place which accesses MSR_TSC_AUX
      (via RDPID) and the fallback via LSL is not significantly slower, remove
      the RDPID alternative from the entry path and always use LSL.
      
      The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT,
      which would inflict massive overhead on that code path.
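      For reference, a sketch of the LSL-based lookup: the CPU number is
      recovered from the limit of the per-CPU GDT segment instead of from
      MSR_TSC_AUX (segment and mask names follow the vDSO getcpu path; treat
      the details as illustrative):

      	unsigned long p;

      	/* LSL reads the segment limit of the CPUNODE segment; the low
      	 * bits (VDSO_CPUNODE_MASK) encode the CPU number. */
      	asm("lsl %[seg], %[p]" : [p] "=a" (p)
      			       : [seg] "r" ((unsigned long)__CPUNODE_SEG));
      	cpu = p & VDSO_CPUNODE_MASK;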
      
      [ tglx: Rewrote changelog ]
      
      Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
      Debugged-by: Tom Lendacky <thomas.lendacky@amd.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
  7. 20 August 2020, 4 commits
  8. 18 August 2020, 3 commits
    • kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode · cb957adb
      Committed by Jim Mattson
      See the SDM, volume 3, section 4.4.1:
      
      If PAE paging would be in use following an execution of MOV to CR0 or
      MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
      CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
      the PDPTEs are loaded from the address in CR3.
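      In KVM terms, the fix drops CR4.PKE from the set of bits whose toggling
      forces a PDPTE reload. A sketch of the resulting check in kvm_set_cr4()
      (mask contents per the SDM list above; treat the exact call site as
      illustrative):

      	/* Per SDM 4.4.1: PKE is *not* in this list. */
      	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE |
      				   X86_CR4_PAE | X86_CR4_SMEP;

      	if (is_pae_paging(vcpu) && ((cr4 ^ old_cr4) & pdptr_bits) &&
      	    !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)))
      		return 1;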
      
      Fixes: b9baba86 ("KVM, pkeys: expose CPUID/CR4 to guest")
      Cc: Huaitong Han <huaitong.han@intel.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20200817181655.3716509-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode · 427890af
      Committed by Jim Mattson
      See the SDM, volume 3, section 4.4.1:
      
      If PAE paging would be in use following an execution of MOV to CR0 or
      MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
      CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
      the PDPTEs are loaded from the address in CR3.
      
      Fixes: 0be0226f ("KVM: MMU: fix SMAP virtualization")
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20200817181655.3716509-2-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix access code passed to gva_to_gpa · 19cf4b7e
      Committed by Paolo Bonzini
      The PK bit of the error code is computed dynamically in permission_fault
      and therefore need not be passed to gva_to_gpa: only the access bits
      (fetch, user, write) need to be passed down.
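      A sketch of the fix at the call site (PFERR_* mask names as commonly used
      in KVM's MMU code; the caller is kvm_fixup_and_inject_pf_error per the
      trace below):

      	/* Keep only fetch/user/write; the PK bit is recomputed in
      	 * permission_fault() during the page table walk. */
      	u64 access = error_code & (PFERR_WRITE_MASK | PFERR_FETCH_MASK |
      				   PFERR_USER_MASK);

      	gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, &fault);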
      
      Not doing so causes a splat in the pku test:
      
         WARNING: CPU: 25 PID: 5465 at arch/x86/kvm/mmu.h:197 paging64_walk_addr_generic+0x594/0x750 [kvm]
         Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0014.D62.2001092233 01/09/2020
         RIP: 0010:paging64_walk_addr_generic+0x594/0x750 [kvm]
         Code: <0f> 0b e9 db fe ff ff 44 8b 43 04 4c 89 6c 24 30 8b 13 41 39 d0 89
         RSP: 0018:ff53778fc623fb60 EFLAGS: 00010202
         RAX: 0000000000000001 RBX: ff53778fc623fbf0 RCX: 0000000000000007
         RDX: 0000000000000001 RSI: 0000000000000002 RDI: ff4501efba818000
         RBP: 0000000000000020 R08: 0000000000000005 R09: 00000000004000e7
         R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
         R13: ff4501efba818388 R14: 10000000004000e7 R15: 0000000000000000
         FS:  00007f2dcf31a700(0000) GS:ff4501f1c8040000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 0000000000000000 CR3: 0000001dea475005 CR4: 0000000000763ee0
         DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         PKRU: 55555554
         Call Trace:
          paging64_gva_to_gpa+0x3f/0xb0 [kvm]
          kvm_fixup_and_inject_pf_error+0x48/0xa0 [kvm]
          handle_exception_nmi+0x4fc/0x5b0 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0x911/0x1c10 [kvm]
          kvm_vcpu_ioctl+0x23e/0x5d0 [kvm]
          ksys_ioctl+0x92/0xb0
          __x64_sys_ioctl+0x16/0x20
          do_syscall_64+0x3e/0xb0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
         ---[ end trace d17eb998aee991da ]---
      Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Fixes: 89786147 ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 16 August 2020, 1 commit
  10. 15 August 2020, 2 commits
    • all arch: remove system call sys_sysctl · 88db0aa2
      Committed by Xiaoming Ni
      Since commit 61a47c1a ("sysctl: Remove the sysctl system call"),
      sys_sysctl is actually unavailable: any input can only return an error.
      
      We have been warning people about using the sysctl system call for years
      and believe there are no more users.  Even if some users remain, anyone
      who has not complained or fixed their code by now is probably never going
      to, so there is no point in warning them any longer.
      
      So completely remove sys_sysctl on all architectures.
      
      [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
       Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: chenzefeng <chenzefeng2@huawei.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kars de Jong <jongk@linux-m68k.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Sven Schnelle <svens@stackframe.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/fsgsbase/64: Fix NULL deref in x86_fsgsbase_read_task · 8ab49526
      Committed by Eric Dumazet
      syzbot found its way into x86_fsgsbase_read_task() and triggered this oops:
      
         KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
         CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0
         RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393
         Call Trace:
           putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876
           genregs32_set arch/x86/kernel/ptrace.c:1026 [inline]
           genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006
           copy_regset_from_user include/linux/regset.h:326 [inline]
           ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline]
           compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198
           __do_compat_sys_ptrace kernel/ptrace.c:1420 [inline]
           __se_compat_sys_ptrace kernel/ptrace.c:1389 [inline]
           __ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389
           do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline]
           __do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126
           do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149
           entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
      
      This can happen if ptrace() or sigreturn() pokes an LDT selector into FS
      or GS for a task with no LDT and something tries to read the base before
      a return to usermode notices the bad selector and fixes it.
      
      The fix is to make sure the ldt pointer is not NULL before it is
      dereferenced.
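      A sketch of the guarded read (shape of the fix; field names follow
      arch/x86/kernel/process_64.c):

      	mutex_lock(&task->mm->context.lock);
      	ldt = task->mm->context.ldt;
      	if (unlikely(!ldt || idx >= ldt->nr_entries))
      		base = 0;	/* no LDT, or index out of range */
      	else
      		base = get_desc_base(ldt->entries + idx);
      	mutex_unlock(&task->mm->context.lock);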
      
      Fixes: 07e1d88a ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately")
      Co-developed-by: Jann Horn <jannh@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Markus T Metzger <markus.t.metzger@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 14 August 2020, 3 commits
  12. 13 August 2020, 6 commits
    • x86/alternatives: Acquire pte lock with interrupts enabled · a6d996cb
      Committed by Sebastian Andrzej Siewior
      pte lock is never acquired in-IRQ context so it does not require interrupts
      to be disabled. The lock is a regular spinlock which cannot be acquired
      with interrupts disabled on RT.
      
      RT complains about pte_lock() in __text_poke() because it's invoked after
      disabling interrupts.
      
      __text_poke() has to disable interrupts as use_temporary_mm() expects
      interrupts to be off because it invokes switch_mm_irqs_off() and uses
      per-CPU (current active mm) data.
      
      Move the PTE lock handling outside the interrupt disabled region.
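      A sketch of the reordering in __text_poke() (simplified, assuming the
      usual get_locked_pte()/pte_unmap_unlock() helpers):

      	/* Take the pte lock while interrupts are still enabled. */
      	ptep = get_locked_pte(poking_mm, poking_addr, &ptl);

      	local_irq_save(flags);		/* required by use_temporary_mm() */
      	prev = use_temporary_mm(poking_mm);
      	/* ... patch the text through the temporary mapping ... */
      	unuse_temporary_mm(prev);
      	local_irq_restore(flags);

      	pte_unmap_unlock(ptep, ptl);	/* released with interrupts on */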
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20200813105026.bvugytmsso6muljw@linutronix.de
    • mm/x86: use general page fault accounting · 968614fc
      Committed by Peter Xu
      Use the general page fault accounting by passing regs into
      handle_mm_fault().
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20200707225021.200906-23-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: do page fault accounting in handle_mm_fault · bce617ed
      Committed by Peter Xu
      Patch series "mm: Page fault accounting cleanups", v5.
      
      This is v5 of the pf accounting cleanup series.  It originates from Gerald
      Schaefer's report a week ago of incorrect page fault accounting for
      retried page faults after commit 4064b982 ("mm: allow VM_FAULT_RETRY
      for multiple times"):
      
        https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
      
      What this series did:
      
        - Correct page fault accounting: we do accounting for a page fault
          (no matter whether it's from #PF handling, or gup, or anything else)
          only with the one that completed the fault.  For example, page fault
          retries should not be counted in page fault counters.  Same to the
          perf events.
      
        - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
          event is used in an ad hoc way across different archs.

          Case (1): for many archs it's done at the entry of a page fault
          handler, so that it will also cover e.g.  erroneous faults.

          Case (2): for some other archs, it is only accounted when the page
          fault is resolved successfully.

          Case (3): there are still quite a few archs that have not enabled
          this perf event.

          Since this series touches nearly all the archs, we unify this
          perf event to always follow case (1), which is the one that makes most
          sense.  And since we moved the accounting into handle_mm_fault, the
          other two MAJ/MIN perf events are well taken care of naturally.
      
        - Unify definition of "major faults": the definition of "major
          fault" is slightly changed when used in accounting (not
          VM_FAULT_MAJOR).  More information in patch 1.
      
        - Always account the page fault onto the one that triggered the page
          fault.  This does not matter much for #PF handlings, but mostly for
          gup.  More information on this in patch 25.
      
      Patchset layout:
      
      Patch 1:     Introduced the accounting in handle_mm_fault(), not enabled.
      Patch 2-23:  Enable the new accounting for arch #PF handlers one by one.
      Patch 24:    Enable the new accounting for the rest outliers (gup, iommu, etc.)
      Patch 25:    Cleanup GUP task_struct pointer since it's not needed any more
      
      This patch (of 25):
      
      This is a preparation patch to move page fault accounting into the
      general code in handle_mm_fault().  This includes both the per task
      flt_maj/flt_min counters, and the major/minor page fault perf events.  To
      do this, the pt_regs pointer is passed into handle_mm_fault().
      
      PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
      handlers.
      
      So far, every pt_regs pointer passed into handle_mm_fault() is
      NULL, which means this patch should have no intended functional change.
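      The resulting interface (prototype as introduced by this patch; the
      accounting happens inside when regs is non-NULL):

      	vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
      				   unsigned long address, unsigned int flags,
      				   struct pt_regs *regs);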
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
      Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uaccess: remove segment_eq · 428e2976
      Committed by Christoph Hellwig
      segment_eq is only used to implement uaccess_kernel.  Just open code
      uaccess_kernel in the arch uaccess headers and remove one layer of
      indirection.
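      For example, the open-coded x86 definition becomes (a sketch of the
      pattern; each architecture supplies its own):

      	/* was: segment_eq(get_fs(), KERNEL_DS) */
      	#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)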
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Greentime Hu <green.hu@gmail.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid() · d622ecec
      Committed by Jia He
      This is to introduce a general dummy helper.  memory_add_physaddr_to_nid()
      is a fallback option to get the nid in case NUMA_NO_NODE is detected.
      
      After this patch, arm64/sh/s390 can simply use the general dummy version.
      PowerPC/x86/ia64 will still use their specific version.
      
      This is the preparation to set a fallback value for dev_dax->target_node.
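      A sketch of the generic dummy (a weak symbol, so the arch-specific
      versions above keep overriding it; placement in mm/memory_hotplug.c is an
      assumption here):

      	int __weak memory_add_physaddr_to_nid(u64 start)
      	{
      		/* Fallback when no real node mapping is known. */
      		pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
      			     start);
      		return 0;
      	}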
      Signed-off-by: Jia He <justin.he@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chuhong Yuan <hslester96@gmail.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
      Cc: Kaly Xin <Kaly.Xin@arm.com>
      Link: http://lkml.kernel.org/r/20200710031619.18762-2-justin.he@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/mm: use max memory block size on bare metal · fe124c95
      Committed by Daniel Jordan
      Some of our servers spend significant time at kernel boot initializing
      memory block sysfs directories and then creating symlinks between them and
      the corresponding nodes.  The slowness happens because the machines get
      stuck with the smallest supported memory block size on x86 (128M), which
      results in 16,288 directories to cover the 2T of installed RAM.  The
      search for each memory block is noticeable even with commit 4fb6eabf
      ("drivers/base/memory.c: cache memory blocks in xarray to accelerate
      lookup").
      
      Commit 078eb6aa ("x86/mm/memory_hotplug: determine block size based on
      the end of boot memory") chooses the block size based on alignment with
      memory end.  That addresses hotplug failures in qemu guests, but for bare
      metal systems whose memory end isn't aligned to even the smallest size, it
      leaves them at 128M.
      
      Make kernels that aren't running on a hypervisor use the largest supported
      size (2G) to minimize overhead on big machines.  Kernel boot goes 7%
      faster on the aforementioned servers, shaving off half a second.
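      A sketch of the bare-metal shortcut in probe_memory_block_size() (feature
      flag and constant names as in arch/x86/mm/init_64.c; treat as
      illustrative):

      	/* Bare metal: use the largest size to minimize sysfs overhead. */
      	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
      		bz = MAX_BLOCK_SIZE;	/* 2G */
      		goto done;
      	}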
      
      [daniel.m.jordan@oracle.com: v3]
        Link: http://lkml.kernel.org/r/20200714205450.945834-1-daniel.m.jordan@oracle.com
      Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20200609225451.3542648-1-daniel.m.jordan@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 11 August 2020, 5 commits
  14. 10 August 2020, 2 commits
    • x86: Expose SERIALIZE for supported cpuid · 43bd9ef4
      Committed by Paolo Bonzini
      The SERIALIZE instruction is supported by Intel processors such as
      Sapphire Rapids.  SERIALIZE is a faster serializing instruction which
      does not modify registers, arithmetic flags or memory, and does not cause
      a VM exit.  Its availability is indicated by CPUID.(EAX=7,ECX=0):EDX[bit 14].
      
      Expose it in KVM supported CPUID.  This way, KVM could pass this
      information to guests and they can make use of these features accordingly.
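      A sketch of how the bit is plumbed through (the cpufeatures word/bit
      follow the CPUID leaf above; the kvm_cpu_cap_mask() line shows the shape
      of the KVM change, not the full mask):

      	/* cpufeatures.h: word 18 is CPUID.(EAX=7,ECX=0):EDX */
      	#define X86_FEATURE_SERIALIZE	(18*32 + 14)

      	/* arch/x86/kvm/cpuid.c: advertise the bit to guests */
      	kvm_cpu_cap_mask(CPUID_7_EDX, /* ...existing bits... */ F(SERIALIZE));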
      Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Don't attempt to load PDPTRs when 64-bit mode is enabled · 05487215
      Committed by Sean Christopherson
      Don't attempt to load PDPTRs if EFER.LME=1, i.e. if 64-bit mode is
      enabled.  A recent change to reload the PDPTRs when CR0.CD or CR0.NW is
      toggled botched the EFER.LME handling and sends KVM down the PDPTR path
      when is_paging() is true, i.e. when the guest toggles CD/NW in 64-bit
      mode.
      
      Split the CR0 checks for 64-bit vs. 32-bit PAE into separate paths.  The
      64-bit path is specifically checking state when paging is toggled on,
      i.e. CR0.PG transitions from 0->1.  The PDPTR path now needs to run if
      the new CR0 state has paging enabled, irrespective of whether paging was
      already enabled.  Trying to shave a few cycles to make the PDPTR path an
      "else if" case is a mess.
      
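      A sketch of the split in kvm_set_cr0() (shape of the fix; helper names
      per arch/x86/kvm/x86.c):

      #ifdef CONFIG_X86_64
      	if ((vcpu->arch.efer & EFER_LME) && !is_paging(vcpu) &&
      	    (cr0 & X86_CR0_PG)) {
      		/* 64-bit: paging is being enabled; validate CS.L etc. */
      	}
      #endif
      	if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
      	    is_pae(vcpu) && ((cr0 ^ old_cr0) & pdptr_bits) &&
      	    !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)))
      		return 1;	/* 32-bit PAE: (re)load PDPTRs */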
      Fixes: d42e3fae ("kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20200714015732.32426-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. 08 August 2020, 4 commits
    • mm/sparse: cleanup the code surrounding memory_present() · c89ab04f
      Committed by Mike Rapoport
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
      functions that call memory_present() for each region in memblock.memory:
      sparse_memory_present_with_active_regions() and memblocks_present().

      Moreover, all architectures have a call to either of these functions
      preceding the call to sparse_init() and in most cases they are called
      one after the other.

      Mark the regions from memblock.memory as present during sparse_init() by
      making sparse_init() call memblocks_present(), make memblocks_present()
      and memory_present() functions static and remove the redundant
      sparse_memory_present_with_active_regions() function.
      
      Also remove the no longer required HAVE_MEMORY_PRESENT configuration option.
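      A sketch of the consolidated flow in mm/sparse.c (memblocks_present()
      becomes an internal detail of sparse_init()):

      	void __init sparse_init(void)
      	{
      		/* Mark all memblock.memory regions present up front. */
      		memblocks_present();
      		/* ... existing section initialization ... */
      	}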
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf() · 56993b4e
      Committed by Anshuman Khandual
      There are many instances where vmemmap allocation is switched between
      regular memory and device memory just based on whether altmap is available
      or not.  vmemmap_alloc_block_buf() is used on various platforms to
      allocate vmemmap mappings.  Let's also enable it to handle altmap based
      device memory allocation along with existing regular memory allocations.
      This will help in avoiding the altmap based allocation switch in many
      places.  To summarize, there are two different methods to call
      vmemmap_alloc_block_buf().
      
      vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
      vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */
      
      This converts altmap_alloc_block_buf() into a static function, drops its
      entry from the header and updates Documentation/vm/memory-model.rst.
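      A sketch of the resulting dispatch (matching the two call forms above;
      helper names per mm/sparse-vmemmap.c):

      	void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node,
      						 struct vmem_altmap *altmap)
      	{
      		void *ptr;

      		if (altmap)	/* device memory */
      			return altmap_alloc_block_buf(size, altmap);

      		ptr = sparse_buffer_alloc(size);	/* system RAM */
      		if (!ptr)
      			ptr = vmemmap_alloc_block(size, node);
      		return ptr;
      	}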
      Suggested-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Jia He <justin.he@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Will Deacon <will@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages() · 1d9cfee7
      Committed by Anshuman Khandual
      Patch series "arm64: Enable vmemmap mapping from device memory", v4.
      
      This series enables vmemmap backing memory allocation from device memory
      ranges on arm64.  But before that, it enables vmemmap_populate_basepages()
      and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
      allocation requests.
      
      This patch (of 3):
      
      vmemmap_populate_basepages() is used across platforms to allocate backing
      memory for vmemmap mapping.  This is used as a standard default choice or
      as a fallback when the intended huge page allocation fails.  This just
      creates the entire vmemmap mapping with base pages (PAGE_SIZE).
      
      On arm64 platforms, vmemmap_populate_basepages() is called instead of the
      platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
      is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.
      
      At present vmemmap_populate_basepages() does not support allocating from
      driver defined struct vmem_altmap while trying to create vmemmap mapping
      for a device memory range.  It prevents ARM64_16K_PAGES and
      ARM64_64K_PAGES configs on arm64 from supporting device memory with
      vmem_altmap requests.
      
      This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
      device memory allocation for vmemmap mapping on arm64 platforms with 16K or
      64K base page configs.
      
      Each architecture should evaluate and decide on subscribing device memory
      based base page allocation through vmemmap_populate_basepages().  Hence
      let's keep it disabled on all archs in order to preserve the existing
      semantics.  A subsequent patch enables it on arm64.
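      The resulting prototype (a sketch; the altmap argument is simply threaded
      down to the vmemmap_alloc_block_buf() form shown in the previous entry):

      	int __meminit vmemmap_populate_basepages(unsigned long start,
      						 unsigned long end, int node,
      						 struct vmem_altmap *altmap);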
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Jia He <justin.he@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com
      Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • asm-generic: pgalloc: provide generic pgd_free() · f9cb654c
      Committed by Mike Rapoport
      Most architectures define pgd_free() as a wrapper for free_page().
      
      Provide a generic version in asm-generic/pgalloc.h and enable its use for
      most architectures.
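      The generic version is essentially this (a sketch of the
      asm-generic/pgalloc.h addition, guarded so architectures can still
      override it):

      	#ifndef __HAVE_ARCH_PGD_FREE
      	static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
      	{
      		free_page((unsigned long)pgd);
      	}
      	#endif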
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>