1. 16 Apr 2019, 1 commit
    • perf/x86: Support outputting XMM registers · 878068ea
      Authored by Kan Liang
      Starting with Icelake, XMM registers can be collected in the PEBS record.
      But the current code only outputs pt_regs.
      
      Add a new struct x86_perf_regs for both pt_regs and xmm_regs. The
      xmm_regs field will be used later to keep a pointer to the PEBS record,
      which has the XMM information.
      
      XMM registers are 128 bits wide. To simplify the code, each is handled
      as two different registers, which means setting two bits in the register
      bitmap. This also allows sampling only the lower 64 bits of an XMM
      register.
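
      A minimal sketch of the resulting layout (struct and field names as
      given in this changelog; the PEBS plumbing is elided):

         struct x86_perf_regs {
                 struct pt_regs regs;
                 u64            *xmm_regs; /* XMM data in the PEBS record */
         };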
      
      The index of the XMM registers starts at 32, and there are 16 XMM
      registers, so all of the reserved space for regs is used. Remove
      REG_RESERVED.
      
      Add PERF_REG_X86_XMM_MAX, which stands for the max number of all x86
      regs including both GPRs and XMM.
      
      Add REG_NOSUPPORT for 32-bit to exclude unsupported registers.
      
      Previous platforms cannot collect XMM information in the PEBS record.
      Add pebs_no_xmm_regs to indicate these unsupported platforms.
      
      The common code still validates the supported registers. However, it
      cannot check model-specific registers such as XMM. Add an extra check in
      x86_pmu_hw_config() to reject invalid configurations of regs_user and
      regs_intr. regs_user never supports XMM collection; regs_intr supports
      XMM collection only when sampling a PEBS event on Icelake and later
      platforms.
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-3-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 06 Apr 2019, 1 commit
    • x86/asm: Use stricter assembly constraints in bitops · 5b77e95d
      Authored by Alexander Potapenko
      There are a number of problems with how arch/x86/include/asm/bitops.h
      currently uses assembly constraints for the memory regions that the
      bitops are modifying:
      
      1) Use memory clobber in bitops that touch arbitrary memory
      
      Certain bit operations that read/write bits take a base pointer and an
      arbitrarily large offset to address the bit relative to that base.
      Inline assembly constraints aren't expressive enough to tell the
      compiler that the assembly directive is going to touch a specific memory
      location of unknown size, therefore we have to use the "memory" clobber
      to indicate that the assembly is going to access memory locations other
      than those listed in the inputs/outputs.
      
      To indicate that BTR/BTS instructions don't necessarily touch the first
      sizeof(long) bytes of the argument, we also move the address to assembly
      inputs.
      
      This particular change leads to a size increase of 124 kernel functions
      in a defconfig build. For some of them the diff is in NOP operations;
      others end up re-reading values from memory and may potentially slow
      down execution. But without these clobbers the compiler is free to cache
      the contents of the bitmaps and use them as if they weren't changed by
      the inline assembly.
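
      As a sketch of what point 1 means in practice (illustrative, not the
      kernel's exact macros), the bit base becomes a plain input and a
      "memory" clobber tells the compiler that arbitrary memory may change:

         static inline void sketch_set_bit(long nr, volatile unsigned long *addr)
         {
                 /* BTS may touch memory far beyond *addr, so the address is
                  * only an input and "memory" is clobbered. */
                 asm volatile("btsq %1,%0"
                              : : "m" (*addr), "Ir" (nr) : "memory");
         }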
      
      2) Use byte-sized arguments for operations touching single bytes.
      
      Passing a long value to the ANDB/ORB/XORB instructions makes the
      compiler treat sizeof(long) bytes as being clobbered, which isn't the
      case. This may theoretically lead to worse code under heavy
      optimization.
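
      A matching sketch for point 2 (again illustrative; the kernel derives
      the mask via its CONST_MASK macro): the operand is byte-sized, so the
      compiler treats only one byte as clobbered:

         static inline void sketch_set_bit_const(long nr, volatile void *addr)
         {
                 asm volatile("orb %b1,%0"
                              : "+m" (((volatile char *)addr)[nr >> 3])
                              : "iq" ((char)(1 << (nr & 7))));
         }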
      
      Practical impact:
      
      I've built a defconfig kernel and looked through some of the functions
      generated by GCC 7.3.0 with and without this clobber, and didn't spot
      any miscompilations.
      
      However there is a (trivial) theoretical case where this code leads to
      miscompilation:
      
        https://lkml.org/lkml/2019/3/28/393
      
      using just GCC 8.3.0 with -O2.  It isn't hard to imagine someone
      writing such a function in the kernel someday.
      
      So the primary motivation is to fix an existing misuse of the asm
      directive, which happens to work in certain configurations now, but
      isn't guaranteed to work under different circumstances.
      
      [ mingo: Added -stable tag because defconfig only builds a fraction
        of the kernel and the trivial testcase looks normal enough to
        be used in existing or in-development code. ]
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Y Knight <jyknight@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
      [ Edited the changelog, tidied up one of the defines. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 05 Apr 2019, 3 commits
    • syscalls: Remove start and number from syscall_set_arguments() args · 32d92586
      Authored by Steven Rostedt (VMware)
      After removing the start and count arguments of syscall_get_arguments(),
      it seems reasonable to remove them from syscall_set_arguments() as well.
      Note that, as of today, there are no users of syscall_set_arguments(),
      but we are told that there will be soon. For now, at least make it
      consistent with syscall_get_arguments().
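
      A sketch of the interface change (prototypes per this changelog;
      per-architecture implementations differ):

         /* before */
         void syscall_set_arguments(struct task_struct *task,
                                    struct pt_regs *regs,
                                    unsigned int i, unsigned int n,
                                    const unsigned long *args);
         /* after: always sets the first six arguments */
         void syscall_set_arguments(struct task_struct *task,
                                    struct pt_regs *regs,
                                    const unsigned long *args);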
      
      Link: http://lkml.kernel.org/r/20190327222014.GA32540@altlinux.org
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: x86@kernel.org
      Cc: linux-snps-arc@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: openrisc@lists.librecores.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux-arch@vger.kernel.org
      Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
      Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
      Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • syscalls: Remove start and number from syscall_get_arguments() args · b35f549d
      Authored by Steven Rostedt (Red Hat)
      At Linux Plumbers, Andy Lutomirski approached me and pointed out that
      the x86 implementation of syscall_get_arguments() was horribly written
      and not optimized for the standard case of passing in 0 and 6 for the
      starting index and the number of arguments to fetch. Looking at all the
      users of this function, I discovered that every instance passes in only
      0 and 6 for these arguments. Instead of having this function handle
      different cases that are never used, simply rewrite it to return the
      first 6 arguments of a system call.
      
      This should help out the performance of tracing system calls by ptrace,
      ftrace and perf.
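
      A sketch of the simplified 64-bit implementation this describes
      (illustrative; the real code also handles the 32-bit ABI):

         static inline void syscall_get_arguments(struct task_struct *task,
                                                  struct pt_regs *regs,
                                                  unsigned long *args)
         {
                 /* x86-64 syscall argument registers, in ABI order */
                 *args++ = regs->di;
                 *args++ = regs->si;
                 *args++ = regs->dx;
                 *args++ = regs->r10;
                 *args++ = regs->r8;
                 *args   = regs->r9;
         }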
      
      Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: x86@kernel.org
      Cc: linux-snps-arc@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: openrisc@lists.librecores.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux-arch@vger.kernel.org
      Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
      Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
      Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
      Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • xen: Prevent buffer overflow in privcmd ioctl · 42d8644b
      Authored by Dan Carpenter
      The "call" variable comes from the user in privcmd_ioctl_hypercall().
      It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
      elements.  We need to put an upper bound on it to prevent an out of
      bounds access.
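
      A sketch of the added check (names from this changelog; the exact
      expression in privcmd_ioctl_hypercall() may differ):

         /* hypercall_page[] has (PAGE_SIZE / 32) entries */
         if (hypercall.op >= (PAGE_SIZE / 32))
                 return -EINVAL;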
      
      Cc: stable@vger.kernel.org
      Fixes: 1246ae0b ("xen: add variable hypercall caller")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
  4. 29 Mar 2019, 7 commits
    • x86/realmode: Make set_real_mode_mem() static inline · f560bd19
      Authored by Matteo Croce
      Remove the unused @size argument from set_real_mode_mem() and move the
      function into a header file, so it can be inlined.
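
      A sketch of what the resulting inline could look like (assuming the
      header keeps the existing real_mode_header variable):

         static inline void set_real_mode_mem(phys_addr_t mem)
         {
                 real_mode_header = (struct real_mode_header *) __va(mem);
         }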
      
       [ bp: Massage. ]
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-efi <linux-efi@vger.kernel.org>
      Cc: platform-driver-x86@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190328114233.27835-1-mcroce@redhat.com
    • KVM: x86: update %rip after emulating IO · 45def77e
      Authored by Sean Christopherson
      Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
      OUT 92h or CF9h.  Userspace may emulate said mechanism, i.e. reset a
      vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
      that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
      
      To avoid corrupting %rip after such a reset, commit 0967b7bf ("KVM:
      Skip pio instruction when it is emulated, not executed") changed the
      behavior of PIO handlers, i.e. today's "fast" PIO handling, to skip the
      instruction prior to exiting to userspace.  Full emulation doesn't need
      such tricks because re-emulating the instruction will naturally handle
      %rip being changed to point at the reset vector.
      
      Updating %rip prior to exiting to userspace has several drawbacks:
      
        - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
          fails it will likely yell about the wrong address.
        - Single-step exits to userspace are effectively dropped as
          KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
        - Behavior of PIO emulation is different depending on whether it
          goes down the fast path or the slow path.
      
      Rather than skip the PIO instruction before exiting to userspace,
      snapshot the linear %rip and cancel PIO completion if the current
      value does not match the snapshot.  For a 64-bit vCPU, i.e. the most
      common scenario, the snapshot and comparison has negligible overhead
      as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
      VMREAD in this case.
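
      A sketch of the snapshot-and-compare (field and helper names are
      illustrative, not necessarily the exact KVM ones):

         /* when setting up the fast-PIO exit to userspace: */
         vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);

         /* in the completion callback, after userspace resumes the vCPU: */
         if (!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip))
                 return 1; /* %rip moved (e.g. reset): don't skip the insn */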
      
      All other alternatives to snapshotting the linear %rip that don't
      rely on an explicit reset announcement suffer from one corner case
      or another.  For example, canceling PIO completion on any write to
      %rip fails if userspace does a save/restore of %rip, and attempting to
      avoid that issue by canceling PIO only if %rip changed then fails if PIO
      collides with the reset %rip.  Attempting to zero in on the exact reset
      vector won't work for APs, which means adding more hooks such as the
      vCPU's MP_STATE, and so on and so forth.
      
      Checking for a linear %rip match technically suffers from corner cases,
      e.g. userspace could theoretically rewrite the underlying code page and
      expect a different instruction to execute, or the guest hardcodes a PIO
      reset at 0xfffffff0, but those are far, far outside of what can be
      considered normal operation.
      
      Fixes: 432baf60 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 0cf9135b
      Authored by Sean Christopherson
      The CPUID flag ARCH_CAPABILITIES is unconditionally exposed to host
      userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
      regardless of hardware support under the pretense that KVM fully
      emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
      handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
      also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
      
      Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
      that it's emulated on AMD hosts.
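
      A sketch of the common-code MSR read this implies (illustrative,
      modeled on the kvm_get_msr_common() style):

         case MSR_IA32_ARCH_CAPABILITIES:
                 if (!msr_info->host_initiated &&
                     !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
                         return 1;
                 msr_info->data = vcpu->arch.arch_capabilities;
                 break;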
      
      Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
      Cc: stable@vger.kernel.org
      Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() · 4d66623c
      Authored by Wei Yang
      * nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is
        non-zero.
      
      * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages()
        never returns zero.

      For these two reasons, we can merge the two *if* clauses and use the
      return value from kvm_mmu_calculate_mmu_pages() directly. This
      simplifies the code and also eliminates the possibility for a reader to
      believe nr_mmu_pages could be zero.
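
      A sketch of the merged result (function names from this changelog):

         if (!kvm->arch.n_requested_mmu_pages)
                 kvm_mmu_change_mmu_pages(kvm,
                                          kvm_mmu_calculate_mmu_pages(kvm));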
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) · 05d5a486
      Authored by Brijesh Singh
      Erratum #1096:

      On a nested data page fault, when CR.SMAP=1 and the guest data read
      generates a SMAP violation, the GuestInstrBytes field of the VMCB on a
      VMEXIT will incorrectly return 0h instead of the correct guest
      instruction bytes.
      
      Recommended workaround:

      To determine what instruction the guest was executing, the hypervisor
      will have to decode the instruction at the instruction pointer.
      
      The recommended workaround cannot be implemented for SEV guests because
      guest memory is encrypted with the guest-specific key, and the
      instruction decoder will not be able to decode the instruction bytes. If
      we hit this erratum in an SEV guest, log a message and request a guest
      shutdown.
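
      A sketch of the SEV handling this describes (illustrative; the helper
      names are assumptions):

         /* insn_len == 0 on a nested page fault => likely erratum 1096 */
         if (unlikely(!insn_len) && sev_guest(vcpu->kvm)) {
                 pr_err_ratelimited("SEV guest hit erratum 1096\n");
                 kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
                 return;
         }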
      Reported-by: Venkatesh Srinivas <venkateshs@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' · 47c42e6b
      Authored by Sean Christopherson
      The cr4_pae flag is a bit of a misnomer; its purpose is really to track
      whether the guest PTE that is being shadowed is a 4-byte entry or an
      8-byte entry.  Prior to supporting nested EPT, the size of the gpte was
      reflected purely by CR4.PAE.  KVM fudged things a bit for direct sptes,
      but it was mostly harmless since the size of the gpte never mattered.
      Now that a spte may be tracking an indirect EPT entry, relying on
      CR4.PAE is wrong and ill-named.
      
      For direct shadow pages, force the gpte_size to '1' as they are always
      8-byte entries; EPT entries can only be 8 bytes, and KVM always uses
      8-byte entries for NPT and its identity map (when running with EPT but
      not unrestricted guest).
      
      Likewise, nested EPT entries are always 8 bytes.  Nested EPT presents a
      unique scenario as the size of the entries are not dictated by CR4.PAE,
      but neither is the shadow page a direct map.  To handle this scenario,
      set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
      to denote a nested EPT shadow page.  Use the information to avoid
      incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
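
      A sketch of the role bits this sets for a nested EPT shadow page (field
      names per the changelog; the real code may spell them differently):

         role.base.gpte_size = 1;       /* EPT entries are always 8 bytes */
         role.base.cr0_wp = 1;          /* otherwise-impossible combination */
         role.base.smap_andnot_wp = 1;  /* ... marks a nested EPT page */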
      
      Providing a consistent and accurate gpte_size fixes a bug reported by
      Vitaly where fast_cr3_switch() always fails when switching from L2 to
      L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
      whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/cpufeature: Fix __percpu annotation in this_cpu_has() · f6027c81
      Authored by Jann Horn
      &cpu_info.x86_capability is __percpu, and the second argument of
      x86_this_cpu_test_bit() is expected to be __percpu. Don't cast the
      __percpu away and then implicitly add it again. This gets rid of 106
      lines of sparse warnings with the kernel config I'm using.
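
      The shape of the fix (illustrative):

         /* before: the cast strips the __percpu address space, which the
          * macro then implicitly adds back -- sparse warns on every use */
         x86_this_cpu_test_bit(bit, (unsigned long *)&cpu_info.x86_capability);
         /* after: pass the per-CPU bitmap directly */
         x86_this_cpu_test_bit(bit, cpu_info.x86_capability);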
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190328154948.152273-1-jannh@google.com
  5. 21 Mar 2019, 1 commit
  6. 07 Mar 2019, 2 commits
    • Revert "x86_64: Increase stack size for KASAN_EXTRA" · a2863b53
      Authored by Qian Cai
      This reverts commit a8e911d1.
      KASAN_EXTRA was removed in commit 7771bdbb ("kasan: remove use
      after scope bugs detection."), so this is no longer needed.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: bp@alien8.de
      Cc: akpm@linux-foundation.org
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Cc: dvyukov@google.com
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20190306213806.46139-1-cai@lca.pw
    • x86/unwind: Handle NULL pointer calls better in frame unwinder · f4f34e1b
      Authored by Jann Horn
      When the frame unwinder is invoked for an oops caused by a call to NULL, it
      currently skips the parent function because BP still points to the parent's
      stack frame; the (nonexistent) current function only has the first half of
      a stack frame, and BP doesn't point to it yet.
      
      Add a special case for IP==0 that calculates a fake BP from SP, then uses
      the real BP for the next frame.
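
      A sketch of the special case (illustrative; the real check lives in the
      unwinder's state-update logic):

         if (regs->ip == 0) {
                 /* The faulting call pushed only a return address at SP.
                  * Fake a BP so that bp[1] -- the return-address slot --
                  * lands on SP; the caller's real BP is still intact and
                  * will be used for the next frame. */
                 state->bp = (unsigned long *)regs->sp - 1;
         }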
      
      Note that this handles first_frame specially: Return information about the
      parent function as long as the saved IP is >=first_frame, even if the fake
      BP points below it.
      
      With an artificially-added NULL call in prctl_set_seccomp(), before this
      patch, the trace is:
      
      Call Trace:
       ? prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      After this patch, the trace is:
      
      Call Trace:
       prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: linux-kbuild@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.com
  7. 06 Mar 2019, 6 commits
    • x86: Add TSX Force Abort CPUID/MSR · 52f64909
      Authored by Peter Zijlstra (Intel)
      Skylake systems will receive a microcode update to address a TSX
      erratum. This microcode will (by default) clobber PMC3 when TSX
      instructions are (speculatively or not) executed.

      It also provides an MSR to cause all TSX transactions to abort and
      preserve PMC3.
      
      Add the CPUID enumeration and MSR definition.
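
      A sketch of the definitions (exact names and values live in
      cpufeatures.h and msr-index.h):

         #define X86_FEATURE_TSX_FORCE_ABORT (18*32+13) /* CPUID.7.0:EDX[13] */
         #define MSR_TSX_FORCE_ABORT         0x0000010F
         #define MSR_TFA_RTM_FORCE_ABORT     BIT_ULL(0)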
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • docs/core-api/mm: fix user memory accessors formatting · bc8ff3ca
      Authored by Mike Rapoport
      The descriptions of userspace memory access functions had minor issues
      with formatting that made kernel-doc unable to properly detect the
      function/macro names and the return value sections:
      
      ./arch/x86/include/asm/uaccess.h:80: info: Scanning doc for
      ./arch/x86/include/asm/uaccess.h:139: info: Scanning doc for
      ./arch/x86/include/asm/uaccess.h:231: info: Scanning doc for
      ./arch/x86/include/asm/uaccess.h:505: info: Scanning doc for
      ./arch/x86/include/asm/uaccess.h:530: info: Scanning doc for
      ./arch/x86/lib/usercopy_32.c:58: info: Scanning doc for
      ./arch/x86/lib/usercopy_32.c:69: warning: No description found for return
      value of 'clear_user'
      ./arch/x86/lib/usercopy_32.c:78: info: Scanning doc for
      ./arch/x86/lib/usercopy_32.c:90: warning: No description found for return
      value of '__clear_user'
      
      Fix the formatting.
      
      Link: http://lkml.kernel.org/r/1549549644-4903-3-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: update ptep_modify_prot_commit to take old pte value as arg · 04a86453
      Authored by Aneesh Kumar K.V
      Architectures like ppc64 need to do a conditional TLB flush based on
      the old and new values of the pte.  Enable that by passing the old pte
      value as an argument.
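
      A sketch of the new signature (per this changelog):

         void ptep_modify_prot_commit(struct vm_area_struct *vma,
                                      unsigned long addr, pte_t *ptep,
                                      pte_t old_pte, pte_t pte);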
      
      Link: http://lkml.kernel.org/r/20190116085035.29729-3-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: update ptep_modify_prot_start/commit to take vm_area_struct as arg · 0cbe3e26
      Authored by Aneesh Kumar K.V
      Patch series "NestMMU pte upgrade workaround for mprotect", v5.
      
      We can upgrade pte access (R -> RW transition) via mprotect.  We need to
      make sure we follow the recommended pte update sequence as outlined in
      commit bd5050e3 ("powerpc/mm/radix: Change pte relax sequence to
      handle nest MMU hang") for such updates.  This patch series does that.
      
      This patch (of 5):
      
      Some architectures may want to call flush_tlb_range from these helpers.
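
      A sketch of the corresponding signature change (per this changelog, the
      helpers take a vm_area_struct instead of an mm_struct):

         pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
                                      unsigned long addr, pte_t *ptep);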
      
      Link: http://lkml.kernel.org/r/20190116085035.29729-2-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace all open encodings for NUMA_NO_NODE · 98fa15f3
      Authored by Anshuman Khandual
      Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
      
      All these places for replacement were found by running the following
      grep patterns on the entire kernel code.  Please let me know if this
      might have missed some instances.  This might also have replaced some
      false positives.  I will appreciate suggestions, inputs and review.
      
      1. git grep "nid == -1"
      2. git grep "node == -1"
      3. git grep "nid = -1"
      4. git grep "node = -1"
      
      This patch (of 2):
      
      At present there are multiple places where an invalid node number is
      encoded as -1.  Even though implicitly understood, it is always better
      to have macros for it.  Replace these open encodings of an invalid node
      number with the global macro NUMA_NO_NODE.  This helps remove
      NUMA-related assumptions like 'invalid node' from various places,
      redirecting them to a common definition.
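
      The replacement pattern, in short:

         /* before */
         if (nid == -1)
                 nid = first_online_node;
         /* after */
         if (nid == NUMA_NO_NODE)
                 nid = first_online_node;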
      
      Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
      Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
      Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • a.out: remove core dumping support · 08300f44
      Authored by Linus Torvalds
      We're (finally) phasing out a.out support for good.  As Borislav Petkov
      points out, we've supported ELF binaries for about 25 years by now, and
      coredumping in particular has bitrotted over the years.
      
      None of the tool chains even support generating a.out binaries any more,
      and the plan is to deprecate a.out support entirely for the kernel.  But
      I want to start with just removing the core dumping code, because I can
      still imagine that somebody actually might want to support a.out as a
      simpler binary format.
      
      Particularly if you generate some random binaries on the fly, ELF is a
      much more complicated format (admittedly ELF also does have a lot of
      toolchain support, mitigating that complexity a lot and you really
      should have moved over in the last 25 years).
      
      So it's at least somewhat possible that somebody out there has some
      workflow that still involves generating and running a.out executables.
      
      In contrast, it's very unlikely that anybody depends on debugging any
      legacy a.out core files.  But regardless, I want this phase-out to be
      done in two steps, so that we can resurrect a.out support (if needed)
      without having to resurrect the core file dumping that is almost
      certainly not needed.
      
      Jann Horn pointed to the <asm/a.out-core.h> file that my first trivial
      cut at this had missed.
      
      And Alan Cox points out that the a.out binary loader _could_ be done in
      user space if somebody wants to, but we might keep just the loader in
      the kernel if somebody really wants it, since the loader isn't that big
      and has no really odd special cases like the core dumping does.
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jann Horn <jannh@google.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 05 Mar 2019, 2 commits
    • xen: remove pre-xen3 fallback handlers · b1ddd406
      Authored by Arnd Bergmann
      The legacy hypercall handlers were originally added with a comment
      explaining that "copying the argument structures in
      HYPERVISOR_event_channel_op() and HYPERVISOR_physdev_op() into the local
      variable is sufficiently safe" and only made sure not to write past the
      end of the argument structure. The checks in linux/string.h disagree
      with that when link-time optimizations are used:
      
      In function 'memcpy',
          inlined from 'pirq_query_unmask' at drivers/xen/fallback.c:53:2,
          inlined from '__startup_pirq' at drivers/xen/events/events_base.c:529:2,
          inlined from 'restore_pirqs' at drivers/xen/events/events_base.c:1439:3,
          inlined from 'xen_irq_resume' at drivers/xen/events/events_base.c:1581:2:
      include/linux/string.h:350:3: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
         __read_overflow2();
         ^
      
      Further research showed that only Xen 3.0.2 or earlier required the
      fallback at all, while all versions in use today don't need it.
      As far as I can tell, it is not even possible to run a mainline kernel
      on those old Xen releases; at the time when they were in use, only
      a patched kernel was supported anyway.
      
      Fixes: cf47a83f ("xen/hypercall: fix hypercall fallback code for very old hypervisors")
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • get rid of legacy 'get_ds()' function · 736706be
      Authored by Linus Torvalds
      Every in-kernel use of this function defined it to KERNEL_DS (either as
      an actual define, or as an inline function).  It's an entirely
      historical artifact, and long long long ago used to actually read the
      segment selector value of '%ds' on x86.
      
      Which in the kernel is always KERNEL_DS.
      
      Inspired by a patch from Jann Horn that just did this for a very small
      subset of users (the ones in fs/), along with Al who suggested a script.
      I then just took it to the logical extreme and removed all the remaining
      gunk.
      
      Roughly scripted with
      
         git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
         git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'
      
      plus manual fixups to remove a few unusual usage patterns, the couple of
      inline function cases and to fix up a comment that had become stale.
      
      The 'get_ds()' function remains in an x86 kvm selftest, since in user
      space it actually does something relevant.
      Inspired-by: Jann Horn <jannh@google.com>
      Inspired-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 28 Feb 2019, 1 commit
  10. 26 Feb 2019, 2 commits
  11. 24 Feb 2019, 1 commit
  12. 23 Feb 2019, 2 commits
    • KVM: MMU: record maximum physical address width in kvm_mmu_extended_role · de3ccd26
      Authored by Yu Zhang
      Previously, commit 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow
      MMU reconfiguration is needed") offered an optimization to avoid
      unnecessary reconfiguration. Yet one scenario is broken: when CPUID
      changes the VM's maximum physical address width, reconfiguration is
      needed to reset the reserved bits.  Also, the TDP may need to reset its
      shadow_root_level when this value is changed.
      
      To fix this, a new field, maxphyaddr, is introduced in the extended
      role structure to keep track of the configured guest physical address
      width.
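
      A sketch of the addition (illustrative layout; the other role bits are
      elided):

         union kvm_mmu_extended_role {
                 u32 word;
                 struct {
                         unsigned int valid:1;
                         /* ... existing cr0/cr4/efer tracking bits ... */
                         unsigned int maxphyaddr:6; /* guest MAXPHYADDR */
                 };
         };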
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm/mmu: fix switch between root and guest MMUs · ad7dc69a
      Authored by Vitaly Kuznetsov
      Commit 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu") brought one subtle
      change: previously, when switching back from L2 to L1, we reset the MMU
      hooks (like mmu->get_cr3()) in kvm_init_mmu(), called from
      nested_vmx_load_cr3(); now we do that in nested_ept_uninit_mmu_context()
      when we re-target the vcpu->arch.mmu pointer.
      The change itself looks logical: if nested_ept_init_mmu_context()
      changes something, then nested_ept_uninit_mmu_context() restores it.
      There is, however, one thing: the following call chain:
      
       nested_vmx_load_cr3()
        kvm_mmu_new_cr3()
          __kvm_mmu_new_cr3()
            fast_cr3_switch()
              cached_root_available()
      
      now happens with the MMU hooks pointing to the new MMU (the root MMU in
      our case), while previously it happened with the old one.
      cached_root_available() tries to stash the current root, but it is
      incorrect to read the current CR3 with mmu->get_cr3(); we need to use
      old_mmu->get_cr3(), which, in the case we're switching from L2 to L1, is
      guest_mmu. (BTW, in the shadow page table case this is a non-issue
      because we don't switch MMUs.)
      
      While we could've tried to guess that we're switching between MMUs and
      call the right ->get_cr3() from cached_root_available(), this seems
      overly complicated. Instead, just stash the corresponding CR3 when
      setting root_hpa and make cached_root_available() use the stashed value.
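
      A sketch of the stash-on-set approach (the field name is an assumption
      based on this changelog):

         /* when installing a new root_hpa, remember its CR3 ... */
         vcpu->arch.mmu->root_cr3 = new_cr3;

         /* ... so cached_root_available() can compare mmu->root_cr3 against
          * the target CR3 instead of calling ->get_cr3() on a possibly
          * re-targeted MMU. */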
      
      Fixes: 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. 21 Feb 2019, 6 commits
  14. 15 Feb 2019, 4 commits
  15. 14 Feb 2019, 1 commit