1. 14 12月, 2017 4 次提交
    • P
      KVM: x86: add support for UMIP · ae3e61e1
      Paolo Bonzini 提交于
      Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and
      add UMIP support for instructions that are already emulated by KVM.
      Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ae3e61e1
    • P
      kvm: x86: fix WARN due to uninitialized guest FPU state · 5663d8f9
      Peter Xu 提交于
      ------------[ cut here ]------------
       Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
       WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
       CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G    B      OE    4.15.0-rc2+ #10
       RIP: 0010:ex_handler_fprestore+0x88/0x90
       Call Trace:
        fixup_exception+0x4e/0x60
        do_general_protection+0xff/0x270
        general_protection+0x22/0x30
       RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
       RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
        kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
        kvm_apic_accept_events+0x1c0/0x240 [kvm]
        kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
        kvm_vcpu_ioctl+0x479/0x880 [kvm]
        do_vfs_ioctl+0x142/0x9a0
        SyS_ioctl+0x74/0x80
        do_syscall_64+0x15f/0x600
      
      where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
      To fix it, move kvm_load_guest_fpu to the very beginning of
      kvm_arch_vcpu_ioctl_run.
      
      Cc: stable@vger.kernel.org
      Fixes: f775b13eSigned-off-by: NPeter Xu <peterx@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5663d8f9
    • W
      KVM: X86: Fix load RFLAGS w/o the fixed bit · d73235d1
      Wanpeng Li 提交于
       *** Guest State ***
       CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
       CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
       CR3 = 0x00000000fffbc000
       RSP = 0x0000000000000000  RIP = 0x0000000000000000
       RFLAGS=0x00000000         DR7 = 0x0000000000000400
              ^^^^^^^^^^
      
      The failed vmentry is triggered by the following testcase when ept=Y:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[5];
          int main()
          {
          	r[2] = open("/dev/kvm", O_RDONLY);
          	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
          	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
          	struct kvm_regs regs = {
          		.rflags = 0,
          	};
          	ioctl(r[4], KVM_SET_REGS, &regs);
          	ioctl(r[4], KVM_RUN, 0);
          }
      
      X86 RFLAGS bit 1 is fixed set, userspace can simply clearing bit 1
      of RFLAGS with KVM_SET_REGS ioctl which results in vmentry fails.
      This patch fixes it by oring X86_EFLAGS_FIXED during ioctl.
      
      Cc: stable@vger.kernel.org
      Suggested-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NQuan Xu <quan.xu0@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d73235d1
    • W
      KVM: MMU: Fix infinite loop when there is no available mmu page · ed52870f
      Wanpeng Li 提交于
      The below test case can cause infinite loop in kvm when ept=0.
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[5];
          int main()
          {
          	r[2] = open("/dev/kvm", O_RDONLY);
          	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
          	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
          	ioctl(r[4], KVM_RUN, 0);
          }
      
      It doesn't setup the memory regions, mmu_alloc_shadow/direct_roots() in
      kvm return 1 when kvm fails to allocate root page table which can result
      in beblow infinite loop:
      
          vcpu_run() {
          	for (;;) {
      	    	r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
      	    	if (r <= 0)
      	    		break;
      	    	if (need_resched())
      	    		cond_resched();
            }
          }
      
      This patch fixes it by returning -ENOSPC when there is no available kvm mmu
      page for root page table.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 26eeb53c (KVM: MMU: Bail out immediately if there is no available mmu page)
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ed52870f
  2. 09 12月, 2017 1 次提交
  3. 07 12月, 2017 2 次提交
    • A
      x86/vdso: Change time() prototype to match __vdso_time() · 88edb57d
      Arnd Bergmann 提交于
      gcc-8 warns that time() is an alias for __vdso_time() but the two
      have different prototypes:
      
        arch/x86/entry/vdso/vclock_gettime.c:327:5: error: 'time' alias between functions of incompatible types 'int(time_t *)' {aka 'int(long int *)'} and 'time_t(time_t *)' {aka 'long int(long int *)'} [-Werror=attribute-alias]
         int time(time_t *t)
             ^~~~
        arch/x86/entry/vdso/vclock_gettime.c:318:16: note: aliased declaration here
      
      I could not figure out whether this is intentional, but I see that
      changing it to return time_t avoids the warning.
      
      Returning 'int' from time() is also a bit questionable, as it causes an
      overflow in y2038 even on 64-bit architectures that use a 64-bit time_t
      type. On 32-bit architecture with 64-bit time_t, time() should always
      be implement by the C library by calling a (to be added) clock_gettime()
      variant that takes a sufficiently wide argument.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Link: http://lkml.kernel.org/r/20171204150203.852959-1-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      88edb57d
    • C
      x86: Fix Sparse warnings about non-static functions · d553d03f
      Colin Ian King 提交于
      Functions x86_vector_debug_show(), uv_handle_nmi() and uv_nmi_setup_common()
      are local to the source and do not need to be in global scope, so make them
      static.
      
      Fixes up various sparse warnings.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Acked-by: NMike Travis <mike.travis@hpe.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <trivial@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Cc: travis@sgi.com
      Link: http://lkml.kernel.org/r/20171206173358.24388-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d553d03f
  4. 06 12月, 2017 11 次提交
  5. 05 12月, 2017 1 次提交
    • H
      bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · c895f6f7
      Hendrik Brueckner 提交于
      Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
      program type") introduced the bpf_perf_event_data structure which
      exports the pt_regs structure.  This is OK for multiple architectures
      but fail for s390 and arm64 which do not export pt_regs.  Programs
      using them, for example, the bpf selftest fail to compile on these
      architectures.
      
      For s390, exporting the pt_regs is not an option because s390 wants
      to allow changes to it.  For arm64, there is a user_pt_regs structure
      that covers parts of the pt_regs structure for use by user space.
      
      To solve the broken uapi for s390 and arm64, introduce an abstract
      type for pt_regs and add an asm/bpf_perf_event.h file that concretes
      the type.  An asm-generic header file covers the architectures that
      export pt_regs today.
      
      The arch-specific enablement for s390 and arm64 follows in separate
      commits.
      Reported-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
      Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c895f6f7
  6. 30 11月, 2017 2 次提交
  7. 28 11月, 2017 8 次提交
    • C
      x86/idt: Load idt early in start_secondary · 55d2d0ad
      Chunyu Hu 提交于
      On a secondary, idt is first loaded in cpu_init() with load_current_idt(),
      i.e. no exceptions can be handled before that point.
      
      The conversion of WARN() to use UD requires the IDT being loaded earlier as
      any warning between start_secondary() and load_curren_idt() in cpu_init()
      will result in an unhandled @UD exception and therefore fail the bringup of
      the CPU.
      
      Install the IDT handlers right in start_secondary() before calling cpu_init().
      
      [ tglx: Massaged changelog ]
      
      Fixes: 9a93848f ("x86/debug: Implement __WARN() using UD0")
      Signed-off-by: NChunyu Hu <chuhu@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: bp@alien8.de
      Cc: rostedt@goodmis.org
      Cc: luto@kernel.org
      Link: https://lkml.kernel.org/r/1511792499-4073-1-git-send-email-chuhu@redhat.com
      55d2d0ad
    • J
      x86/xen: Support early interrupts in xen pv guests · 42b3a4cb
      Juergen Gross 提交于
      Add early interrupt handlers activated by idt_setup_early_handler() to
      the handlers supported by Xen pv guests. This will allow for early
      WARN() calls not crashing the guest.
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Cc: boris.ostrovsky@oracle.com
      Link: https://lkml.kernel.org/r/20171124084221.30172-1-jgross@suse.com
      42b3a4cb
    • J
      KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Jan H. Schönherr 提交于
      KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
      "any unblocked signal received [...] will cause KVM_RUN to return with
      -EINTR" and that "the signal will only be delivered if not blocked by
      the original signal mask".
      
      This, however, is only true, when the calling task has a signal handler
      registered for a signal. If not, signal evaluation is short-circuited for
      SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
      returning or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      20b7035c
    • W
      KVM: VMX: Fix vmx->nested freeing when no SMI handler · b7455825
      Wanpeng Li 提交于
      Reported by syzkaller:
      
         ------------[ cut here ]------------
         WARNING: CPU: 5 PID: 2939 at arch/x86/kvm/vmx.c:3844 free_loaded_vmcs+0x77/0x80 [kvm_intel]
         CPU: 5 PID: 2939 Comm: repro Not tainted 4.14.0+ #26
         RIP: 0010:free_loaded_vmcs+0x77/0x80 [kvm_intel]
         Call Trace:
          vmx_free_vcpu+0xda/0x130 [kvm_intel]
          kvm_arch_destroy_vm+0x192/0x290 [kvm]
          kvm_put_kvm+0x262/0x560 [kvm]
          kvm_vm_release+0x2c/0x30 [kvm]
          __fput+0x190/0x370
          task_work_run+0xa1/0xd0
          do_exit+0x4d2/0x13e0
          do_group_exit+0x89/0x140
          get_signal+0x318/0xb80
          do_signal+0x8c/0xb40
          exit_to_usermode_loop+0xe4/0x140
          syscall_return_slowpath+0x206/0x230
          entry_SYSCALL_64_fastpath+0x98/0x9a
      
      The syzkaller testcase will execute VMXON/VMLAUCH instructions, so the
      vmx->nested stuff is populated, it will also issue KVM_SMI ioctl. However,
      the testcase is just a simple c program and not be lauched by something
      like seabios which implements smi_handler. Commit 05cade71 (KVM: nSVM:
      fix SMI injection in guest mode) gets out of guest mode and set nested.vmxon
      to false for the duration of SMM according to SDM 34.14.1 "leave VMX
      operation" upon entering SMM. We can't alloc/free the vmx->nested stuff
      each time when entering/exiting SMM since it will induce more overhead. So
      the function vmx_pre_enter_smm() marks nested.vmxon false even if vmx->nested
      stuff is still populated. What it expected is em_rsm() can mark nested.vmxon
      to be true again. However, the smi_handler/rsm will not execute since there
      is no something like seabios in this scenario. The function free_nested()
      fails to free the vmx->nested stuff since the vmx->nested.vmxon is false
      which results in the above warning.
      
      This patch fixes it by also considering the no SMI handler case, luckily
      vmx->nested.smm.vmxon is marked according to the value of vmx->nested.vmxon
      in vmx_pre_enter_smm(), we can take advantage of it and free vmx->nested
      stuff when L1 goes down.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Fixes: 05cade71 (KVM: nSVM: fix SMI injection in guest mode)
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b7455825
    • W
      KVM: VMX: Fix rflags cache during vCPU reset · c37c2873
      Wanpeng Li 提交于
      Reported by syzkaller:
      
         *** Guest State ***
         CR0: actual=0x0000000080010031, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
         CR4: actual=0x0000000000002061, shadow=0x0000000000000000, gh_mask=ffffffffffffe8f1
         CR3 = 0x000000002081e000
         RSP = 0x000000000000fffa  RIP = 0x0000000000000000
         RFLAGS=0x00023000         DR7 = 0x00000000000000
                ^^^^^^^^^^
         ------------[ cut here ]------------
         WARNING: CPU: 6 PID: 24431 at /home/kernel/linux/arch/x86/kvm//x86.c:7302 kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
         CPU: 6 PID: 24431 Comm: reprotest Tainted: G        W  OE   4.14.0+ #26
         RIP: 0010:kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
         RSP: 0018:ffff880291d179e0 EFLAGS: 00010202
         Call Trace:
          kvm_vcpu_ioctl+0x479/0x880 [kvm]
          do_vfs_ioctl+0x142/0x9a0
          SyS_ioctl+0x74/0x80
          entry_SYSCALL_64_fastpath+0x23/0x9a
      
      The failed vmentry is triggered by the following beautified testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[5];
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
              struct kvm_guest_debug debug = {
                      .control = 0xf0403,
                      .arch = {
                              .debugreg[6] = 0x2,
                              .debugreg[7] = 0x2
                      }
              };
              ioctl(r[4], KVM_SET_GUEST_DEBUG, &debug);
              ioctl(r[4], KVM_RUN, 0);
          }
      
      which testcase tries to setup the processor specific debug
      registers and configure vCPU for handling guest debug events through
      KVM_SET_GUEST_DEBUG.  The KVM_SET_GUEST_DEBUG ioctl will get and set
      rflags in order to set TF bit if single step is needed. All regs' caches
      are reset to avail and GUEST_RFLAGS vmcs field is reset to 0x2 during vCPU
      reset. However, the cache of rflags is not reset during vCPU reset. The
      function vmx_get_rflags() returns an unreset rflags cache value since
      the cache is marked avail, it is 0 after boot. Vmentry fails if the
      rflags reserved bit 1 is 0.
      
      This patch fixes it by resetting both the GUEST_RFLAGS vmcs field and
      its cache to 0x2 during vCPU reset.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c37c2873
    • W
      KVM: X86: Fix softlockup when get the current kvmclock · e70b57a6
      Wanpeng Li 提交于
       watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
       CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc4+ #4
       RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
       Call Trace:
        get_time_ref_counter+0x5a/0x80 [kvm]
        kvm_hv_process_stimers+0x120/0x5f0 [kvm]
        kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
        kvm_vcpu_ioctl+0x33a/0x620 [kvm]
        do_vfs_ioctl+0xa1/0x5d0
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x1e/0xa9
      
      This can be reproduced when running kvm-unit-tests/hyperv_stimer.flat and
      cpu-hotplug stress simultaneously. __this_cpu_read(cpu_tsc_khz) returns 0
      (set in kvmclock_cpu_down_prep()) when the pCPU is unhotplug which results
      in kvm_get_time_scale() gets into an infinite loop.
      
      This patch fixes it by treating the unhotplug pCPU as not using master clock.
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e70b57a6
    • D
      KVM: lapic: Fixup LDR on load in x2apic · 12806ba9
      Dr. David Alan Gilbert 提交于
      In x2apic mode the LDR is fixed based on the ID rather
      than separately loadable like it was before x2.
      When kvm_apic_set_state is called, the base is set, and if
      it has the X2APIC_ENABLE flag set then the LDR is calculated;
      however that value gets overwritten by the memcpy a few lines
      below overwriting it with the value that came from userland.
      
      The symptom is a lack of EOI after loading the state
      (e.g. after a QEMU migration) and is due to the EOI bitmap
      being wrong due to the incorrect LDR.  This was seen with
      a Win2016 guest under Qemu with irqchip=split whose USB mouse
      didn't work after a VM migration.
      
      This corresponds to RH bug:
        https://bugzilla.redhat.com/show_bug.cgi?id=1502591Reported-by: NYiqian Wei <yiwei@redhat.com>
      Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: stable@vger.kernel.org
      [Applied fixup from Liran Alon. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      12806ba9
    • D
      KVM: lapic: Split out x2apic ldr calculation · e872fa94
      Dr. David Alan Gilbert 提交于
      Split out the ldr calculation from kvm_apic_set_x2apic_id
      since we're about to reuse it in the following patch.
      Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e872fa94
  8. 25 11月, 2017 2 次提交
    • N
      x86/tlb: Disable interrupts when changing CR4 · 9d0b6232
      Nadav Amit 提交于
      CR4 modifications are implemented as RMW operations which update a shadow
      variable and write the result to CR4. The RMW operation is protected by
      preemption disable, but there is no enforcement or debugging mechanism.
      
      CR4 modifications happen also in interrupt context via
      __native_flush_tlb_global(). This implementation does not affect a
      interrupted thread context CR4 operation, because the CR4 toggle restores
      the original content and does not modify the shadow variable.
      
      So the current situation seems to be safe, but a recent patch tried to add
      an actual RMW operation in interrupt context, which will cause subtle
      corruptions.
      
      To prevent that and make the CR4 handling future proof:
      
       - Add a lockdep assertion to __cr4_set() which will catch interrupt
         enabled invocations
      
       - Disable interrupts in the cr4 manipulator inlines
      
       - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
         from __switch_to_xtra() where interrupts are already disabled and
         performance matters.
      
      All other call sites are not performance critical, so the extra overhead of
      an additional local_irq_save/restore() pair is not a problem. If new call
      sites care about performance then the necessary _irqsoff() variants can be
      added.
      
      [ tglx: Condensed the patch by moving the irq protection inside the
        	manipulator functions. Updated changelog ]
      Signed-off-by: NNadav Amit <namit@vmware.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Luck <tony.luck@intel.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: nadav.amit@gmail.com
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
      9d0b6232
    • N
      x86/tlb: Refactor CR4 setting and shadow write · 0c3292ca
      Nadav Amit 提交于
      Refactor the write to CR4 and its shadow value. This is done in
      preparation for the addition of an assertion to check that IRQs are
      disabled during CR4 update.
      
      No functional change.
      Signed-off-by: NNadav Amit <namit@vmware.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: nadav.amit@gmail.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
      0c3292ca
  9. 24 11月, 2017 4 次提交
  10. 23 11月, 2017 1 次提交
    • A
      x86/entry/64: Add missing irqflags tracing to native_load_gs_index() · ca37e57b
      Andy Lutomirski 提交于
      Running this code with IRQs enabled (where dummy_lock is a spinlock):
      
      static void check_load_gs_index(void)
      {
      	/* This will fail. */
      	load_gs_index(0xffff);
      
      	spin_lock(&dummy_lock);
      	spin_unlock(&dummy_lock);
      }
      
      Will generate a lockdep warning.  The issue is that the actual write
      to %gs would cause an exception with IRQs disabled, and the exception
      handler would, as an inadvertent side effect, update irqflag tracing
      to reflect the IRQs-off status.  native_load_gs_index() would then
      turn IRQs back on and return with irqflag tracing still thinking that
      IRQs were off.  The dummy lock-and-unlock causes lockdep to notice the
      error and warn.
      
      Fix it by adding the missing tracing.
      
      Apparently nothing did this in a context where it mattered.  I haven't
      tried to find a code path that would actually exhibit the warning if
      appropriately nasty user code were running.
      
      I suspect that the security impact of this bug is very, very low --
      production systems don't run with lockdep enabled, and the warning is
      mostly harmless anyway.
      
      Found during a quick audit of the entry code to try to track down an
      unrelated bug that Ingo found in some still-in-development code.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.1511411918.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ca37e57b
  11. 22 11月, 2017 2 次提交
    • A
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Andrey Ryabinin 提交于
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f68d62a5
    • A
      x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing · 548c3050
      Andy Lutomirski 提交于
      When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF
      before it.  This means that users of entry_SYSCALL_64_after_hwframe()
      were responsible for invoking TRACE_IRQS_OFF, and the one and only
      user (Xen, added in the same commit) got it wrong.
      
      I think this would manifest as a warning if a Xen PV guest with
      CONFIG_DEBUG_LOCKDEP=y were used with context tracking.  (The
      context tracking bit is to cause lockdep to get invoked before we
      turn IRQs back on.)  I haven't tested that for real yet because I
      can't get a kernel configured like that to boot at all on Xen PV.
      
      Move TRACE_IRQS_OFF below the label.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 8a9949bc ("x86/xen/64: Rearrange the SYSCALL entries")
      Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.1511325444.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      548c3050
  12. 21 11月, 2017 1 次提交
    • R
      x86/umip: Print a warning into the syslog if UMIP-protected instructions are used · fd11a649
      Ricardo Neri 提交于
      Print a rate-limited warning when a user-space program attempts to execute
      any of the instructions that UMIP protects (i.e., SGDT, SIDT, SLDT, STR
      and SMSW).
      
      This is useful, because when CONFIG_X86_INTEL_UMIP=y is selected and
      supported by the hardware, user space programs that try to execute such
      instructions will receive a SIGSEGV signal that they might not expect.
      
      In the specific cases for which emulation is provided (instructions SGDT,
      SIDT and SMSW in protected and virtual-8086 modes), no signal is
      generated. However, a warning is helpful to encourage updates in such
      programs to avoid the use of such instructions.
      
      Warnings are printed via a customized printk() function that also provides
      information about the program that attempted to use the affected
      instructions.
      
      Utility macros are defined to wrap umip_printk() for the error and warning
      kernel log levels.
      
      While here, replace an existing call to the generic rate-limited pr_err()
      with the new umip_pr_err().
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1511233476-17088-1-git-send-email-ricardo.neri-calderon@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fd11a649
  13. 17 11月, 2017 1 次提交
    • P
      x86/smpboot: Fix __max_logical_packages estimate · b4c0a732
      Prarit Bhargava 提交于
      A system booted with a small number of cores enabled per package
      panics because the estimate of __max_logical_packages is too low.
      
      This occurs when the total number of active cores across all packages is
      less than the maximum core count for a single package. e.g.:
      
        On a 4 package system with 20 cores/package where only 4 cores are
        enabled on each package, the value of __max_logical_packages is
        calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
      
      Calculate __max_logical_packages after the cpu enumeration has completed.
      Use the boot cpu's data to extrapolate the number of packages.
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: He Chen <he.chen@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Piotr Luc <piotr.luc@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
      b4c0a732