1. 24 Aug 2017, 2 commits
  2. 18 Aug 2017, 6 commits
  3. 12 Aug 2017, 4 commits
    • kvm: x86: Disallow illegal IA32_APIC_BASE MSR values · d3802286
      Jim Mattson authored
      Host-initiated writes to the IA32_APIC_BASE MSR do not have to follow
      local APIC state transition constraints, but the value written must be
      valid.
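
      A minimal sketch of the shape of such a validity check, assuming
      kernel-internal helpers such as cpuid_maxphyaddr(); illustrative, not
      the exact upstream diff:

         /* Reject any IA32_APIC_BASE value with reserved bits set (bits 0-7,
          * bit 9, and everything above the guest's MAXPHYADDR), even for
          * host-initiated writes; only the local APIC state-transition
          * checks are skipped when msr_info->host_initiated is true. */
         u64 reserved_bits = ((~0ULL) << cpuid_maxphyaddr(vcpu)) | 0x2ff;

         if (msr_info->data & reserved_bits)
                 return 1;       /* invalid value: fail the MSR write */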
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: Bail out immediately if there is no available mmu page · 26eeb53c
      Wanpeng Li authored
      Bail out immediately if there is no mmu page available to allocate.
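
      A sketch of the resulting shape of make_mmu_pages_available(), based on
      the description above (helper names as in the KVM MMU of that era;
      details may differ from the final diff):

         static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
         {
                 LIST_HEAD(invalid_list);

                 if (likely(kvm_mmu_available_pages(vcpu->kvm) >= KVM_MIN_FREE_MMU_PAGES))
                         return 0;

                 while (kvm_mmu_available_pages(vcpu->kvm) < KVM_MIN_FREE_MMU_PAGES) {
                         if (!prepare_zap_oldest_mmu_page(vcpu->kvm, &invalid_list))
                                 break;                  /* nothing left to zap */
                         ++vcpu->kvm->stat.mmu_recycled;
                 }
                 kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);

                 if (!kvm_mmu_available_pages(vcpu->kvm))
                         return -ENOSPC;                 /* bail out immediately */
                 return 0;
         }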
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: Fix softlockup due to mmu_lock is held too long · 42bcbebf
      Wanpeng Li authored
      watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
       irq event stamp: 20532
       hardirqs last  enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d
       hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0
       softirqs last  enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1
       softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100
       CPU: 5 PID: 3089 Comm: warn_test Tainted: G           OE   4.13.0-rc3+ #8
       RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
       Call Trace:
        make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
        kvm_mmu_load+0x1cf/0x410 [kvm]
        kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x23/0xc2
        ? __this_cpu_preempt_check+0x13/0x20
      
      This can be reproduced readily with ept=N while running syzkaller tests,
      since many syzkaller testcases don't set up any memory regions. However,
      if ept=Y, the rmode identity map will be created, and then
      kvm_mmu_calculate_mmu_pages() will extend the number of the VM's mmu
      pages to at least KVM_MIN_ALLOC_MMU_PAGES, which just hides the issue.
      
      I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1,
      so there is one active mmu page on the list; kvm_mmu_prepare_zap_page()
      fails to zap any pages, yet prepare_zap_oldest_mmu_page() always returns
      true. This results in an infinite loop in make_mmu_pages_available(),
      which holds mmu_lock for too long and causes the softlockup.
      
      This patch fixes it by setting the return value of
      prepare_zap_oldest_mmu_page() according to whether any mmu page was
      actually zapped.
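
      A sketch of the fix (illustrative): instead of unconditionally returning
      true, prepare_zap_oldest_mmu_page() propagates whether a page was
      actually zapped, so make_mmu_pages_available() can stop looping:

         static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
                                                 struct list_head *invalid_list)
         {
                 struct kvm_mmu_page *sp;

                 if (list_empty(&kvm->arch.active_mmu_pages))
                         return false;

                 sp = list_last_entry(&kvm->arch.active_mmu_pages,
                                      struct kvm_mmu_page, link);

                 /* Report whether anything was actually zapped. */
                 return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
         }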
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: validate eptp pointer · a057e0e2
      David Hildenbrand authored
      Let's reuse the function introduced with eptp switching.
      
      We don't explicitly have to check against enable_ept_ad_bits, as this
      is implicitly done when checking against nested_vmx_ept_caps in
      valid_ept_address().
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 10 Aug 2017, 3 commits
    • kvm: nVMX: Add support for fast unprotection of nested guest page tables · eebed243
      Paolo Bonzini authored
      This is the same as commit 14727754 ("kvm: svm: Add support for
      additional SVM NPF error codes", 2016-11-23), but for Intel processors.
      In this case, the exit qualification field's bit 8 says whether the
      EPT violation occurred while translating the guest's final physical
      address or rather while translating the guest page tables.
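
      A sketch of how bit 8 of the exit qualification can be folded into the
      page-fault error code in handle_ept_violation(), reusing the PFERR_*
      masks introduced by the SVM commit (illustrative):

         /* Exit qualification bit 8: set if the EPT violation happened while
          * translating the final guest-physical address, clear if it happened
          * during the guest page table walk. */
         error_code |= (exit_qualification & 0x100) != 0 ?
                       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;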
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Limit PFERR_NESTED_GUEST_PAGE error_code check to L1 guest · 64531a3b
      Brijesh Singh authored
      Commit 14727754 ("kvm: svm: Add support for additional SVM NPF error
      codes", 2016-11-23) added a new error code to aid nested page fault
      handling.  The commit unprotects (kvm_mmu_unprotect_page) the page when
      we get an NPF due to a guest page table walk where the page was marked RO.
      
      However, an L0->L2 shadow nested page table can also be marked read-only
      when a page is read-only in L1's nested page table.  If such a page
      is accessed by L2 while walking page tables, it can cause a nested
      page fault (page table walks are write accesses).  After
      kvm_mmu_unprotect_page() we may then get another page fault, and
      again, in an endless stream.
      
      To cover this use case, we qualify the new error_code check with
      vcpu->arch.mmu.direct_map so that the error_code check runs only for the
      L1 guest and not the L2 guest.  This avoids hitting the above scenario.
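
      A sketch of the qualified check in kvm_mmu_page_fault() (field and
      helper names from that era's KVM; illustrative):

         /* mmu.direct_map is true when running the L1 guest on NPT/EPT and
          * false for the L0->L2 shadow MMU, so this only unprotects on
          * behalf of L1. */
         if (vcpu->arch.mmu.direct_map &&
             (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
                 kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
                 return 1;
         }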
      
      Fixes: 14727754
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Fix residual mmio emulation request to userspace · bbeac283
      Wanpeng Li authored
      Reported by syzkaller:
      
      With kvm-intel.unrestricted_guest=0:
      
         WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         CPU: 5 PID: 1014 Comm: warn_test Tainted: G        W  OE   4.13.0-rc3+ #8
         RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         Call Trace:
          ? put_pid+0x3a/0x50
          ? rcu_read_lock_sched_held+0x79/0x80
          ? kmem_cache_free+0x2f2/0x350
          kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? __fget+0xfc/0x210
          do_vfs_ioctl+0xa4/0x6a0
          ? __fget+0x11d/0x210
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
          ? __this_cpu_preempt_check+0x13/0x20
      
      The syzkaller folks reported a residual mmio emulation request to
      userspace because vm86 fails to emulate the injection of a real-mode
      interrupt (it fails to read CS) and incurs a triple fault. The vCPU
      returns to userspace with vcpu->mmio_needed == true and the
      KVM_EXIT_SHUTDOWN exit reason. However, the syzkaller testcase
      constructs several threads that launch the same vCPU; a thread that
      launches this vCPU after another thread has already got
      vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN triggers the warning.
      
         #define _GNU_SOURCE
         #include <pthread.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/wait.h>
         #include <sys/types.h>
         #include <sys/stat.h>
         #include <sys/mman.h>
         #include <fcntl.h>
         #include <unistd.h>
         #include <linux/kvm.h>
         #include <sys/ioctl.h>  /* for ioctl(); replaces a duplicate <stdio.h> */
      
         int kvmcpu;
         struct kvm_run *run;
      
         void* thr(void* arg)
         {
           int res;
           res = ioctl(kvmcpu, KVM_RUN, 0);
           printf("ret1=%d exit_reason=%d suberror=%d\n",
               res, run->exit_reason, run->internal.suberror);
           return 0;
         }
      
         void test()
         {
           int i, kvm, kvmvm;
           pthread_t th[4];
      
           kvm = open("/dev/kvm", O_RDWR);
           kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
           kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
           run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0);
           srand(getpid());
           for (i = 0; i < 4; i++) {
             pthread_create(&th[i], 0, thr, 0);
             usleep(rand() % 10000);
           }
           for (i = 0; i < 4; i++)
             pthread_join(th[i], 0);
         }
      
         int main()
         {
           for (;;) {
             int pid = fork();
             if (pid < 0)
               exit(1);
             if (pid == 0) {
               test();
               exit(0);
             }
             int status;
             while (waitpid(pid, &status, __WALL) != pid) {}
           }
           return 0;
         }
      
      This patch fixes it by resetting vcpu->mmio_needed once we receive the
      triple fault, to avoid the residual request.
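
      A sketch of the fix in the KVM_REQ_TRIPLE_FAULT handling of
      vcpu_enter_guest() (surrounding lines shown for context; illustrative):

         if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
                 vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
                 vcpu->mmio_needed = 0;  /* drop the residual mmio request */
                 r = 0;
                 goto out;
         }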
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 08 Aug 2017, 2 commits
  6. 07 Aug 2017, 10 commits
  7. 03 Aug 2017, 6 commits
    • KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" · 6550c4df
      Wanpeng Li authored
      ------------[ cut here ]------------
       WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
       CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
       RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
      Call Trace:
        vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
        ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
        ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x8f/0x750
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This can be reproduced by booting the L1 guest with the 'noapic' grub
      parameter, which tells the kernel not to make use of any IOAPICs that
      may be present in the system.
      
      The external_intr variable in nested_vmx_vmexit() is actually the
      req_int_win variable passed from vcpu_enter_guest(), which means that
      L0's userspace requested an irq window. I observed a scenario where
      (!kvm_cpu_has_interrupt(vcpu) && L0's userspace requests an irq window)
      is true, so there is no interrupt that L1 requires to inject into L2;
      we should not attempt to emulate "Acknowledge interrupt on exit" for
      the irq window requirement in this scenario.
      
      This patch fixes it by not attempting to emulate "Acknowledge interrupt
      on exit" when there is no L1 requirement to inject an interrupt into L2.
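
      A sketch of the guarded "Acknowledge interrupt on exit" emulation in
      nested_vmx_vmexit() (illustrative; constant and helper names from the
      VMX code of that era):

         if (nested_exit_intr_ack_set(vcpu) &&
             exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
             kvm_cpu_has_interrupt(vcpu)) {
                 /* Only acknowledge when there really is an interrupt for
                  * L1 to inject into L2. */
                 int irq = kvm_cpu_get_interrupt(vcpu);

                 WARN_ON(irq < 0);
                 vmcs12->vm_exit_intr_info = irq |
                         INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
         }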
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      [Added code comment to make it obvious that the behavior is not correct.
       We should do a userspace exit with open interrupt window instead of the
       nested VM exit.  This patch still improves the behavior, so it was
       accepted as a (temporary) workaround.]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: mark vmcs12 pages dirty on L2 exit · c9f04407
      David Matlack authored
      The host physical addresses of L1's Virtual APIC Page and Posted
      Interrupt descriptor are loaded into the VMCS02. The CPU may write
      to these pages via their host physical address while L2 is running,
      bypassing address-translation-based dirty tracking (e.g. EPT write
      protection). Mark them dirty on every exit from L2 to prevent them
      from getting out of sync with dirty tracking.
      
      Also mark the virtual APIC page and the posted interrupt descriptor
      dirty when KVM is virtualizing posted interrupt processing.
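
      A sketch of a helper along these lines, called on every exit from L2
      (names approximate; illustrative):

         static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu)
         {
                 struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
                 gfn_t gfn;

                 /* The APIC-access page is never written by the CPU during
                  * APIC virtualization, so it needs no dirty marking. */
                 if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
                         gfn = vmcs12->virtual_apic_page_addr >> PAGE_SHIFT;
                         kvm_vcpu_mark_page_dirty(vcpu, gfn);
                 }

                 if (nested_cpu_has_posted_intr(vmcs12)) {
                         gfn = vmcs12->posted_intr_desc_addr >> PAGE_SHIFT;
                         kvm_vcpu_mark_page_dirty(vcpu, gfn);
                 }
         }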
      Signed-off-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown · 8ca44e88
      David Matlack authored
      According to the Intel SDM, software cannot rely on the current VMCS to be
      coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12
      flushes.
      
      24.11.1 Software Use of Virtual-Machine Control Structures
      ...
        If a logical processor leaves VMX operation, any VMCSs active on
        that logical processor may be corrupted (see below). To prevent
        such corruption of a VMCS that may be used either after a return
        to VMX operation or on another logical processor, software should
        execute VMCLEAR for that VMCS before executing the VMXOFF instruction
        or removing power from the processor (e.g., as part of a transition
        to the S3 and S4 power states).
      ...
      
      This fixes a "suspicious rcu_dereference_check() usage!" warning during
      kvm_vm_release() because nested_release_vmcs12() calls
      kvm_vcpu_write_guest_page() without holding kvm->srcu.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: do not pin the VMCS12 · 9f744c59
      Paolo Bonzini authored
      Since the current implementation of VMCS12 does a memcpy in and out
      of guest memory, we do not need current_vmcs12 and current_vmcs12_page
      anymore.  current_vmptr is enough to read and write the VMCS12.
      
      And David Matlack noted:
      
        This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
        VMCS12 page by using kvm_write_guest. nested_release_page() only marks
        the struct page dirty.
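
      A sketch of the copy-out path once the page is no longer pinned
      (illustrative): the cached VMCS12 is written back through current_vmptr
      with kvm_write_guest(), which also updates the memslot dirty bitmap:

         kvm_write_guest(vmx->vcpu.kvm, vmx->nested.current_vmptr,
                         vmx->nested.cached_vmcs12, VMCS12_SIZE);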
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      [Added David Matlack's note and nested_release_page_clean() fix.]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: X86: init irq->level in kvm_pv_kick_cpu_op · ebd28fcb
      Longpeng(Mike) authored
      'lapic_irq' is a local variable and its 'level' field isn't
      initialized, so 'level' is random. That doesn't matter in practice,
      but it makes UBSAN unhappy:
      
      UBSAN: Undefined behaviour in .../lapic.c:...
      load of value 10 is not a valid value for type '_Bool'
      ...
      Call Trace:
       [<ffffffff81f030b6>] dump_stack+0x1e/0x20
       [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55
       [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162
       [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm]
       [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm]
       [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm]
       [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm]
       [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel]
       [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel]
      ...
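
      The fix is essentially one missing assignment in kvm_pv_kick_cpu_op();
      the neighbouring initializations are shown for context (illustrative):

         struct kvm_lapic_irq lapic_irq;

         lapic_irq.shorthand = 0;
         lapic_irq.dest_mode = 0;
         lapic_irq.level = 0;            /* previously left uninitialized */
         lapic_irq.dest_id = apicid;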
      Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: X86: Fix loss of pending INIT due to race · f4ef1910
      Wanpeng Li authored
      When an SMP VM starts, an AP may lose its INIT because the INIT is
      received between kvm_vcpu_ioctl_x86_get_vcpu_events and
      kvm_vcpu_ioctl_x86_set_vcpu_events.
      
             vcpu 0                             vcpu 1
                                         kvm_vcpu_ioctl_x86_get_vcpu_events
                                           events->smi.latched_init = 0
        send INIT to vcpu1
          set vcpu1's pending_events
                                         kvm_vcpu_ioctl_x86_set_vcpu_events
                                            if (events->smi.latched_init == 0)
                                              clear INIT in pending_events
      
      This patch fixes it by updating the SMM-related flags only if we are
      in SMM.
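
      A sketch of the idea in kvm_vcpu_ioctl_x86_set_vcpu_events()
      (illustrative shape only, not the exact upstream diff): the latched-INIT
      bit is rewritten only when the restored state says the vCPU is in SMM,
      so an INIT that arrived inside the get/set window is not clobbered:

         if (events->flags & KVM_VCPUEVENT_VALID_SMM &&
             events->smi.smm && lapic_in_kernel(vcpu)) {
                 if (events->smi.latched_init)
                         set_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
                 else
                         clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
         }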
      
      Thanks to Peng Hao for the report and the original commit message.
      Reported-by: Peng Hao <peng.hao2@zte.com.cn>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  8. 02 Aug 2017, 3 commits
    • KVM: async_pf: make rcu irq exit if not triggered from idle task · 337c017c
      Wanpeng Li authored
       WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0
       CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1
       RIP: 0010:rcu_note_context_switch+0x207/0x6b0
       Call Trace:
        __schedule+0xda/0xba0
        ? kvm_async_pf_task_wait+0x1b2/0x270
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
       RIP: 0010:__d_lookup_rcu+0x90/0x1e0
      
      I encountered this when trying to stress the async page fault path in
      an L1 guest with L2 guests running.
      
      Commit 9b132fbe ("Add rcu user eqs exception hooks for async page
      fault") adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit
      the cpu idle eqs when needed, to protect the code that needs to use
      rcu. However, we need to call the pair even if the function calls
      schedule(), as seen from the above backtrace.
      
      This patch fixes it by informing the RCU subsystem of the irq
      exit/entry towards/away from idle for both the n.halted and !n.halted
      cases.
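
      A sketch of the !n.halted wait path in kvm_async_pf_task_wait() after
      the fix (simplified; illustrative):

         if (!n.halted) {
                 /* Inform RCU that we are leaving the irq towards idle
                  * before scheduling away, and entering it again when
                  * resumed; the n.halted path gets the same pairing. */
                 rcu_irq_exit();
                 schedule();
                 rcu_irq_enter();
         }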
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: fixes to nested virt interrupt injection · b96fb439
      Paolo Bonzini authored
      There are three issues in nested_vmx_check_exception:
      
      1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported
      by Wanpeng Li;
      
      2) it should rebuild the interruption info and exit qualification fields
      from scratch, as reported by Jim Mattson, because the values from the
      L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes
      a page fault, the EPT misconfig's exit qualification is incorrect).
      
      3) CR2 and DR6 should not be written for exception intercept vmexits
      (CR2 only for AMD).
      
      This patch fixes the first two and adds a comment about the last,
      outlining the fix.
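
      For issue 1), the PFEC_MASK/PFEC_MATCH test has roughly this shape
      (a sketch; per the SDM, the page-fault error code is ANDed with
      PFEC_MASK and compared against PFEC_MATCH, and the result is combined
      with the #PF bit of the exception bitmap):

         static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
                                                     u16 error_code)
         {
                 bool inequality, bit;

                 bit = (vmcs12->exception_bitmap & (1u << PF_VECTOR)) != 0;
                 inequality = (error_code & vmcs12->page_fault_error_code_mask) !=
                               vmcs12->page_fault_error_code_match;

                 /* On a match, the #PF causes a vmexit iff PF_VECTOR is set
                  * in the exception bitmap; a mismatch inverts that. */
                 return inequality ^ bit;
         }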
      
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 · 7313c698
      Paolo Bonzini authored
      Do this in the caller of nested_vmx_vmexit instead.
      
      nested_vmx_check_exception was doing a vmwrite to the vmcs02's
      VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move
      the field to vmcs12->vm_exit_intr_error_code.  However that isn't
      possible on pre-Haswell machines.  Moving the vmcs12 write to the
      callers fixes it.
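
      A sketch of the caller-side write (variable names illustrative):

         /* Write the error code straight into the in-memory vmcs12 before
          * requesting the nested VM exit, instead of doing a vmwrite to a
          * vmcs02 field that pre-Haswell CPUs do not have. */
         vmcs12->vm_exit_intr_error_code = fault->error_code;
         nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, intr_info,
                           exit_qualification);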
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1,
       thanks to fengguang.wu@intel.com]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  9. 01 Aug 2017, 1 commit
    • x86/hpet: Cure interface abuse in the resume path · bb68cfe2
      Thomas Gleixner authored
      The HPET resume path abuses irq_domain_[de]activate_irq() to restore
      the MSI message in the HPET chip for the boot CPU on resume, and it
      relies on an implementation detail of the interrupt core code which
      magically made the HPET unmask call happen via an irq_disable/enable
      pair. This worked as long as the irq code unconditionally invoked the
      unmask() callback. With the recent changes that keep track of the
      masked state to avoid expensive hardware access, this no longer works.
      As a consequence, the HPET timer interrupts are not unmasked, which
      breaks resume because the boot CPU waits forever for a timer interrupt
      to arrive.
      
      Make the restore of the MSI message explicit and invoke the unmask()
      function directly. While at it, get rid of the pointless affinity
      setting, as nothing can change the affinity of the interrupt and the
      vector across suspend/resume. The restore of the MSI message
      reestablishes the previous affinity setting, which is the correct one.
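
      A sketch of the explicit restore, assuming the hpet_msi_* helpers in
      arch/x86/kernel/hpet.c (illustrative, not the exact upstream diff):

         static void hpet_msi_resume(struct hpet_dev *hdev)
         {
                 struct irq_data *data = irq_get_irq_data(hdev->irq);
                 struct msi_msg msg;

                 /* Rebuild the MSI message from the irq domain, write it to
                  * the HPET channel, then unmask the interrupt directly. */
                 irq_chip_compose_msi_msg(data, &msg);
                 hpet_msi_write(hdev, &msg);
                 hpet_msi_unmask(data);
         }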
      
      Fixes: bf22ff45 ("genirq: Avoid unnecessary low level irq function calls")
      Reported-and-tested-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
      Reported-by: Martin Peres <martin.peres@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: jeffy.chen@rock-chips.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707312158590.2287@nanos
  10. 30 Jul 2017, 1 commit
    • cpufreq: x86: Make scaling_cur_freq behave more as expected · 4815d3c5
      Rafael J. Wysocki authored
      After commit f8475cef "x86: use common aperfmperf_khz_on_cpu() to
      calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute
      in sysfs only behaves as expected on x86 with APERF/MPERF registers
      available when it is read from at least twice in a row.  The value
      returned by the first read may not be meaningful, because the
      computations in there use cached values from the previous iteration
      of aperfmperf_snapshot_khz() which may be stale.
      
      To prevent that from happening, modify arch_freq_get_on_cpu() to
      call aperfmperf_snapshot_khz() twice, with a short delay between
      these calls, if the previous invocation of aperfmperf_snapshot_khz()
      was too far back in the past (specifically, more than 1s ago).
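
      A sketch of the resulting arch_freq_get_on_cpu() (illustrative; helper
      and per-cpu variable names approximate):

         unsigned int arch_freq_get_on_cpu(int cpu)
         {
                 unsigned int khz;

                 if (!cpu_khz || !static_cpu_has(X86_FEATURE_APERFMPERF))
                         return 0;

                 smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
                 khz = per_cpu(samples.khz, cpu);
                 if (khz)
                         return khz;     /* the previous snapshot was recent */

                 /* The previous snapshot was stale: wait a little and sample
                  * again so the APERF/MPERF deltas span a meaningful window. */
                 msleep(APERFMPERF_REFRESH_DELAY_MS);
                 smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);

                 return per_cpu(samples.khz, cpu);
         }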
      
      Also, as pointed out by Doug Smythies, aperf_delta is limited now
      and the multiplication of it by cpu_khz won't overflow, so simplify
      the s->khz computations too.
      
      Fixes: f8475cef "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF"
      Reported-by: Doug Smythies <dsmythies@telus.net>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  11. 28 Jul 2017, 1 commit
    • x86/boot: Disable the address-of-packed-member compiler warning · 20c6c189
      Matthias Kaehlcke authored
      The clang warning 'address-of-packed-member' is disabled for the
      general kernel code; disable it for the x86 boot code as well.
      
      This suppresses a bunch of warnings like this when building with clang:
      
      ./arch/x86/include/asm/processor.h:535:30: warning: taking address of
        packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
        unaligned pointer value [-Waddress-of-packed-member]
          return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
                                      ^~~~~~~~~~~~~~~~~~~
      ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
        'this_cpu_read_stable'
          #define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
                                                                          ^~~
      ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
        'percpu_stable_op'
          : "p" (&(var)));
                   ^~~
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 27 Jul 2017, 1 commit
    • KVM: LAPIC: Fix reentrancy issues with preempt notifiers · 1d518c68
      Wanpeng Li authored
      Preempt can occur in the preemption timer expiration handler:
      
                CPU0                    CPU1
      
        preemption timer vmexit
        handle_preemption_timer(vCPU0)
          kvm_lapic_expired_hv_timer
            hv_timer_in_use == true
        sched_out
                                 sched_in
                                 kvm_arch_vcpu_load
                                   kvm_lapic_restart_hv_timer
                                     restart_apic_timer
                                       start_hv_timer
                                        already-expired timer or sw timer triggered in the window
                                       start_sw_timer
                                         cancel_hv_timer
                                 /* back in kvm_lapic_expired_hv_timer */
                                 cancel_hv_timer
                                   WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops
      
      This can be reproduced if CONFIG_PREEMPT is enabled.
      
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #16
       RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
      Call Trace:
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
       ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc2+ #16
       RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
      Call Trace:
        kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This patch fixes it by making the callers of cancel_hv_timer,
      start_hv_timer and start_sw_timer run in preemption-disabled regions,
      which trivially avoids any reentrancy issue with the preempt notifier.
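
      A simplified sketch of the fixed expiration handler (illustrative;
      the periodic-timer restart is omitted):

         void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
         {
                 struct kvm_lapic *apic = vcpu->arch.apic;

                 /* Keep the hv-timer/sw-timer switch atomic with respect to
                  * the preempt notifier's kvm_arch_vcpu_load() path. */
                 preempt_disable();
                 WARN_ON(!apic->lapic_timer.hv_timer_in_use);
                 cancel_hv_timer(apic);
                 apic_timer_expired(apic);
                 preempt_enable();
         }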
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      [Add more WARNs. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>