1. 04 6月, 2015 5 次提交
  2. 29 5月, 2015 3 次提交
    • M
      KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR · b7e60c5a
      Marcelo Tosatti 提交于
      Initialize kvmclock base, on kvmclock system MSR write time,
      so that the guest sees kvmclock counting from zero.
      
      This matches baremetal behaviour when kvmclock in guest
      sets sched clock stable.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      [Remove unnecessary comment. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b7e60c5a
    • L
      x86: kvmclock: set scheduler clock stable · 0ad83caa
      Luiz Capitulino 提交于
      If you try to enable NOHZ_FULL on a guest today, you'll get
      the following error when the guest tries to deactivate the
      scheduler tick:
      
       WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
       NO_HZ FULL will not work with unstable sched clock
       CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb7 #204
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       Workqueue: events flush_to_ldisc
        ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
        ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
        0000000000000000 0000000000000003 0000000000000001 0000000000000001
       Call Trace:
        <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
        [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
        [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
        [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
        [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
        [<ffffffff810511c5>] irq_exit+0xc5/0x130
        [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
        [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
        <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
        [<ffffffff8108bbc8>] __wake_up+0x48/0x60
        [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
        [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
        [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
        [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
        [<ffffffff81064d05>] process_one_work+0x1d5/0x540
        [<ffffffff81064c81>] ? process_one_work+0x151/0x540
        [<ffffffff81065191>] worker_thread+0x121/0x470
        [<ffffffff81065070>] ? process_one_work+0x540/0x540
        [<ffffffff8106b4df>] kthread+0xef/0x110
        [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
        [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
        [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
       ---[ end trace 06e3507544a38866 ]---
      
      However, it turns out that kvmclock does provide a stable
      sched_clock callback. So, let the scheduler know this which
      in turn makes NOHZ_FULL work in the guest.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0ad83caa
    • M
      x86: kvmclock: add flag to indicate pvclock counts from zero · 61191725
      Marcelo Tosatti 提交于
      Setting sched clock stable for kvmclock causes the printk timestamps
      to not start from zero, which is different from baremetal and
      can possibly break userspace. Add a flag to indicate that
      hypervisor sets clock base at zero when kvmclock is initialized.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      61191725
  3. 28 5月, 2015 8 次提交
  4. 26 5月, 2015 3 次提交
  5. 20 5月, 2015 19 次提交
    • L
      kvm/fpu: Enable eager restore kvm FPU for MPX · c447e76b
      Liang Li 提交于
      The MPX feature requires eager KVM FPU restore support. We have verified
      that MPX cannot work correctly with the current lazy KVM FPU restore
      mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
      exposed to VM.
      Signed-off-by: NYang Zhang <yang.z.zhang@intel.com>
      Signed-off-by: NLiang Li <liang.z.li@intel.com>
      [Also activate the FPU on AMD processors. - Paolo]
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c447e76b
    • P
      Revert "KVM: x86: drop fpu_activate hook" · 0fdd74f7
      Paolo Bonzini 提交于
      This reverts commit 4473b570.  We'll
      use the hook again.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0fdd74f7
    • A
      kvm: fix crash in kvm_vcpu_reload_apic_access_page · e8fd5e9e
      Andrea Arcangeli 提交于
      memslot->userfault_addr is set by the kernel with a mmap executed
      from the kernel but the userland can still munmap it and lead to the
      below oops after memslot->userfault_addr points to a host virtual
      address that has no vma or mapping.
      
      [  327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe
      [  327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.538474] PGD 1a01067 PUD 1a03067 PMD 0
      [  327.538529] Oops: 0000 [#1] SMP
      [  327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me
      [  327.539488]  mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod
      [  327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1
      [  327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014
      [  327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000
      [  327.540184] RIP: 0010:[<ffffffff811a7b55>]  [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.540261] RSP: 0018:ffff880317c5bcf8  EFLAGS: 00010246
      [  327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000
      [  327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe
      [  327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000
      [  327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe
      [  327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50
      [  327.540643] FS:  00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000
      [  327.540717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0
      [  327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  327.540974] Stack:
      [  327.541008]  ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0
      [  327.541093]  ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d
      [  327.541177]  00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000
      [  327.541261] Call Trace:
      [  327.541321]  [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm]
      [  327.543615]  [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm]
      [  327.545918]  [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm]
      [  327.548211]  [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm]
      [  327.550500]  [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm]
      [  327.552768]  [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50
      [  327.555069]  [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30
      [  327.557373]  [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80
      [  327.559663]  [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530
      [  327.561917]  [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0
      [  327.564185]  [<ffffffff816de829>] system_call_fastpath+0x16/0x1b
      [  327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e8fd5e9e
    • N
      kvm: x86: Make functions that have no external callers static · ed3cf152
      Nicholas Krause 提交于
      This makes the functions kvm_guest_cpu_init and  kvm_init_debugfs
      static now due to having no external callers outside their
      declarations in the file, kvm.c.
      Signed-off-by: NNicholas Krause <xerofoify@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ed3cf152
    • P
      KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_async · 3520469d
      Paolo Bonzini 提交于
      gfn_to_pfn_async is used in just one place, and because of x86-specific
      treatment that place will need to look at the memory slot.  Hence inline
      it into try_async_pf and export __gfn_to_pfn_memslot.
      
      The patch also switches the subsequent call to gfn_to_pfn_prot to use
      __gfn_to_pfn_memslot.  This is a small optimization.  Finally, remove
      the now-unused async argument of __gfn_to_pfn.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3520469d
    • X
      KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed · d81135a5
      Xiao Guangrong 提交于
      CR0.CD and CR0.NW are not used by shadow page table so that need
      not adjust mmu if these two bit are changed
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d81135a5
    • X
      KVM: MMU: fix MTRR update · efdfe536
      Xiao Guangrong 提交于
      Currently, whenever guest MTRR registers are changed
      kvm_mmu_reset_context is called to switch to the new root shadow page
      table, however, it's useless since:
      1) the cache type is not cached into shadow page's attribute so that
         the original root shadow page will be reused
      
      2) the cache type is set on the last spte, that means we should sync
         the last sptes when MTRR is changed
      
      This patch fixs this issue by drop all the spte in the gfn range which
      is being updated by MTRR
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      efdfe536
    • X
      KVM: MMU: fix decoding cache type from MTRR · d69afbc6
      Xiao Guangrong 提交于
      There are some bugs in current get_mtrr_type();
      1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
         IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
         that means other bits are ignored if it is cleared
      
      2: the fixed MTRR ranges are controlled by bit 0 of
         mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
      
      3: if MTRR is disabled, UC is applied to all of physical memory rather
         than mtrr_state->def_type
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Reviewed-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d69afbc6
    • X
      KVM: MMU: introduce kvm_zap_rmapp · 6a49f85c
      Xiao Guangrong 提交于
      Split kvm_unmap_rmapp and introduce kvm_zap_rmapp which will be used in the
      later patch
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6a49f85c
    • X
      KVM: MMU: use slot_handle_level and its helper to clean up the code · d77aa73c
      Xiao Guangrong 提交于
      slot_handle_level and its helper functions are ready now, use them to
      clean up the code
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d77aa73c
    • X
      KVM: MMU: introduce slot_handle_level_range() and its helpers · 1bad2b2a
      Xiao Guangrong 提交于
      There are several places walking all rmaps for the memslot so that
      introduce common functions to cleanup the code
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1bad2b2a
    • X
      KVM: MMU: introduce for_each_slot_rmap_range · 6ce1f4e2
      Xiao Guangrong 提交于
      It's used to abstract the code from kvm_handle_hva_range and it will be
      used by later patch
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6ce1f4e2
    • X
    • X
      KVM: MMU: introduce for_each_rmap_spte() · 0d536790
      Xiao Guangrong 提交于
      It's used to walk all the sptes on the rmap to clean up the
      code
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0d536790
    • P
      Revert "kvmclock: set scheduler clock stable" · c35ebbea
      Paolo Bonzini 提交于
      This reverts commit ff7bbb9c.
      Sasha Levin is seeing odd jump in time values during boot of a KVM guest:
      
      [...]
      [    0.000000] tsc: Detected 2260.998 MHz processor
      [3376355.247558] Calibrating delay loop (skipped) preset value..
      [...]
      
      and bisected them to this commit.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c35ebbea
    • X
      KVM: MMU: fix SMAP virtualization · edc90b7d
      Xiao Guangrong 提交于
      KVM may turn a user page to a kernel page when kernel writes a readonly
      user page if CR0.WP = 1. This shadow page entry will be reused after
      SMAP is enabled so that kernel is allowed to access this user page
      
      Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
      once CR4.SMAP is updated
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      edc90b7d
    • N
      KVM: x86: Fix zero iterations REP-string · 428e3d08
      Nadav Amit 提交于
      When a REP-string is executed in 64-bit mode with an address-size prefix,
      ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel
      CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits
      of the pointers in MOVS/STOS. This behavior is specific to Intel according to
      few experiments.
      
      As one may guess, this is an undocumented behavior. Yet, it is observable in
      the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that
      VMware appears to get it right.  The behavior can be observed using the
      following code:
      
       #include <stdio.h>
      
       #define LOW_MASK	(0xffffffff00000000ull)
       #define ALL_MASK	(0xffffffffffffffffull)
       #define TEST(opcode)							\
      	do {								\
      	asm volatile(".byte 0xf2 \n\t .byte 0x67 \n\t .byte " opcode "\n\t" \
      			: "=S"(s), "=c"(c), "=D"(d) 			\
      			: "S"(ALL_MASK), "c"(LOW_MASK), "D"(ALL_MASK));	\
      	printf("opcode %s rcx=%llx rsi=%llx rdi=%llx\n",		\
      		opcode, c, s, d);					\
      	} while(0)
      
      void main()
      {
      	unsigned long long s, d, c;
      	iopl(3);
      	TEST("0x6c");
      	TEST("0x6d");
      	TEST("0x6e");
      	TEST("0x6f");
      	TEST("0xa4");
      	TEST("0xa5");
      	TEST("0xa6");
      	TEST("0xa7");
      	TEST("0xaa");
      	TEST("0xab");
      	TEST("0xae");
      	TEST("0xaf");
      }
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      428e3d08
    • N
      KVM: x86: Fix update RCX/RDI/RSI on REP-string · ee122a71
      Nadav Amit 提交于
      When REP-string instruction is preceded with an address-size prefix,
      ECX/EDI/ESI are used as the operation counter and pointers.  When they are
      updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they
      are updated on every 32-bit register operation.  Fix it.
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ee122a71
    • N
      KVM: x86: Fix DR7 mask on task-switch while debugging · 3db176d5
      Nadav Amit 提交于
      If the host sets hardware breakpoints to debug the guest, and a task-switch
      occurs in the guest, the architectural DR7 will not be updated. The effective
      DR7 would be updated instead.
      
      This fix puts the DR7 update during task-switch emulation, so it now uses the
      standard DR setting mechanism instead of the one that was previously used. As a
      bonus, the update of DR7 will now be effective for AMD as well.
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3db176d5
  6. 11 5月, 2015 2 次提交