1. 11 3月, 2014 11 次提交
    • P
      KVM: svm: Allow the guest to run with dirty debug registers · facb0139
      Paolo Bonzini 提交于
      When not running in guest-debug mode (i.e. the guest controls the debug
      registers, having to take an exit for each DR access is a waste of time.
      If the guest gets into a state where each context switch causes DR to be
      saved and restored, this can take away as much as 40% of the execution
      time from the guest.
      
      If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
      can let it write freely to the debug registers and reload them on the
      next exit.  We still need to exit on the first access, so that the
      KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
      accesses to the debug registers will not cause a vmexit.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      facb0139
    • P
      KVM: svm: set/clear all DR intercepts in one swoop · 5315c716
      Paolo Bonzini 提交于
      Unlike other intercepts, debug register intercepts will be modified
      in hot paths if the guest OS is bad or otherwise gets tricked into
      doing so.
      
      Avoid calling recalc_intercepts 16 times for debug registers.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5315c716
    • P
      KVM: nVMX: Allow nested guests to run with dirty debug registers · d16c293e
      Paolo Bonzini 提交于
      When preparing the VMCS02, the CPU-based execution controls is computed
      by vmx_exec_control.  Turn off DR access exits there, too, if the
      KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d16c293e
    • P
      KVM: vmx: Allow the guest to run with dirty debug registers · 81908bf4
      Paolo Bonzini 提交于
      When not running in guest-debug mode (i.e. the guest controls the debug
      registers, having to take an exit for each DR access is a waste of time.
      If the guest gets into a state where each context switch causes DR to be
      saved and restored, this can take away as much as 40% of the execution
      time from the guest.
      
      If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
      can let it write freely to the debug registers and reload them on the
      next exit.  We still need to exit on the first access, so that the
      KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
      accesses to the debug registers will not cause a vmexit.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      81908bf4
    • P
      KVM: x86: Allow the guest to run with dirty debug registers · c77fb5fe
      Paolo Bonzini 提交于
      When not running in guest-debug mode, the guest controls the debug
      registers and having to take an exit for each DR access is a waste
      of time.  If the guest gets into a state where each context switch
      causes DR to be saved and restored, this can take away as much as 40%
      of the execution time from the guest.
      
      After this patch, VMX- and SVM-specific code can set a flag in
      switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
      registers might be dirty and need to be reloaded (syncing will be taken
      care of by a new callback in kvm_x86_ops).  This flag can be set on the
      first access to a debug registers, so that multiple accesses to the
      debug registers only cause one vmexit.
      
      Note that since the guest will be able to read debug registers and
      enable breakpoints in DR7, we need to ensure that they are synchronized
      on entry to the guest---including DR6 that was not synced before.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c77fb5fe
    • P
      KVM: x86: change vcpu->arch.switch_db_regs to a bit mask · 360b948d
      Paolo Bonzini 提交于
      The next patch will add another bit that we can test with the
      same "if".
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      360b948d
    • P
      KVM: vmx: we do rely on loading DR7 on entry · c845f9c6
      Paolo Bonzini 提交于
      Currently, this works even if the bit is not in "min", because the bit is always
      set in MSR_IA32_VMX_ENTRY_CTLS.  Mention it for the sake of documentation, and
      to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c845f9c6
    • J
      KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f
      Jan Kiszka 提交于
      It's no longer possible to enter enable_irq_window in guest mode when
      L1 intercepts external interrupts and we are entering L2. This is now
      caught in vcpu_enter_guest. So we can remove the check from the VMX
      version of enable_irq_window, thus the need to return an error code from
      both enable_irq_window and enable_nmi_window.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c9a7953f
    • J
      KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt · 220c5672
      Jan Kiszka 提交于
      According to SDM 27.2.3, IDT vectoring information will not be valid on
      vmexits caused by external NMIs. So we have to avoid creating such
      scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we
      have a pending interrupt because that one would be migrated to L1's IDT
      vectoring info on nested exit.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      220c5672
    • J
      KVM: nVMX: Fully emulate preemption timer · f4124500
      Jan Kiszka 提交于
      We cannot rely on the hardware-provided preemption timer support because
      we are holding L2 in HLT outside non-root mode. Furthermore, emulating
      the preemption will resolve tick rate errata on older Intel CPUs.
      
      The emulation is based on hrtimer which is started on L2 entry, stopped
      on L2 exit and evaluated via the new check_nested_events hook. As we no
      longer rely on hardware features, we can enable both the preemption
      timer support and value saving unconditionally.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f4124500
    • J
      KVM: nVMX: Rework interception of IRQs and NMIs · b6b8a145
      Jan Kiszka 提交于
      Move the check for leaving L2 on pending and intercepted IRQs or NMIs
      from the *_allowed handler into a dedicated callback. Invoke this
      callback at the relevant points before KVM checks if IRQs/NMIs can be
      injected. The callback has the task to switch from L2 to L1 if needed
      and inject the proper vmexit events.
      
      The rework fixes L2 wakeups from HLT and provides the foundation for
      preemption timer emulation.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b6b8a145
  2. 04 3月, 2014 2 次提交
    • A
      x86: kvm: introduce periodic global clock updates · 332967a3
      Andrew Jones 提交于
      commit 0061d53d introduced a mechanism to execute a global clock
      update for a vm. We can apply this periodically in order to propagate
      host NTP corrections. Also, if all vcpus of a vm are pinned, then
      without an additional trigger, no guest NTP corrections can propagate
      either, as the current trigger is only vcpu cpu migration.
      Signed-off-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      332967a3
    • A
      x86: kvm: rate-limit global clock updates · 7e44e449
      Andrew Jones 提交于
      When we update a vcpu's local clock it may pick up an NTP correction.
      We can't wait an indeterminate amount of time for other vcpus to pick
      up that correction, so commit 0061d53d introduced a global clock
      update. However, we can't request a global clock update on every vcpu
      load either (which is what happens if the tsc is marked as unstable).
      The solution is to rate-limit the global clock updates. Marcelo
      calculated that we should delay the global clock updates no more
      than 0.1s as follows:
      
      Assume an NTP correction c is applied to one vcpu, but not the other,
      then in n seconds the delta of the vcpu system_timestamps will be
      c * n. If we assume a correction of 500ppm (worst-case), then the two
      vcpus will diverge 50us in 0.1s, which is a considerable amount.
      Signed-off-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7e44e449
  3. 03 3月, 2014 1 次提交
  4. 28 2月, 2014 3 次提交
  5. 27 2月, 2014 1 次提交
  6. 26 2月, 2014 3 次提交
  7. 24 2月, 2014 1 次提交
  8. 22 2月, 2014 4 次提交
  9. 18 2月, 2014 2 次提交
  10. 14 2月, 2014 3 次提交
  11. 13 2月, 2014 1 次提交
  12. 12 2月, 2014 1 次提交
    • S
      ftrace/x86: Use breakpoints for converting function graph caller · 87fbb2ac
      Steven Rostedt (Red Hat) 提交于
      When the conversion was made to remove stop machine and use the breakpoint
      logic instead, the modification of the function graph caller is still
      done directly as though it was being done under stop machine.
      
      As it is not converted via stop machine anymore, there is a possibility
      that the code could be layed across cache lines and if another CPU is
      accessing that function graph call when it is being updated, it could
      cause a General Protection Fault.
      
      Convert the update of the function graph caller to use the breakpoint
      method as well.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 08d636b6 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      87fbb2ac
  13. 11 2月, 2014 1 次提交
    • M
      xen: properly account for _PAGE_NUMA during xen pte translations · a9c8e4be
      Mel Gorman 提交于
      Steven Noonan forwarded a users report where they had a problem starting
      vsftpd on a Xen paravirtualized guest, with this in dmesg:
      
        BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
        page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
        page flags: 0x2ffc0000000014(referenced|dirty)
        addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
        CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
        Call Trace:
          dump_stack+0x45/0x56
          print_bad_pte+0x22e/0x250
          unmap_single_vma+0x583/0x890
          unmap_vmas+0x65/0x90
          exit_mmap+0xc5/0x170
          mmput+0x65/0x100
          do_exit+0x393/0x9e0
          do_group_exit+0xcc/0x140
          SyS_exit_group+0x14/0x20
          system_call_fastpath+0x1a/0x1f
        Disabling lock debugging due to kernel taint
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1
      
      The issue could not be reproduced under an HVM instance with the same
      kernel, so it appears to be exclusive to paravirtual Xen guests.  He
      bisected the problem to commit 1667918b ("mm: numa: clear numa
      hinting information on mprotect") that was also included in 3.12-stable.
      
      The problem was related to how xen translates ptes because it was not
      accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
      a pteval_present helper for use by xen so both bare metal and xen use
      the same code when checking if a PTE is present.
      
      [mgorman@suse.de: wrote changelog, proposed minor modifications]
      [akpm@linux-foundation.org: fix typo in comment]
      Reported-by: NSteven Noonan <steven@uplinklabs.net>
      Tested-by: NSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: NElena Ufimtseva <ufimtseva@gmail.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9c8e4be
  14. 09 2月, 2014 1 次提交
  15. 07 2月, 2014 3 次提交
  16. 06 2月, 2014 2 次提交
    • M
      x86/efi: Allow mapping BGRT on x86-32 · 081cd62a
      Matt Fleming 提交于
      CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
      map (see commit 70087011 ("x86, efi: Don't map Boot Services on
      i386")), and so efi_lookup_mapped_addr() will fail to return a valid
      address. Executing the ioremap() path in efi_bgrt_init() causes the
      following warning on x86-32 because we're trying to ioremap() RAM,
      
       WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
       Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
        00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
        00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
        00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
       Call Trace:
        [<c09a5196>] dump_stack+0x41/0x52
        [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c0448ce2>] warn_slowpath_null+0x22/0x30
        [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
        [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
        [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
        [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
        [<c043bc2b>] ioremap_nocache+0x1b/0x20
        [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
        [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
        [<c0cafd82>] efi_late_init+0x8/0xa
        [<c0c9bab2>] start_kernel+0x3ae/0x3c3
        [<c0c9b53b>] ? repair_env_string+0x51/0x51
        [<c0c9b378>] i386_start_kernel+0x12e/0x131
      
      Switch to using early_memremap(), which won't trigger this warning, and
      has the added benefit of more accurately conveying what we're trying to
      do - map a chunk of memory.
      
      This patch addresses the following bug report,
      
        https://bugzilla.kernel.org/show_bug.cgi?id=67911Reported-by: NAdam Williamson <awilliam@redhat.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      081cd62a
    • I
      x86: Disable CONFIG_X86_DECODER_SELFTEST in allmod/allyesconfigs · f8f20234
      Ingo Molnar 提交于
      It can take some time to validate the image, make sure
      {allyes|allmod}config doesn't enable it.
      
      I'd say randconfig will cover it often enough, and the failure is also
      borderline build coverage related: you cannot really make the decoder
      test fail via source level changes, only with changes in the build
      environment, so I agree with Andi that we can disable this one too.
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: Paul Gortmaker paul.gortmaker@windriver.com>
      Suggested-and-acked-by: Andi Kleen andi@firstfloor.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8f20234