1. 04 3月, 2016 4 次提交
  2. 03 3月, 2016 4 次提交
  3. 24 2月, 2016 1 次提交
    • P
      KVM: x86: fix missed hardware breakpoints · 172b2386
      Paolo Bonzini 提交于
      Sometimes when setting a breakpoint a process doesn't stop on it.
      This is because the debug registers are not loaded correctly on
      VCPU load.
      
      The following simple reproducer from Oleg Nesterov tries using debug
      registers in two threads.  To see the bug, run a 2-VCPU guest with
      "taskset -c 0" and run "./bp 0 1" inside the guest.
      
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <sys/user.h>
          #include <asm/debugreg.h>
          #include <assert.h>
      
          #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
      
          unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
          {
              unsigned long dr7;
      
              dr7 = ((len | type) & 0xf)
                  << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
              if (enable)
                  dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
              return dr7;
          }
      
          int write_dr(int pid, int dr, unsigned long val)
          {
              return ptrace(PTRACE_POKEUSER, pid,
                      offsetof (struct user, u_debugreg[dr]),
                      val);
          }
      
          void set_bp(pid_t pid, void *addr)
          {
              unsigned long dr7;
              assert(write_dr(pid, 0, (long)addr) == 0);
              dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
              assert(write_dr(pid, 7, dr7) == 0);
          }
      
          void *get_rip(int pid)
          {
              return (void*)ptrace(PTRACE_PEEKUSER, pid,
                      offsetof(struct user, regs.rip), 0);
          }
      
          void test(int nr)
          {
              void *bp_addr = &&label + nr, *bp_hit;
              int pid;
      
              printf("test bp %d\n", nr);
              assert(nr < 16); // see 16 asm nops below
      
              pid = fork();
              if (!pid) {
                  assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                  kill(getpid(), SIGSTOP);
                  for (;;) {
                      label: asm (
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                      );
                  }
              }
      
              assert(pid == wait(NULL));
              set_bp(pid, bp_addr);
      
              for (;;) {
                  assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
                  assert(pid == wait(NULL));
      
                  bp_hit = get_rip(pid);
                  if (bp_hit != bp_addr)
                      fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                          bp_hit - &&label, nr);
              }
          }
      
          int main(int argc, const char *argv[])
          {
              while (--argc) {
                  int nr = atoi(*++argv);
                  if (!fork())
                      test(nr);
              }
      
              while (wait(NULL) > 0)
                  ;
              return 0;
          }
      
      Cc: stable@vger.kernel.org
      Suggested-by: NNadav Amit <namit@cs.technion.ac.il>
      Reported-by: NAndrey Wagin <avagin@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      172b2386
  4. 17 2月, 2016 4 次提交
    • P
      KVM: x86: pass kvm_get_time_scale arguments in hertz · 3ae13faa
      Paolo Bonzini 提交于
      Prepare for improving the precision in the next patch.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3ae13faa
    • P
      KVM: x86: fix missed hardware breakpoints · 4e422bdd
      Paolo Bonzini 提交于
      Sometimes when setting a breakpoint a process doesn't stop on it.
      This is because the debug registers are not loaded correctly on
      VCPU load.
      
      The following simple reproducer from Oleg Nesterov tries using debug
      registers in both the host and the guest, for example by running "./bp
      0 1" on the host and "./bp 14 15" under QEMU.
      
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <sys/user.h>
          #include <asm/debugreg.h>
          #include <assert.h>
      
          #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
      
          unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
          {
              unsigned long dr7;
      
              dr7 = ((len | type) & 0xf)
                  << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
              if (enable)
                  dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
              return dr7;
          }
      
          int write_dr(int pid, int dr, unsigned long val)
          {
              return ptrace(PTRACE_POKEUSER, pid,
                      offsetof (struct user, u_debugreg[dr]),
                      val);
          }
      
          void set_bp(pid_t pid, void *addr)
          {
              unsigned long dr7;
              assert(write_dr(pid, 0, (long)addr) == 0);
              dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
              assert(write_dr(pid, 7, dr7) == 0);
          }
      
          void *get_rip(int pid)
          {
              return (void*)ptrace(PTRACE_PEEKUSER, pid,
                      offsetof(struct user, regs.rip), 0);
          }
      
          void test(int nr)
          {
              void *bp_addr = &&label + nr, *bp_hit;
              int pid;
      
              printf("test bp %d\n", nr);
              assert(nr < 16); // see 16 asm nops below
      
              pid = fork();
              if (!pid) {
                  assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                  kill(getpid(), SIGSTOP);
                  for (;;) {
                      label: asm (
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                      );
                  }
              }
      
              assert(pid == wait(NULL));
              set_bp(pid, bp_addr);
      
              for (;;) {
                  assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
                  assert(pid == wait(NULL));
      
                  bp_hit = get_rip(pid);
                  if (bp_hit != bp_addr)
                      fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                          bp_hit - &&label, nr);
              }
          }
      
          int main(int argc, const char *argv[])
          {
              while (--argc) {
                  int nr = atoi(*++argv);
                  if (!fork())
                      test(nr);
              }
      
              while (wait(NULL) > 0)
                  ;
              return 0;
          }
      
      Cc: stable@vger.kernel.org
      Suggested-by: NNadadv Amit <namit@cs.technion.ac.il>
      Reported-by: NAndrey Wagin <avagin@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4e422bdd
    • P
      KVM: x86: rewrite handling of scaled TSC for kvmclock · 78db6a50
      Paolo Bonzini 提交于
      This is the same as before:
      
          kvm_scale_tsc(tgt_tsc_khz)
              = tgt_tsc_khz * ratio
              = tgt_tsc_khz * user_tsc_khz / tsc_khz   (see set_tsc_khz)
              = user_tsc_khz                           (see kvm_guest_time_update)
              = vcpu->arch.virtual_tsc_khz             (see kvm_set_tsc_khz)
      
      However, computing it through kvm_scale_tsc will make it possible
      to include the NTP correction in tgt_tsc_khz.
      Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      78db6a50
    • P
      KVM: x86: rename argument to kvm_set_tsc_khz · 4941b8cb
      Paolo Bonzini 提交于
      This refers to the desired (scaled) frequency, which is called
      user_tsc_khz in the rest of the file.
      Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4941b8cb
  5. 09 2月, 2016 3 次提交
  6. 16 1月, 2016 1 次提交
    • D
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba049e93
  7. 09 1月, 2016 2 次提交
  8. 07 1月, 2016 2 次提交
  9. 23 12月, 2015 1 次提交
  10. 22 12月, 2015 1 次提交
  11. 17 12月, 2015 1 次提交
    • A
      kvm/x86: Hyper-V SynIC timers · 1f4b34f8
      Andrey Smetanin 提交于
      Per Hyper-V specification (and as required by Hyper-V-aware guests),
      SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
      of MSRs, and signals expiration by delivering a special format message
      to the configured SynIC message slot and triggering the corresponding
      synthetic interrupt.
      
      Note: as implemented by this patch, all periodic timers are "lazy"
      (i.e. if the vCPU wasn't scheduled for more than the timer period the
      timer events are lost), regardless of the corresponding configuration
      MSR.  If deemed necessary, the "catch up" mode (the timer period is
      shortened until the timer catches up) will be implemented later.
      
      Changes v2:
      * Use remainder to calculate periodic timer expiration time
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Vitaly Kuznetsov <vkuznets@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1f4b34f8
  12. 11 12月, 2015 2 次提交
  13. 26 11月, 2015 5 次提交
    • P
      KVM: x86: expose MSR_TSC_AUX to userspace · 9dbe6cf9
      Paolo Bonzini 提交于
      If we do not do this, it is not properly saved and restored across
      migration.  Windows notices due to its self-protection mechanisms,
      and is very upset about it (blue screen of death).
      
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9dbe6cf9
    • A
      kvm/x86: Hyper-V kvm exit · db397571
      Andrey Smetanin 提交于
      A new vcpu exit is introduced to notify the userspace of the
      changes in Hyper-V SynIC configuration triggered by guest writing to the
      corresponding MSRs.
      
      Changes v4:
      * exit into userspace only if guest writes into SynIC MSR's
      
      Changes v3:
      * added KVM_EXIT_HYPERV types and structs notes into docs
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      db397571
    • A
      kvm/x86: Hyper-V synthetic interrupt controller · 5c919412
      Andrey Smetanin 提交于
      SynIC (synthetic interrupt controller) is a lapic extension,
      which is controlled via MSRs and maintains for each vCPU
       - 16 synthetic interrupt "lines" (SINT's); each can be configured to
         trigger a specific interrupt vector optionally with auto-EOI
         semantics
       - a message page in the guest memory with 16 256-byte per-SINT message
         slots
       - an event flag page in the guest memory with 16 2048-bit per-SINT
         event flag areas
      
      The host triggers a SINT whenever it delivers a new message to the
      corresponding slot or flips an event flag bit in the corresponding area.
      The guest informs the host that it can try delivering a message by
      explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
      MSR.
      
      The userspace (qemu) triggers interrupts and receives EOM notifications
      via irqfd with resampler; for that, a GSI is allocated for each
      configured SINT, and irq_routing api is extended to support GSI-SINT
      mapping.
      
      Changes v4:
      * added activation of SynIC by vcpu KVM_ENABLE_CAP
      * added per SynIC active flag
      * added deactivation of APICv upon SynIC activation
      
      Changes v3:
      * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
      docs
      
      Changes v2:
      * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
      * add Hyper-V SynIC vectors into EOI exit bitmap
      * Hyper-V SyniIC SINT msr write logic simplified
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5c919412
    • A
      kvm/x86: per-vcpu apicv deactivation support · d62caabb
      Andrey Smetanin 提交于
      The decision on whether to use hardware APIC virtualization used to be
      taken globally, based on the availability of the feature in the CPU
      and the value of a module parameter.
      
      However, under certain circumstances we want to control it on per-vcpu
      basis.  In particular, when the userspace activates HyperV synthetic
      interrupt controller (SynIC), APICv has to be disabled as it's
      incompatible with SynIC auto-EOI behavior.
      
      To achieve that, introduce 'apicv_active' flag on struct
      kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
      off.  The flag is initialized based on the module parameter and CPU
      capability, and consulted whenever an APICv-specific action is
      performed.
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d62caabb
    • A
      kvm/x86: split ioapic-handled and EOI exit bitmaps · 6308630b
      Andrey Smetanin 提交于
      The function to determine if the vector is handled by ioapic used to
      rely on the fact that only ioapic-handled vectors were set up to
      cause vmexits when virtual apic was in use.
      
      We're going to break this assumption when introducing Hyper-V
      synthetic interrupts: they may need to cause vmexits too.
      
      To achieve that, introduce a new bitmap dedicated specifically for
      ioapic-handled vectors, and populate EOI exit bitmap from it for now.
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6308630b
  14. 18 11月, 2015 4 次提交
  15. 10 11月, 2015 5 次提交