1. 26 11月, 2015 4 次提交
    • A
      kvm/x86: Hyper-V synthetic interrupt controller · 5c919412
      Andrey Smetanin 提交于
      SynIC (synthetic interrupt controller) is a lapic extension,
      which is controlled via MSRs and maintains for each vCPU
       - 16 synthetic interrupt "lines" (SINT's); each can be configured to
         trigger a specific interrupt vector optionally with auto-EOI
         semantics
       - a message page in the guest memory with 16 256-byte per-SINT message
         slots
       - an event flag page in the guest memory with 16 2048-bit per-SINT
         event flag areas
      
      The host triggers a SINT whenever it delivers a new message to the
      corresponding slot or flips an event flag bit in the corresponding area.
      The guest informs the host that it can try delivering a message by
      explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
      MSR.
      
      The userspace (qemu) triggers interrupts and receives EOM notifications
      via irqfd with resampler; for that, a GSI is allocated for each
      configured SINT, and irq_routing api is extended to support GSI-SINT
      mapping.
      
      Changes v4:
      * added activation of SynIC by vcpu KVM_ENABLE_CAP
      * added per SynIC active flag
      * added deactivation of APICv upon SynIC activation
      
      Changes v3:
      * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
      docs
      
      Changes v2:
      * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
      * add Hyper-V SynIC vectors into EOI exit bitmap
      * Hyper-V SyniIC SINT msr write logic simplified
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5c919412
    • A
      kvm/x86: per-vcpu apicv deactivation support · d62caabb
      Andrey Smetanin 提交于
      The decision on whether to use hardware APIC virtualization used to be
      taken globally, based on the availability of the feature in the CPU
      and the value of a module parameter.
      
      However, under certain circumstances we want to control it on per-vcpu
      basis.  In particular, when the userspace activates HyperV synthetic
      interrupt controller (SynIC), APICv has to be disabled as it's
      incompatible with SynIC auto-EOI behavior.
      
      To achieve that, introduce 'apicv_active' flag on struct
      kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
      off.  The flag is initialized based on the module parameter and CPU
      capability, and consulted whenever an APICv-specific action is
      performed.
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d62caabb
    • A
      kvm/x86: split ioapic-handled and EOI exit bitmaps · 6308630b
      Andrey Smetanin 提交于
      The function to determine if the vector is handled by ioapic used to
      rely on the fact that only ioapic-handled vectors were set up to
      cause vmexits when virtual apic was in use.
      
      We're going to break this assumption when introducing Hyper-V
      synthetic interrupts: they may need to cause vmexits too.
      
      To achieve that, introduce a new bitmap dedicated specifically for
      ioapic-handled vectors, and populate EOI exit bitmap from it for now.
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6308630b
    • A
      kvm/irqchip: kvm_arch_irq_routing_update renaming split · abdb080f
      Andrey Smetanin 提交于
      Actually kvm_arch_irq_routing_update() should be
      kvm_arch_post_irq_routing_update() as it's called at the end
      of irq routing update.
      
      This renaming frees kvm_arch_irq_routing_update function name.
      kvm_arch_irq_routing_update() weak function which will be used
      to update mappings for arch-specific irq routing entries
      (in particular, the upcoming Hyper-V synthetic interrupts).
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      abdb080f
  2. 25 11月, 2015 7 次提交
    • H
      KVM: nVMX: remove incorrect vpid check in nested invvpid emulation · b2467e74
      Haozhong Zhang 提交于
      This patch removes the vpid check when emulating nested invvpid
      instruction of type all-contexts invalidation. The existing code is
      incorrect because:
       (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
           Translations Based on VPID", invvpid instruction does not check
           vpid in the invvpid descriptor when its type is all-contexts
           invalidation.
       (2) According to the same document, invvpid of type all-contexts
           invalidation does not require there is an active VMCS, so/and
           get_vmcs12() in the existing code may result in a NULL-pointer
           dereference. In practice, it can crash both KVM itself and L1
           hypervisors that use invvpid (e.g. Xen).
      Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b2467e74
    • M
      arm64: kvm: report original PAR_EL1 upon panic · fbb4574c
      Mark Rutland 提交于
      If we call __kvm_hyp_panic while a guest context is active, we call
      __restore_sysregs before acquiring the system register values for the
      panic, in the process throwing away the PAR_EL1 value at the point of
      the panic.
      
      This patch modifies __kvm_hyp_panic to stash the PAR_EL1 value prior to
      restoring host register values, enabling us to report the original
      values at the point of the panic.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      fbb4574c
    • M
      arm64: kvm: avoid %p in __kvm_hyp_panic · 1d7a4e31
      Mark Rutland 提交于
      Currently __kvm_hyp_panic uses %p for values which are not pointers,
      such as the ESR value. This can confusingly lead to "(null)" being
      printed for the value.
      
      Use %x instead, and only use %p for host pointers.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      1d7a4e31
    • C
      KVM: arm/arm64: Fix preemptible timer active state crazyness · 7e16aa81
      Christoffer Dall 提交于
      We were setting the physical active state on the GIC distributor in a
      preemptible section, which could cause us to set the active state on
      different physical CPU from the one we were actually going to run on,
      hacoc ensues.
      
      Since we are no longer descheduling/scheduling soft timers in the
      flush/sync timer functions, simply moving the timer flush into a
      non-preemptible section.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      7e16aa81
    • M
      arm64: KVM: Add workaround for Cortex-A57 erratum 834220 · 498cd5c3
      Marc Zyngier 提交于
      Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults
      when a Stage 1 permission fault or device alignment fault should
      have been reported.
      
      This patch implements the workaround (which is to validate that the
      Stage-1 translation actually succeeds) by using code patching.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      498cd5c3
    • M
      arm64: KVM: Fix AArch32 to AArch64 register mapping · c0f09634
      Marc Zyngier 提交于
      When running a 32bit guest under a 64bit hypervisor, the ARMv8
      architecture defines a mapping of the 32bit registers in the 64bit
      space. This includes banked registers that are being demultiplexed
      over the 64bit ones.
      
      On exceptions caused by an operation involving a 32bit register, the
      HW exposes the register number in the ESR_EL2 register. It was so
      far understood that SW had to distinguish between AArch32 and AArch64
      accesses (based on the current AArch32 mode and register number).
      
      It turns out that I misinterpreted the ARM ARM, and the clue is in
      D1.20.1: "For some exceptions, the exception syndrome given in the
      ESR_ELx identifies one or more register numbers from the issued
      instruction that generated the exception. Where the exception is
      taken from an Exception level using AArch32 these register numbers
      give the AArch64 view of the register."
      
      Which means that the HW is already giving us the translated version,
      and that we shouldn't try to interpret it at all (for example, doing
      an MMIO operation from the IRQ mode using the LR register leads to
      very unexpected behaviours).
      
      The fix is thus not to perform a call to vcpu_reg32() at all from
      vcpu_reg(), and use whatever register number is supplied directly.
      The only case we need to find out about the mapping is when we
      actively generate a register access, which only occurs when injecting
      a fault in a guest.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      c0f09634
    • A
      ARM/arm64: KVM: test properly for a PTE's uncachedness · e6fab544
      Ard Biesheuvel 提交于
      The open coded tests for checking whether a PTE maps a page as
      uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
      which is not guaranteed to work since the type of a mapping is
      not a set of mutually exclusive bits
      
      For HYP mappings, the type is an index into the MAIR table (i.e, the
      index itself does not contain any information whatsoever about the
      type of the mapping), and for stage-2 mappings it is a bit field where
      normal memory and device types are defined as follows:
      
          #define MT_S2_NORMAL            0xf
          #define MT_S2_DEVICE_nGnRE      0x1
      
      I.e., masking *and* comparing with the latter matches on the former,
      and we have been getting lucky merely because the S2 device mappings
      also have the PTE_UXN bit set, or we would misidentify memory mappings
      as device mappings.
      
      Since the unmap_range() code path (which contains one instance of the
      flawed test) is used both for HYP mappings and stage-2 mappings, and
      considering the difference between the two, it is non-trivial to fix
      this by rewriting the tests in place, as it would involve passing
      down the type of mapping through all the functions.
      
      However, since HYP mappings and stage-2 mappings both deal with host
      physical addresses, we can simply check whether the mapping is backed
      by memory that is managed by the host kernel, and only perform the
      D-cache maintenance if this is the case.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Tested-by: NPavel Fedin <p.fedin@samsung.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      e6fab544
  3. 22 11月, 2015 7 次提交
    • H
      parisc: Map kernel text and data on huge pages · 41b85a11
      Helge Deller 提交于
      Adjust the linker script and map_pages() to map kernel text and data on
      physical 1MB huge/large pages.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      41b85a11
    • H
      parisc: Add Huge Page and HUGETLBFS support · 736d2169
      Helge Deller 提交于
      This patch adds huge page support to allow userspace to allocate huge
      pages and to use hugetlbfs filesystem on 32- and 64-bit Linux kernels.
      A later patch will add kernel support to map kernel text and data on
      huge pages.
      
      The only requirement is, that the kernel needs to be compiled for a
      PA8X00 CPU (PA2.0 architecture). Older PA1.X CPUs do not support
      variable page sizes. 64bit Kernels are compiled for PA2.0 by default.
      
      Technically on parisc multiple physical huge pages may be needed to
      emulate standard 2MB huge pages.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      736d2169
    • H
      parisc: Use long branch to do_syscall_trace_exit · 337685e5
      Helge Deller 提交于
      Use the 22bit instead of the 17bit branch instruction on a 64bit kernel
      to reach the do_syscall_trace_exit function from the gateway page.
      A huge page enabled kernel may need the additional branch distance bits.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      337685e5
    • H
      parisc: Increase initial kernel mapping to 32MB on 64bit kernel · 332b42e4
      Helge Deller 提交于
      For the 64bit kernel the initially 16 MB kernel memory might become too
      small if you build a kernel with many modules built-in and with kernel
      text and data areas mapped on huge pages.
      
      This patch increases the initial mapping to 32MB for 64bit kernels and
      keeps 16MB for 32bit kernels.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      332b42e4
    • H
      parisc: Initialize the fault vector earlier in the boot process. · 4182d0cd
      Helge Deller 提交于
      A fault vector on parisc needs to be 2K aligned.  Furthermore the
      checksum of the fault vector needs to sum up to 0 which is being
      calculated and written at runtime.
      
      Up to now we aligned both PA20 and PA11 fault vectors on the same 4K
      page in order to easily write the checksum after having mapped the
      kernel read-only (by mapping this page only as read-write).
      But when we want to map the kernel text and data on huge pages this
      makes things harder.
      So, simplify it by aligning both fault vectors on 2K boundries and write
      the checksum before we map the page read-only.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      4182d0cd
    • H
      parisc: Add defines for Huge page support · 1f25ad26
      Helge Deller 提交于
      Huge pages on parisc will have the same size as one pmd table, which
      is on a 64bit kernel 2MB on a kernel with 4K kernel page sizes, and
      on a 32bit kernel 4MB when used with 4K kernel pages.
      
      Since parisc does not physically supports 2MB huge page sizes, emulate
      it with two consecutive 1MB page sizes instead. Keeping the same huge
      page size as one pmd will allow us to add transparent huge page support
      later on.
      
      Bit 21 in the pte flags was unused and will now be used to mark a page
      as huge page (_PAGE_HPAGE_BIT).
      Signed-off-by: NHelge Deller <deller@gmx.de>
      1f25ad26
    • H
      parisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.h · dcbf0d29
      Helge Deller 提交于
      Drop the MADV_xxK_PAGES flags, which were never used and were from a proposed
      API which was never integrated into the generic Linux kernel code.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NHelge Deller <deller@gmx.de>
      dcbf0d29
  4. 20 11月, 2015 6 次提交
  5. 19 11月, 2015 7 次提交
  6. 18 11月, 2015 9 次提交