提交 · 70e4da7a8ff62f2775337b705f45c804bb450454 · openanolis / cloud-kernel

26 2月, 2016 1 次提交

KVM: x86: fix root cause for missed hardware breakpoints · 70e4da7a

由 Paolo Bonzini 提交于 2月 26, 2016

Commit 172b2386 ("KVM: x86: fix missed hardware breakpoints",
2016-02-10) worked around a case where the debug registers are not loaded
correctly on preemption and on the first entry to KVM_RUN.

However, Xiao Guangrong pointed out that the root cause must be that
KVM_DEBUGREG_BP_ENABLED is not being set correctly.  This can indeed
happen due to the lazy debug exit mechanism, which does not call
kvm_update_dr7.  Fix it by replacing the existing loop (more or less
equivalent to kvm_update_dr0123) with calls to all the kvm_update_dr*
functions.

Cc: stable@vger.kernel.org   # 4.1+
Fixes: 172b2386Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

70e4da7a

24 2月, 2016 1 次提交

KVM: x86: fix missed hardware breakpoints · 172b2386

由 Paolo Bonzini 提交于 2月 10, 2016

Sometimes when setting a breakpoint a process doesn't stop on it.
This is because the debug registers are not loaded correctly on
VCPU load.

The following simple reproducer from Oleg Nesterov tries using debug
registers in two threads.  To see the bug, run a 2-VCPU guest with
"taskset -c 0" and run "./bp 0 1" inside the guest.

    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <asm/debugreg.h>
    #include <assert.h>

    #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

    unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
    {
        unsigned long dr7;

        dr7 = ((len | type) & 0xf)
            << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
        if (enable)
            dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

        return dr7;
    }

    int write_dr(int pid, int dr, unsigned long val)
    {
        return ptrace(PTRACE_POKEUSER, pid,
                offsetof (struct user, u_debugreg[dr]),
                val);
    }

    void set_bp(pid_t pid, void *addr)
    {
        unsigned long dr7;
        assert(write_dr(pid, 0, (long)addr) == 0);
        dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
        assert(write_dr(pid, 7, dr7) == 0);
    }

    void *get_rip(int pid)
    {
        return (void*)ptrace(PTRACE_PEEKUSER, pid,
                offsetof(struct user, regs.rip), 0);
    }

    void test(int nr)
    {
        void *bp_addr = &&label + nr, *bp_hit;
        int pid;

        printf("test bp %d\n", nr);
        assert(nr < 16); // see 16 asm nops below

        pid = fork();
        if (!pid) {
            assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
            kill(getpid(), SIGSTOP);
            for (;;) {
                label: asm (
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                );
            }
        }

        assert(pid == wait(NULL));
        set_bp(pid, bp_addr);

        for (;;) {
            assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
            assert(pid == wait(NULL));

            bp_hit = get_rip(pid);
            if (bp_hit != bp_addr)
                fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                    bp_hit - &&label, nr);
        }
    }

    int main(int argc, const char *argv[])
    {
        while (--argc) {
            int nr = atoi(*++argv);
            if (!fork())
                test(nr);
        }

        while (wait(NULL) > 0)
            ;
        return 0;
    }

Cc: stable@vger.kernel.org
Suggested-by: NNadav Amit <namit@cs.technion.ac.il>
Reported-by: NAndrey Wagin <avagin@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

172b2386

16 1月, 2016 1 次提交

kvm: rename pfn_t to kvm_pfn_t · ba049e93

由 Dan Williams 提交于 1月 15, 2016

To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver.  The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.

Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array.  Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory.  The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.

This patch (of 18):

The core has developed a need for a "pfn_t" type [1].  Move the existing
pfn_t in KVM to kvm_pfn_t [2].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba049e93

09 1月, 2016 2 次提交

kvm/x86: Update SynIC timers on guest entry only · f3b138c5

由 Andrey Smetanin 提交于 12月 28, 2015

Consolidate updating the Hyper-V SynIC timers in a
single place: on guest entry in processing KVM_REQ_HV_STIMER
request.  This simplifies the overall logic, and makes sure
the most current state of msrs and guest clock is used for
arming the timers (to achieve that, KVM_REQ_HV_STIMER
has to be processed after KVM_REQ_CLOCK_UPDATE).
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f3b138c5

KVM: move architecture-dependent requests to arch/ · 2860c4b1

由 Paolo Bonzini 提交于 1月 07, 2016

Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2860c4b1

07 1月, 2016 2 次提交

kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock · 1dab1345

由 Nicholas Krause 提交于 12月 30, 2015

This makes sure the wall clock is updated only after an odd version value
is successfully written to guest memory.
Signed-off-by: NNicholas Krause <xerofoify@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1dab1345

kvm: x86: only channel 0 of the i8254 is linked to the HPET · e5e57e7a

由 Paolo Bonzini 提交于 1月 07, 2016

While setting the KVM PIT counters in 'kvm_pit_load_count', if
'hpet_legacy_start' is set, the function disables the timer on
channel[0], instead of the respective index 'channel'. This is
because channels 1-3 are not linked to the HPET.  Fix the caller
to only activate the special HPET processing for channel 0.
Reported-by: NP J P <pjp@fedoraproject.org>
Fixes: 0185604cSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e5e57e7a

23 12月, 2015 1 次提交

Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks" · b92c453d

由 Thomas Gleixner 提交于 12月 23, 2015

This reverts commit 677a73a9. This patch was not meant to be merged and
has issues. Revert it.
Requested-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

b92c453d

22 12月, 2015 1 次提交

KVM: x86: Reload pit counters for all channels when restoring state · 0185604c

由 Andrew Honig 提交于 11月 18, 2015

Currently if userspace restores the pit counters with a count of 0
on channels 1 or 2 and the guest attempts to read the count on those
channels, then KVM will perform a mod of 0 and crash.  This will ensure
that 0 values are converted to 65536 as per the spec.

This is CVE-2015-7513.
Signed-off-by: NAndy Honig <ahonig@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0185604c

17 12月, 2015 1 次提交

kvm/x86: Hyper-V SynIC timers · 1f4b34f8

由 Andrey Smetanin 提交于 11月 30, 2015

Per Hyper-V specification (and as required by Hyper-V-aware guests),
SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
of MSRs, and signals expiration by delivering a special format message
to the configured SynIC message slot and triggering the corresponding
synthetic interrupt.

Note: as implemented by this patch, all periodic timers are "lazy"
(i.e. if the vCPU wasn't scheduled for more than the timer period the
timer events are lost), regardless of the corresponding configuration
MSR.  If deemed necessary, the "catch up" mode (the timer period is
shortened until the timer catches up) will be implemented later.

Changes v2:
* Use remainder to calculate periodic timer expiration time
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1f4b34f8

11 12月, 2015 2 次提交

kvm: x86: move tracepoints outside extended quiescent state · 8b89fe1f

由 Paolo Bonzini 提交于 12月 10, 2015

Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
lockdep splat.
Reported-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8b89fe1f

x86/kvm: On KVM re-enable (e.g. after suspend), update clocks · 677a73a9

由 Andy Lutomirski 提交于 12月 10, 2015

This gets rid of the "did TSC go backwards" logic and just
updates all clocks.  It should work better (no more disabling of
fast timing) and more reliably (all of the clocks are actually
updated).
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

677a73a9

26 11月, 2015 5 次提交

KVM: x86: expose MSR_TSC_AUX to userspace · 9dbe6cf9

由 Paolo Bonzini 提交于 11月 12, 2015

If we do not do this, it is not properly saved and restored across
migration.  Windows notices due to its self-protection mechanisms,
and is very upset about it (blue screen of death).

Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9dbe6cf9

kvm/x86: Hyper-V kvm exit · db397571

由 Andrey Smetanin 提交于 11月 10, 2015

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db397571

kvm/x86: Hyper-V synthetic interrupt controller · 5c919412

由 Andrey Smetanin 提交于 11月 10, 2015

SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5c919412

kvm/x86: per-vcpu apicv deactivation support · d62caabb

由 Andrey Smetanin 提交于 11月 10, 2015

The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on per-vcpu
basis.  In particular, when the userspace activates HyperV synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d62caabb

kvm/x86: split ioapic-handled and EOI exit bitmaps · 6308630b

由 Andrey Smetanin 提交于 11月 10, 2015

The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6308630b

18 11月, 2015 4 次提交

KVM: x86: request interrupt window when IRQ chip is split · 62a193ed

由 Matt Gingell 提交于 11月 16, 2015

Before this patch, we incorrectly enter the guest without requesting an
interrupt window if the IRQ chip is split between user space and the
kernel.

Because lapic_in_kernel no longer implies the PIC is in the kernel, this
patch tests pic_in_kernel to determining whether an interrupt window
should be requested when entering the guest.

If the APIC is in the kernel and we request an interrupt window the
guest will return immediately. If the APIC is masked the guest will not
not make forward progress and unmask it, leading to a loop when KVM
reenters and requests again. This patch adds a check to ensure the APIC
is ready to accept an interrupt before requesting a window.
Reviewed-by: NSteve Rutherford <srutherford@google.com>
Signed-off-by: NMatt Gingell <gingell@google.com>
[Use the other newly introduced functions. - Paolo]
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

62a193ed

KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space · 934bf653

由 Matt Gingell 提交于 11月 16, 2015

Set KVM_REQ_EVENT when a PIC in user space injects a local interrupt.

Currently a request is only made when neither the PIC nor the APIC is in
the kernel, which is not sufficient in the split IRQ chip case.

This addresses a problem in QEMU where interrupts are delayed until
another path invokes the event loop.
Reviewed-by: NSteve Rutherford <srutherford@google.com>
Signed-off-by: NMatt Gingell <gingell@google.com>
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

934bf653

KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection · 782d422b

由 Matt Gingell 提交于 11月 16, 2015

This patch breaks out a new function kvm_vcpu_ready_for_interrupt_injection.
This routine encapsulates the logic required to determine whether a vcpu
is ready to accept an interrupt injection, which is now required on
multiple paths.
Reviewed-by: NSteve Rutherford <srutherford@google.com>
Signed-off-by: NMatt Gingell <gingell@google.com>
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

782d422b

KVM: x86: fix interrupt window handling in split IRQ chip case · 127a457a

由 Matt Gingell 提交于 11月 17, 2015

This patch ensures that dm_request_for_irq_injection and
post_kvm_run_save are in sync, avoiding that an endless ping-pong
between userspace (who correctly notices that IF=0) and
the kernel (who insists that userspace handles its request
for the interrupt window).

To synchronize them, it also adds checks for kvm_arch_interrupt_allowed
and !kvm_event_needs_reinjection.  These are always needed, not
just for in-kernel LAPIC.
Signed-off-by: NMatt Gingell <gingell@google.com>
[A collage of two patches from Matt. - Paolo]
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

127a457a

10 11月, 2015 10 次提交

KVM: x86: rename update_db_bp_intercept to update_bp_intercept · a96036b8

由 Paolo Bonzini 提交于 11月 10, 2015

Because #DB is now intercepted unconditionally, this callback
only operates on #BP for both VMX and SVM.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a96036b8

KVM: x86: Use the correct vcpu's TSC rate to compute time scale · 27cca94e

由 Haozhong Zhang 提交于 10月 20, 2015

This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27cca94e

KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc() · 4ba76538

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM scales the host TSC in the same way in call-back
read_l1_tsc(), so this patch moves the scaling logic from call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4ba76538

KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() · 58ea6767

由 Haozhong Zhang 提交于 10月 20, 2015

For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

58ea6767

KVM: x86: Replace call-back compute_tsc_offset() with a common function · 07c1419a

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

07c1419a

KVM: x86: Replace call-back set_tsc_khz() with a common function · 381d585c

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

381d585c

KVM: x86: Add a common TSC scaling function · 35181e86

由 Haozhong Zhang 提交于 10月 20, 2015

VMX and SVM calculate the TSC scaling ratio in a similar logic, so this
patch generalizes it to a common TSC scaling function.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr.  Remove
 BUG_ON.  - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

35181e86

KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch · ad721883

由 Haozhong Zhang 提交于 10月 20, 2015

This patch moves the field of TSC scaling ratio from the architecture
struct vcpu_svm to the common struct kvm_vcpu_arch.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ad721883

KVM: x86: Collect information for setting TSC scaling ratio · bc9b961b

由 Haozhong Zhang 提交于 10月 20, 2015

The number of bits of the fractional part of the 64-bit TSC scaling
ratio in VMX and SVM is different. This patch makes the architecture
code to collect the number of fractional bits and other related
information into variables that can be accessed in the common code.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bc9b961b

KVM: x86: declare a few variables as __read_mostly · 893590c7

由 Paolo Bonzini 提交于 11月 06, 2015

These include module parameters and variables that are set by
kvm_x86_ops->hardware_setup.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

893590c7

04 11月, 2015 3 次提交

KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0() · 879ae188

由 Laszlo Ersek 提交于 11月 04, 2015

Commit b18d5431 ("KVM: x86: fix CR0.CD virtualization") was
technically correct, but it broke OVMF guests by slowing down various
parts of the firmware.

Commit fb279950 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") quirked the
first function modified by b18d5431, vmx_get_mt_mask(), for OVMF's
sake. This restored the speed of the OVMF code that runs before
PlatformPei (including the memory intensive LZMA decompression in SEC).

This patch extends the quirk to the second function modified by
b18d5431, kvm_set_cr0(). It eliminates the intrusive slowdown that
hits the EFI_MP_SERVICES_PROTOCOL implementation of edk2's
UefiCpuPkg/CpuDxe -- which is built into OVMF --, when CpuDxe starts up
all APs at once for initialization, in order to count them.

We also carry over the kvm_arch_has_noncoherent_dma() sub-condition from
the other half of the original commit b18d5431.

Fixes: b18d5431
Cc: stable@vger.kernel.org
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: NJanusz Mocek <januszmk6@gmail.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>#
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

879ae188

KVM: x86: add read_phys to x86_emulate_ops · 7a036a6f

由 Radim Krčmář 提交于 10月 30, 2015

We want to read the physical memory when emulating RSM.

X86EMUL_IO_NEEDED is returned on all errors for consistency with other
helpers.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Tested-by: NLaszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a036a6f

KVM: x86: removing unused variable · 2da29bcc

由 Saurabh Sengar 提交于 10月 30, 2015

removing unused variables, found by coccinelle
Signed-off-by: NSaurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2da29bcc

16 10月, 2015 1 次提交

KVM: x86: move steal time initialization to vcpu entry time · 7cae2bed

由 Marcelo Tosatti 提交于 10月 14, 2015

As reported at https://bugs.launchpad.net/qemu/+bug/1494350,
it is possible to have vcpu->arch.st.last_steal initialized
from a thread other than vcpu thread, say the iothread, via
KVM_SET_MSRS.

Which can cause an overflow later (when subtracting from vcpu threads
sched_info.run_delay).

To avoid that, move steal time accumulation to vcpu entry time,
before copying steal time data to guest.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cae2bed

14 10月, 2015 5 次提交

KVM: x86: fix edge EOI and IOAPIC reconfig race · db2bdcbb

由 Radim Krčmář 提交于 10月 08, 2015

KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration;  thus EOI from the pending interrupt is not
recognized.

(Reconfiguration is not a problem for level interrupts, because IOAPIC
 sends interrupt with the new configuration.)

For an edge interrupt with ACK notifiers, like i8254 timer; things can
happen in this order
 1) IOAPIC inject a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
    on original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
    not in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.

This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated.  The simple solution also doesn't work
if we are reconfiguring the vector.  (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

The are no races because ioapic injection and reconfig are locked.

Fixes: b053b2ae ("KVM: x86: Add EOI exit bitmap inference")
[Before b053b2ae, this bug happened only with APICv.]
Fixes: c7c9c56c ("x86, apicv: add virtual interrupt delivery support")
Cc: <stable@vger.kernel.org>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db2bdcbb

KVM: x86: fix previous commit for 32-bit · 25188b99

由 Paolo Bonzini 提交于 10月 14, 2015

Unfortunately I only noticed this after pushing.

Fixes: f0d648bd
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

25188b99

KVM: x86: fix SMI to halted VCPU · 73917739

由 Paolo Bonzini 提交于 10月 13, 2015

An SMI to a halted VCPU must wake it up, hence a VCPU with a pending
SMI must be considered runnable.

Fixes: 64d60670
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73917739

KVM: x86: clean up kvm_arch_vcpu_runnable · 5d9bc648

由 Paolo Bonzini 提交于 10月 13, 2015

Split the huge conditional in two functions.

Fixes: 64d60670
Cc: stable@vger.kernel.org
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5d9bc648

KVM: x86: map/unmap private slots in __x86_set_memory_region · f0d648bd

由 Paolo Bonzini 提交于 10月 12, 2015

Otherwise, two copies (one of them never populated and thus bogus)
are allocated for the regular and SMM address spaces.  This breaks
SMM with EPT but without unrestricted guest support, because the
SMM copy of the identity page map is all zeros.

By moving the allocation to the caller we also remove the last
vestiges of kernel-allocated memory regions (not accessible anymore
in userspace since commit b74a07be, "KVM: Remove kernel-allocated
memory regions", 2010-06-21); that is a nice bonus.
Reported-by: NAlexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f0d648bd

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功