提交 · a87036add09283e6c4f4103a15c596c67b86ab86 · openanolis / cloud-kernel

04 3月, 2016 4 次提交

KVM: i8254: don't assume layout of kvm_kpit_state · 34f3941c

由 Radim Krčmář 提交于 3月 02, 2016

channels has offset 0 and correct size now, but that can change.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

34f3941c

KVM: i8254: remove notifiers from PIT discard policy · 71474e2f

由 Radim Krčmář 提交于 3月 02, 2016

Discard policy doesn't rely on information from notifiers, so we don't
need to register notifiers unconditionally.  We kept correct counts in
case userspace switched between policies during runtime, but that can be
avoided by reseting the state.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

71474e2f

KVM: i8254: remove unnecessary uses of PIT state lock · b39c90b6

由 Radim Krčmář 提交于 3月 02, 2016

- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very
  early, but initialization doesn't use kvm->arch.vpit since the last
  patch, so we can drop locking.
- kvm_free_pit is only run after there are no users of KVM and therefore
  is the sole actor.
- Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject
  is only protected at that place.
- kvm_pit_reset isn't used anywhere and its locking can be dropped if we
  hide it.

Removing useless locking allows to see what actually is being protected
by PIT state lock (values accessible from the guest).
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b39c90b6

KVM: i8254: pass struct kvm_pit instead of kvm in PIT · 09edea72

由 Radim Krčmář 提交于 3月 02, 2016

This patch passes struct kvm_pit into internal PIT functions.
Those functions used to get PIT through kvm->arch.vpit, even though most
of them never used *kvm for other purposes.  Another benefit is that we
don't need to set kvm->arch.vpit during initialization.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09edea72

03 3月, 2016 4 次提交

KVM: MMU: apply page track notifier · 13d268ca

由 Xiao Guangrong 提交于 2月 24, 2016

Register the notifier to receive write track event so that we can update
our shadow page table

It makes kvm_mmu_pte_write() be the callback of the notifier, no function
is changed
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13d268ca

KVM: page track: add notifier support · 0eb05bf2

由 Xiao Guangrong 提交于 2月 24, 2016

Notifier list is introduced so that any node wants to receive the track
event can register to the list

Two APIs are introduced here:
- kvm_page_track_register_notifier(): register the notifier to receive
  track event

- kvm_page_track_unregister_notifier(): stop receiving track event by
  unregister the notifier

The callback, node->track_write() is called when a write access on the
write tracked page happens
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0eb05bf2

KVM: page track: add the framework of guest page tracking · 21ebbeda

由 Xiao Guangrong 提交于 2月 24, 2016

The array, gfn_track[mode][gfn], is introduced in memory slot for every
guest page, this is the tracking count for the gust page on different
modes. If the page is tracked then the count is increased, the page is
not tracked after the count reaches zero

We use 'unsigned short' as the tracking count which should be enough as
shadow page table only can use 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2
for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for
smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, there
is enough room for other trackers

Two callbacks, kvm_page_track_create_memslot() and
kvm_page_track_free_memslot() are implemented in this patch, they are
internally used to initialize and reclaim the memory of the array

Currently, only write track mode is supported
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

21ebbeda

KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed · 92f94f1e

由 Xiao Guangrong 提交于 2月 24, 2016

kvm_lpage_info->write_count is used to detect if the large page mapping
for the gfn on the specified level is allowed, rename it to disallow_lpage
to reflect its purpose, also we rename has_wrprotected_page() to
mmu_gfn_lpage_is_disallowed() to make the code more clearer

Later we will extend this mechanism for page tracking: if the gfn is
tracked then large mapping for that gfn on any level is not allowed.
The new name is more straightforward
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

92f94f1e

24 2月, 2016 1 次提交

KVM: x86: fix missed hardware breakpoints · 172b2386

由 Paolo Bonzini 提交于 2月 10, 2016

Sometimes when setting a breakpoint a process doesn't stop on it.
This is because the debug registers are not loaded correctly on
VCPU load.

The following simple reproducer from Oleg Nesterov tries using debug
registers in two threads.  To see the bug, run a 2-VCPU guest with
"taskset -c 0" and run "./bp 0 1" inside the guest.

    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <asm/debugreg.h>
    #include <assert.h>

    #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

    unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
    {
        unsigned long dr7;

        dr7 = ((len | type) & 0xf)
            << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
        if (enable)
            dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

        return dr7;
    }

    int write_dr(int pid, int dr, unsigned long val)
    {
        return ptrace(PTRACE_POKEUSER, pid,
                offsetof (struct user, u_debugreg[dr]),
                val);
    }

    void set_bp(pid_t pid, void *addr)
    {
        unsigned long dr7;
        assert(write_dr(pid, 0, (long)addr) == 0);
        dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
        assert(write_dr(pid, 7, dr7) == 0);
    }

    void *get_rip(int pid)
    {
        return (void*)ptrace(PTRACE_PEEKUSER, pid,
                offsetof(struct user, regs.rip), 0);
    }

    void test(int nr)
    {
        void *bp_addr = &&label + nr, *bp_hit;
        int pid;

        printf("test bp %d\n", nr);
        assert(nr < 16); // see 16 asm nops below

        pid = fork();
        if (!pid) {
            assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
            kill(getpid(), SIGSTOP);
            for (;;) {
                label: asm (
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                );
            }
        }

        assert(pid == wait(NULL));
        set_bp(pid, bp_addr);

        for (;;) {
            assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
            assert(pid == wait(NULL));

            bp_hit = get_rip(pid);
            if (bp_hit != bp_addr)
                fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                    bp_hit - &&label, nr);
        }
    }

    int main(int argc, const char *argv[])
    {
        while (--argc) {
            int nr = atoi(*++argv);
            if (!fork())
                test(nr);
        }

        while (wait(NULL) > 0)
            ;
        return 0;
    }

Cc: stable@vger.kernel.org
Suggested-by: NNadav Amit <namit@cs.technion.ac.il>
Reported-by: NAndrey Wagin <avagin@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

172b2386

17 2月, 2016 4 次提交

P
KVM: x86: pass kvm_get_time_scale arguments in hertz · 3ae13faa
由 Paolo Bonzini 提交于 2月 08, 2016
```
Prepare for improving the precision in the next patch.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
3ae13faa

KVM: x86: fix missed hardware breakpoints · 4e422bdd

由 Paolo Bonzini 提交于 2月 10, 2016

Sometimes when setting a breakpoint a process doesn't stop on it.
This is because the debug registers are not loaded correctly on
VCPU load.

The following simple reproducer from Oleg Nesterov tries using debug
registers in both the host and the guest, for example by running "./bp
0 1" on the host and "./bp 14 15" under QEMU.

    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <asm/debugreg.h>
    #include <assert.h>

    #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

    unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
    {
        unsigned long dr7;

        dr7 = ((len | type) & 0xf)
            << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
        if (enable)
            dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

        return dr7;
    }

    int write_dr(int pid, int dr, unsigned long val)
    {
        return ptrace(PTRACE_POKEUSER, pid,
                offsetof (struct user, u_debugreg[dr]),
                val);
    }

    void set_bp(pid_t pid, void *addr)
    {
        unsigned long dr7;
        assert(write_dr(pid, 0, (long)addr) == 0);
        dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
        assert(write_dr(pid, 7, dr7) == 0);
    }

    void *get_rip(int pid)
    {
        return (void*)ptrace(PTRACE_PEEKUSER, pid,
                offsetof(struct user, regs.rip), 0);
    }

    void test(int nr)
    {
        void *bp_addr = &&label + nr, *bp_hit;
        int pid;

        printf("test bp %d\n", nr);
        assert(nr < 16); // see 16 asm nops below

        pid = fork();
        if (!pid) {
            assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
            kill(getpid(), SIGSTOP);
            for (;;) {
                label: asm (
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                );
            }
        }

        assert(pid == wait(NULL));
        set_bp(pid, bp_addr);

        for (;;) {
            assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
            assert(pid == wait(NULL));

            bp_hit = get_rip(pid);
            if (bp_hit != bp_addr)
                fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                    bp_hit - &&label, nr);
        }
    }

    int main(int argc, const char *argv[])
    {
        while (--argc) {
            int nr = atoi(*++argv);
            if (!fork())
                test(nr);
        }

        while (wait(NULL) > 0)
            ;
        return 0;
    }

Cc: stable@vger.kernel.org
Suggested-by: NNadadv Amit <namit@cs.technion.ac.il>
Reported-by: NAndrey Wagin <avagin@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4e422bdd

KVM: x86: rewrite handling of scaled TSC for kvmclock · 78db6a50

由 Paolo Bonzini 提交于 2月 08, 2016

This is the same as before:

    kvm_scale_tsc(tgt_tsc_khz)
        = tgt_tsc_khz * ratio
        = tgt_tsc_khz * user_tsc_khz / tsc_khz   (see set_tsc_khz)
        = user_tsc_khz                           (see kvm_guest_time_update)
        = vcpu->arch.virtual_tsc_khz             (see kvm_set_tsc_khz)

However, computing it through kvm_scale_tsc will make it possible
to include the NTP correction in tgt_tsc_khz.
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

78db6a50

KVM: x86: rename argument to kvm_set_tsc_khz · 4941b8cb

由 Paolo Bonzini 提交于 2月 08, 2016

This refers to the desired (scaled) frequency, which is called
user_tsc_khz in the rest of the file.
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4941b8cb

09 2月, 2016 3 次提交

KVM: x86: consolidate different ways to test for in-kernel LAPIC · bce87cce

由 Paolo Bonzini 提交于 1月 08, 2016

Different pieces of code checked for vcpu->arch.apic being (non-)NULL,
or used kvm_vcpu_has_lapic (more optimized) or lapic_in_kernel.
Replace everything with lapic_in_kernel's name and kvm_vcpu_has_lapic's
implementation.
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bce87cce

KVM: x86: Use vector-hashing to deliver lowest-priority interrupts · 52004014

由 Feng Wu 提交于 1月 25, 2016

Use vector-hashing to deliver lowest-priority interrupts, As an
example, modern Intel CPUs in server platform use this method to
handle lowest-priority interrupts.
Signed-off-by: NFeng Wu <feng.wu@intel.com>
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

52004014

KVM: x86: introduce do_shl32_div32 · b51012de

由 Paolo Bonzini 提交于 1月 22, 2016

This is similar to the existing div_frac function, but it returns the
remainder too.  Unlike div_frac, it can be used to implement long
division, e.g. (a << 64) / b for 32-bit a and b.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b51012de

16 1月, 2016 1 次提交

kvm: rename pfn_t to kvm_pfn_t · ba049e93

由 Dan Williams 提交于 1月 15, 2016

To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver.  The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.

Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array.  Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory.  The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.

This patch (of 18):

The core has developed a need for a "pfn_t" type [1].  Move the existing
pfn_t in KVM to kvm_pfn_t [2].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba049e93

09 1月, 2016 2 次提交

kvm/x86: Update SynIC timers on guest entry only · f3b138c5

由 Andrey Smetanin 提交于 12月 28, 2015

Consolidate updating the Hyper-V SynIC timers in a
single place: on guest entry in processing KVM_REQ_HV_STIMER
request.  This simplifies the overall logic, and makes sure
the most current state of msrs and guest clock is used for
arming the timers (to achieve that, KVM_REQ_HV_STIMER
has to be processed after KVM_REQ_CLOCK_UPDATE).
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f3b138c5

KVM: move architecture-dependent requests to arch/ · 2860c4b1

由 Paolo Bonzini 提交于 1月 07, 2016

Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2860c4b1

07 1月, 2016 2 次提交

kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock · 1dab1345

由 Nicholas Krause 提交于 12月 30, 2015

This makes sure the wall clock is updated only after an odd version value
is successfully written to guest memory.
Signed-off-by: NNicholas Krause <xerofoify@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1dab1345

kvm: x86: only channel 0 of the i8254 is linked to the HPET · e5e57e7a

由 Paolo Bonzini 提交于 1月 07, 2016

While setting the KVM PIT counters in 'kvm_pit_load_count', if
'hpet_legacy_start' is set, the function disables the timer on
channel[0], instead of the respective index 'channel'. This is
because channels 1-3 are not linked to the HPET.  Fix the caller
to only activate the special HPET processing for channel 0.
Reported-by: NP J P <pjp@fedoraproject.org>
Fixes: 0185604cSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e5e57e7a

23 12月, 2015 1 次提交

Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks" · b92c453d

由 Thomas Gleixner 提交于 12月 23, 2015

This reverts commit 677a73a9. This patch was not meant to be merged and
has issues. Revert it.
Requested-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

b92c453d

22 12月, 2015 1 次提交

KVM: x86: Reload pit counters for all channels when restoring state · 0185604c

由 Andrew Honig 提交于 11月 18, 2015

Currently if userspace restores the pit counters with a count of 0
on channels 1 or 2 and the guest attempts to read the count on those
channels, then KVM will perform a mod of 0 and crash.  This will ensure
that 0 values are converted to 65536 as per the spec.

This is CVE-2015-7513.
Signed-off-by: NAndy Honig <ahonig@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0185604c

17 12月, 2015 1 次提交

kvm/x86: Hyper-V SynIC timers · 1f4b34f8

由 Andrey Smetanin 提交于 11月 30, 2015

Per Hyper-V specification (and as required by Hyper-V-aware guests),
SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
of MSRs, and signals expiration by delivering a special format message
to the configured SynIC message slot and triggering the corresponding
synthetic interrupt.

Note: as implemented by this patch, all periodic timers are "lazy"
(i.e. if the vCPU wasn't scheduled for more than the timer period the
timer events are lost), regardless of the corresponding configuration
MSR.  If deemed necessary, the "catch up" mode (the timer period is
shortened until the timer catches up) will be implemented later.

Changes v2:
* Use remainder to calculate periodic timer expiration time
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1f4b34f8

11 12月, 2015 2 次提交

kvm: x86: move tracepoints outside extended quiescent state · 8b89fe1f

由 Paolo Bonzini 提交于 12月 10, 2015

Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
lockdep splat.
Reported-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8b89fe1f

x86/kvm: On KVM re-enable (e.g. after suspend), update clocks · 677a73a9

由 Andy Lutomirski 提交于 12月 10, 2015

This gets rid of the "did TSC go backwards" logic and just
updates all clocks.  It should work better (no more disabling of
fast timing) and more reliably (all of the clocks are actually
updated).
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

677a73a9

26 11月, 2015 5 次提交

KVM: x86: expose MSR_TSC_AUX to userspace · 9dbe6cf9

由 Paolo Bonzini 提交于 11月 12, 2015

If we do not do this, it is not properly saved and restored across
migration.  Windows notices due to its self-protection mechanisms,
and is very upset about it (blue screen of death).

Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9dbe6cf9

kvm/x86: Hyper-V kvm exit · db397571

由 Andrey Smetanin 提交于 11月 10, 2015

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db397571

kvm/x86: Hyper-V synthetic interrupt controller · 5c919412

由 Andrey Smetanin 提交于 11月 10, 2015

SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5c919412

kvm/x86: per-vcpu apicv deactivation support · d62caabb

由 Andrey Smetanin 提交于 11月 10, 2015

The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on per-vcpu
basis.  In particular, when the userspace activates HyperV synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d62caabb

kvm/x86: split ioapic-handled and EOI exit bitmaps · 6308630b

由 Andrey Smetanin 提交于 11月 10, 2015

The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6308630b

18 11月, 2015 4 次提交

KVM: x86: request interrupt window when IRQ chip is split · 62a193ed

由 Matt Gingell 提交于 11月 16, 2015

Before this patch, we incorrectly enter the guest without requesting an
interrupt window if the IRQ chip is split between user space and the
kernel.

Because lapic_in_kernel no longer implies the PIC is in the kernel, this
patch tests pic_in_kernel to determining whether an interrupt window
should be requested when entering the guest.

If the APIC is in the kernel and we request an interrupt window the
guest will return immediately. If the APIC is masked the guest will not
not make forward progress and unmask it, leading to a loop when KVM
reenters and requests again. This patch adds a check to ensure the APIC
is ready to accept an interrupt before requesting a window.
Reviewed-by: NSteve Rutherford <srutherford@google.com>
Signed-off-by: NMatt Gingell <gingell@google.com>
[Use the other newly introduced functions. - Paolo]
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

62a193ed

KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space · 934bf653

由 Matt Gingell 提交于 11月 16, 2015

Set KVM_REQ_EVENT when a PIC in user space injects a local interrupt.

Currently a request is only made when neither the PIC nor the APIC is in
the kernel, which is not sufficient in the split IRQ chip case.

This addresses a problem in QEMU where interrupts are delayed until
another path invokes the event loop.
Reviewed-by: NSteve Rutherford <srutherford@google.com>
Signed-off-by: NMatt Gingell <gingell@google.com>
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

934bf653

KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection · 782d422b

由 Matt Gingell 提交于 11月 16, 2015

This patch breaks out a new function kvm_vcpu_ready_for_interrupt_injection.
This routine encapsulates the logic required to determine whether a vcpu
is ready to accept an interrupt injection, which is now required on
multiple paths.
Reviewed-by: NSteve Rutherford <srutherford@google.com>
Signed-off-by: NMatt Gingell <gingell@google.com>
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

782d422b

KVM: x86: fix interrupt window handling in split IRQ chip case · 127a457a

由 Matt Gingell 提交于 11月 17, 2015

This patch ensures that dm_request_for_irq_injection and
post_kvm_run_save are in sync, avoiding that an endless ping-pong
between userspace (who correctly notices that IF=0) and
the kernel (who insists that userspace handles its request
for the interrupt window).

To synchronize them, it also adds checks for kvm_arch_interrupt_allowed
and !kvm_event_needs_reinjection.  These are always needed, not
just for in-kernel LAPIC.
Signed-off-by: NMatt Gingell <gingell@google.com>
[A collage of two patches from Matt. - Paolo]
Fixes: 1c1a9ce9
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

127a457a

10 11月, 2015 5 次提交

KVM: x86: rename update_db_bp_intercept to update_bp_intercept · a96036b8

由 Paolo Bonzini 提交于 11月 10, 2015

Because #DB is now intercepted unconditionally, this callback
only operates on #BP for both VMX and SVM.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a96036b8

KVM: x86: Use the correct vcpu's TSC rate to compute time scale · 27cca94e

由 Haozhong Zhang 提交于 10月 20, 2015

This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27cca94e

KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc() · 4ba76538

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM scales the host TSC in the same way in call-back
read_l1_tsc(), so this patch moves the scaling logic from call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4ba76538

KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() · 58ea6767

由 Haozhong Zhang 提交于 10月 20, 2015

For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

58ea6767

KVM: x86: Replace call-back compute_tsc_offset() with a common function · 07c1419a

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

07c1419a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功