提交 · 2fc616c06e125baed9e8d0cd166a4e58468939c8 · openanolis / cloud-kernel

31 1月, 2018 2 次提交

MAINTAINERS: add David as a reviewer for KVM/s390 · 2fc616c0

由 Cornelia Huck 提交于 1月 12, 2018

Acked-by: NJanosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NCornelia Huck <cohuck@redhat.com>

2fc616c0

Merge tag 'kvm-s390-next-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux · 92ea2b33

由 Radim Krčmář 提交于 1月 30, 2018

KVM: s390: Fixes and features for 4.16 part 2

- exitless interrupts for emulated devices (Michael Mueller)
- cleanup of cpuflag handling (David Hildenbrand)
- kvm stat counter improvements (Christian Borntraeger)
- vsie improvements (David Hildenbrand)
- mm cleanup (Janosch Frank)

92ea2b33

26 1月, 2018 12 次提交

KVM: s390: introduce the format-1 GISA · 4b9f9525

由 Michael Mueller 提交于 6月 23, 2017

The patch modifies the previously defined GISA data structure to be
able to store two GISA formats, format-0 and format-1. Additionally,
it verifies the availability of the GISA format facility and enables
the use of a format-1 GISA in the SIE control block accordingly.

A format-1 can do everything that format-0 can and we will need it
for real HW passthrough. As there are systems with only format-0
we keep both variants.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

4b9f9525

s390/sclp: expose the GISA format facility · 9e73ea70

由 Michael Mueller 提交于 6月 12, 2017

The GISA format facility is required by the host to be able to process
a format-1 GISA. If not available, the used GISA format will be format-0.
All format-1 related extension will not be available in this case.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

9e73ea70

KVM: s390: activate GISA for emulated interrupts · f180bfda

由 Michael Mueller 提交于 6月 23, 2017

If the AIV facility is available, a GISA will be used to manage emulated
adapter interrupts.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

f180bfda

KVM: s390: make kvm_s390_get_io_int() aware of GISA · 4b35f65e

由 Michael Mueller 提交于 7月 07, 2017

The function returns a pending I/O interrupt with the highest
priority defined by its ISC.

Together with AIV activation, pending adapter interrupts are
managed by the GISA IPM. Thus kvm_s390_get_io_int() needs to
inspect the IPM as well when the interrupt with the highest
priority has to be identified.

In case classic and adapter interrupts with the same ISC are
pending, the classic interrupt will be returned first.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

4b35f65e

KVM: s390: add GISA interrupts to FLIC ioctl interface · 24160af6

由 Michael Mueller 提交于 6月 14, 2017

Pending interrupts marked in the GISA IPM are required to
become part of the answer of ioctl KVM_DEV_FLIC_GET_ALL_IRQS.

The ioctl KVM_DEV_FLIC_ENQUEUE is already capable to enqueue
adapter interrupts when a GISA is present.

With ioctl KVM_DEV_FLIC_CLEAR_IRQS the GISA IPM wil be cleared
now as well.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

24160af6

KVM: s390: abstract adapter interruption word generation from ISC · 2496c8e7

由 Michael Mueller 提交于 8月 31, 2017

The function isc_to_int_word() allows the generation of interruption
words for adapter interrupts.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

2496c8e7

KVM: s390: exploit GISA and AIV for emulated interrupts · d7c5cb01

由 Michael Mueller 提交于 6月 12, 2017

The adapter interruption virtualization (AIV) facility is an
optional facility that comes with functionality expected to increase
the performance of adapter interrupt handling for both emulated and
passed-through adapter interrupts. With AIV, adapter interrupts can be
delivered to the guest without exiting SIE.

This patch provides some preparations for using AIV for emulated adapter
interrupts (including virtio) if it's available. When using AIV, the
interrupts are delivered at the so called GISA by setting the bit
corresponding to its Interruption Subclass (ISC) in the Interruption
Pending Mask (IPM) instead of inserting a node into the floating interrupt
list.

To keep the change reasonably small, the handling of this new state is
deferred in get_all_floating_irqs and handle_tpi. This patch concentrates
on the code handling enqueuement of emulated adapter interrupts, and their
delivery to the guest.

Note that care is still required for adapter interrupts using AIV,
because there is no guarantee that AIV is going to deliver the adapter
interrupts pending at the GISA (consider all vcpus idle). When delivering
GISA adapter interrupts by the host (usual mechanism) special attention
is required to honor interrupt priorities.

Empirical results show that the time window between making an interrupt
pending at the GISA and doing kvm_s390_deliver_pending_interrupts is
sufficient for a guest with at least moderate cpu activity to get adapter
interrupts delivered within the SIE, and potentially save some SIE exits
(if not other deliverable interrupts).

The code will be activated with a follow-up patch.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

d7c5cb01

s390/css: indicate the availability of the AIV facility · 72b523a3

由 Michael Mueller 提交于 7月 10, 2017

The patch adds an indication for the presence Adapter Interruption
Virtualization facility (AIV) of the general channel subsystem
characteristics.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NCornelia Huck <cohuck@redhat.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
[change wording]

72b523a3

KVM: s390: implement GISA IPM related primitives · d77e6414

由 Michael Mueller 提交于 6月 12, 2017

The patch implements routines to access the GISA to test and modify
its Interruption Pending Mask (IPM) from the host side.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

d77e6414

s390/bitops: add test_and_clear_bit_inv() · f3ec471a

由 Jens Freimann 提交于 3月 16, 2017

This patch adds a MSB0 bit numbering version of test_and_clear_bit().
Signed-off-by: NJens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

f3ec471a

KVM: s390: define GISA format-0 data structure · 19114beb

由 Michael Mueller 提交于 5月 30, 2017

In preperation to support pass-through adapter interrupts, the Guest
Interruption State Area (GISA) and the Adapter Interruption Virtualization
(AIV) features will be introduced here.

This patch introduces format-0 GISA (that is defines the struct describing
the GISA, allocates storage for it, and introduces fields for the
GISA address in kvm_s390_sie_block and kvm_s390_vsie).

As the GISA requires storage below 2GB, it is put in sie_page2, which is
already allocated in ZONE_DMA. In addition, The GISA requires alignment to
its integral boundary. This is already naturally aligned via the
padding in the sie_page2.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

19114beb

KVM: s390: reverse bit ordering of irqs in pending mask · c7901a6e

由 Michael Mueller 提交于 6月 29, 2017

This patch prepares a simplification of bit operations between the irq
pending mask for emulated interrupts and the Interruption Pending Mask
(IPM) which is part of the Guest Interruption State Area (GISA), a feature
that allows interrupt delivery to guests by means of the SIE instruction.

Without that change, a bit-wise *or* operation on parts of these two masks
would either require a look-up table of size 256 bytes to map the IPM
to the emulated irq pending mask bit orientation (all bits mirrored at half
byte) or a sequence of up to 8 condidional branches to perform tests of
single bit positions. Both options are to be rejected either by performance
or space utilization reasons.

Beyond that this change will be transparent.
Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: NPierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

c7901a6e

25 1月, 2018 4 次提交

KVM: s390: introduce and use kvm_s390_test_cpuflags() · 8d5fb0dc

由 David Hildenbrand 提交于 1月 23, 2018

Use it just like kvm_s390_set_cpuflags() and kvm_s390_clear_cpuflags().
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Message-Id: <20180123170531.13687-5-david@redhat.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

8d5fb0dc

KVM: s390: introduce and use kvm_s390_clear_cpuflags() · 9daecfc6

由 David Hildenbrand 提交于 1月 23, 2018

Use it just like kvm_s390_set_cpuflags().
Suggested-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Message-Id: <20180123170531.13687-4-david@redhat.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

9daecfc6

KVM: s390: reuse kvm_s390_set_cpuflags() · ef8f4f49

由 David Hildenbrand 提交于 1月 23, 2018

Use it in all places where we set cpuflags.
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Message-Id: <20180123170531.13687-3-david@redhat.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

ef8f4f49

KVM: s390: rename __set_cpuflag() to kvm_s390_set_cpuflags() · 2018224d

由 David Hildenbrand 提交于 1月 23, 2018

No need to make this function special. Move it to a header right away.
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Message-Id: <20180123170531.13687-2-david@redhat.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

2018224d

24 1月, 2018 5 次提交

KVM: s390: add vcpu stat counters for many instruction · a37cb07a

由 Christian Borntraeger 提交于 1月 23, 2018

The overall instruction counter is larger than the sum of the
single counters. We should try to catch all instruction handlers
to make this match the summary counter.
Let us add sck,tb,sske,iske,rrbe,tb,tpi,tsch,lpsw,pswe....
and remove other unused ones.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NJanosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>

a37cb07a

KVM: s390: diagnoses are instructions as well · 866c138c

由 Christian Borntraeger 提交于 1月 24, 2018

Make the diagnose counters also appear as instruction counters.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NJanosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>

866c138c

s390x/mm: simplify gmap_protect_rmap() · 5c528db0

由 David Hildenbrand 提交于 1月 23, 2018

We never call it with anything but PROT_READ. This is a left over from
an old prototype. For creation of shadow page tables, we always only
have to protect the original table in guest memory from write accesses,
so we can properly invalidate the shadow on writes. Other protections
are not needed.
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Message-Id: <20180123212618.32611-1-david@redhat.com>
Reviewed-by: NJanosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

5c528db0

KVM: s390: vsie: store guest addresses of satellite blocks in vsie_page · 15e5020e

由 David Hildenbrand 提交于 1月 16, 2018

This way, the values cannot change, even if another VCPU might try to
mess with the nested SCB currently getting executed by another VCPU.

We now always use the same gpa for pinning and unpinning a page (for
unpinning, it is only relevant to mark the guest page dirty for
migration).
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Message-Id: <20180116171526.12343-3-david@redhat.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

15e5020e

KVM: s390: vsie: use READ_ONCE to access some SCB fields · b3ecd4aa

由 David Hildenbrand 提交于 1月 16, 2018

Another VCPU might try to modify the SCB while we are creating the
shadow SCB. In general this is no problem - unless the compiler decides
to not load values once, but e.g. twice.

For us, this is only relevant when checking/working with such values.
E.g. the prefix value, the mso, state of transactional execution and
addresses of satellite blocks.

E.g. if we blindly forward values (e.g. general purpose registers or
execution controls after masking), we don't care.

Leaving unpin_blocks() untouched for now, will handle it separately.

The worst thing right now that I can see would be a missed prefix
un/remap (mso, prefix, tx) or using wrong guest addresses. Nothing
critical, but let's try to avoid unpredictable behavior.
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Message-Id: <20180116171526.12343-2-david@redhat.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

b3ecd4aa

23 1月, 2018 1 次提交

s390/mm: Remove superfluous parameter · c0b4bd21

由 Janosch Frank 提交于 12月 13, 2017

It seems it hasn't even been used before the last cleanup and was
overlooked.
Signed-off-by: NJanosch Frank <frankja@linux.vnet.ibm.com>
Message-Id: <1513169613-13509-12-git-send-email-frankja@linux.vnet.ibm.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

c0b4bd21

16 1月, 2018 16 次提交

KVM: VMX: introduce X2APIC_MSR macro · d7231e75

由 Paolo Bonzini 提交于 12月 21, 2017

Remove duplicate expression in nested_vmx_prepare_msr_bitmap, and make
the register names clearer in hardware_setup.
Suggested-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[Resolved rebase conflict after removing Intel PT. - Radim]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

d7231e75

KVM: vmx: speed up MSR bitmap merge · c992384b

由 Paolo Bonzini 提交于 12月 13, 2017

The bulk of the MSR bitmap is either immutable, or can be copied from
the L1 bitmap.  By initializing it at VMXON time, and copying the mutable
parts one long at a time on vmentry (rather than one bit), about 4000
clock cycles (30%) can be saved on a nested VMLAUNCH/VMRESUME.

The resulting for loop only has four iterations, so it is cheap enough
to reinitialize the MSR write bitmaps on every iteration, and it makes
the code simpler.
Suggested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

c992384b

KVM: vmx: simplify MSR bitmap setup · 1f6e5b25

由 Paolo Bonzini 提交于 12月 20, 2017

The APICv-enabled MSR bitmap is a superset of the APICv-disabled bitmap.
Make that obvious in vmx_disable_intercept_msr_x2apic.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[Resolved rebase conflict after removing Intel PT. - Radim]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

1f6e5b25

KVM: nVMX: remove unnecessary vmwrite from L2->L1 vmexit · 07f36616

由 Paolo Bonzini 提交于 1月 01, 2018

The POSTED_INTR_NV field is constant (though it differs between the vmcs01 and
vmcs02), there is no need to reload it on vmexit to L1.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

07f36616

KVM: nVMX: initialize more non-shadowed fields in prepare_vmcs02_full · 25a2e4fe

由 Paolo Bonzini 提交于 12月 20, 2017

These fields are also simple copies of the data in the vmcs12 struct.
For some of them, prepare_vmcs02 was skipping the copy when the field
was unused.  In prepare_vmcs02_full, we copy them always as long as the
field exists on the host, because the corresponding execution control
might be one of the shadowed fields.

Optimization opportunities remain for MSRs that, depending on the
entry/exit controls, have to be copied from either the vmcs01 or
the vmcs12: EFER (whose value is partly stored in the entry controls
too), PAT, DEBUGCTL (and also DR7).  Before moving these three and
the entry/exit controls to prepare_vmcs02_full, KVM would have to set
dirty_vmcs12 on writes to the L1 MSRs.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

25a2e4fe

KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full · 8665c3f9

由 Paolo Bonzini 提交于 12月 20, 2017

This part is separate for ease of review, because git prefers to move
prepare_vmcs02 below the initial long sequence of vmcs_write* operations.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

8665c3f9

KVM: nVMX: track dirty state of non-shadowed VMCS fields · 74a497fa

由 Paolo Bonzini 提交于 12月 20, 2017

VMCS12 fields that are not handled through shadow VMCS are rarely
written, and thus they are also almost constant in the vmcs02.  We can
thus optimize prepare_vmcs02 by skipping all the work for non-shadowed
fields in the common case.

This patch introduces the (pretty simple) tracking infrastructure; the
next patches will move work to prepare_vmcs02_full and save a few hundred
clock cycles per VMRESUME on a Haswell Xeon E5 system:

	                                before  after
	cpuid                           14159   13869
	vmcall                          15290   14951
	inl_from_kernel                 17703   17447
	outl_to_kernel                  16011   14692
	self_ipi_sti_nop                16763   15825
	self_ipi_tpr_sti_nop            17341   15935
	wr_tsc_adjust_msr               14510   14264
	rd_tsc_adjust_msr               15018   14311
	mmio-wildcard-eventfd:pci-mem   16381   14947
	mmio-datamatch-eventfd:pci-mem  18620   17858
	portio-wildcard-eventfd:pci-io  15121   14769
	portio-datamatch-eventfd:pci-io 15761   14831

(average savings 748, stdev 460).
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

74a497fa

KVM: VMX: split list of shadowed VMCS field to a separate file · c9e9deae

由 Paolo Bonzini 提交于 12月 20, 2017

Prepare for multiple inclusions of the list.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

c9e9deae

kvm: vmx: Reduce size of vmcs_field_to_offset_table · 58e9ffae

由 Jim Mattson 提交于 12月 22, 2017

The vmcs_field_to_offset_table was a rather sparse table of short
integers with a maximum index of 0x6c16, amounting to 55342 bytes. Now
that we are considering support for multiple VMCS12 formats, it would
be unfortunate to replicate that large, sparse table. Rotating the
field encoding (as a 16-bit integer) left by 6 reduces that table to
5926 bytes.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

58e9ffae

kvm: vmx: Change vmcs_field_type to vmcs_field_width · d37f4267

由 Jim Mattson 提交于 12月 22, 2017

Per the SDM, "[VMCS] Fields are grouped by width (16-bit, 32-bit,
etc.) and type (guest-state, host-state, etc.)." Previously, the width
was indicated by vmcs_field_type. To avoid confusion when we start
dealing with both field width and field type, change vmcs_field_type
to vmcs_field_width.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

d37f4267

kvm: vmx: Introduce VMCS12_MAX_FIELD_INDEX · 5b15706d

由 Jim Mattson 提交于 12月 22, 2017

This is the highest index value used in any supported VMCS12 field
encoding. It is used to populate the IA32_VMX_VMCS_ENUM MSR.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

5b15706d

KVM: VMX: optimize shadow VMCS copying · 44900ba6

由 Paolo Bonzini 提交于 12月 13, 2017

Because all fields can be read/written with a single vmread/vmwrite on
64-bit kernels, the switch statements in copy_vmcs12_to_shadow and
copy_shadow_to_vmcs12 are unnecessary.

What I did in this patch is to copy the two parts of 64-bit fields
separately on 32-bit kernels, to keep all complicated #ifdef-ery
in init_vmcs_shadow_fields.  The disadvantage is that 64-bit fields
have to be listed separately in shadow_read_only/read_write_fields,
but those are few and we can validate the arrays when building the
VMREAD and VMWRITE bitmaps.  This saves a few hundred clock cycles
per nested vmexit.

However there is still a "switch" in vmcs_read_any and vmcs_write_any.
So, while at it, this patch reorders the fields by type, hoping that
the branch predictor appreciates it.

Cc: Jim Mattson <jmattson@google.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

44900ba6

KVM: vmx: shadow more fields that are read/written on every vmexits · c5d167b2

由 Paolo Bonzini 提交于 12月 13, 2017

Compared to when VMCS shadowing was added to KVM, we are reading/writing
a few more fields: the PML index, the interrupt status and the preemption
timer value.  The first two are because we are exposing more features
to nested guests, the preemption timer is simply because we have grown
a new optimization.  Adding them to the shadow VMCS field lists reduces
the cost of a vmexit by about 1000 clock cycles for each field that exists
on bare metal.

On the other hand, the guest BNDCFGS and TSC offset are not written on
fast paths, so remove them.
Suggested-by: NJim Mattson <jmattson@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

c5d167b2

R
Merge tag 'kvm-s390-next-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux · 7cd91804
由 Radim Krčmář 提交于 1月 16, 2018
```
KVM: s390: Fixes and features for 4.16

- add the virtio-ccw transport for kvmconfig
- more debug tracing for cpu model
- cleanups and fixes
```
7cd91804

KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2 · 6b697711

由 Liran Alon 提交于 11月 09, 2017

Consider the following scenario:
1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
to CPU B via virtual posted-interrupt mechanism.
2. CPU B is currently executing L2 guest.
3. vmx_deliver_nested_posted_interrupt() calls
kvm_vcpu_trigger_posted_interrupt() which will note that
vcpu->mode == IN_GUEST_MODE.
4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR
IPI, CPU B exits from L2 to L0 during event-delivery
(valid IDT-vectoring-info).
5. CPU A now sends the physical IPI. The IPI is received in host and
it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing.
6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT,
CPU B continues to run in L0 and reach vcpu_enter_guest(). As
KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume
L2 guest.
7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but
it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be
consumed at next L2 entry!

Another scenario to consider:
1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
to CPU B via virtual posted-interrupt mechanism.
2. Assume that before CPU A calls kvm_vcpu_trigger_posted_interrupt(),
CPU B is at L0 and is about to resume into L2. Further assume that it is
in vcpu_enter_guest() after check for KVM_REQ_EVENT.
3. At this point, CPU A calls kvm_vcpu_trigger_posted_interrupt() which
will note that vcpu->mode != IN_GUEST_MODE. Therefore, do nothing and
return false. Then, will set pi_pending=true and KVM_REQ_EVENT.
4. Now CPU B continue and resumes into L2 guest without processing
the posted-interrupt until next L2 entry!

To fix both issues, we just need to change
vmx_deliver_nested_posted_interrupt() to set pi_pending=true and
KVM_REQ_EVENT before calling kvm_vcpu_trigger_posted_interrupt().

It will fix the first scenario by chaging step (6) to note that
KVM_REQ_EVENT and pi_pending=true and therefore process
nested posted-interrupt.

It will fix the second scenario by two possible ways:
1. If kvm_vcpu_trigger_posted_interrupt() is called while CPU B has changed
vcpu->mode to IN_GUEST_MODE, physical IPI will be sent and will be received
when CPU resumes into L2.
2. If kvm_vcpu_trigger_posted_interrupt() is called while CPU B hasn't yet
changed vcpu->mode to IN_GUEST_MODE, then after CPU B will change
vcpu->mode it will call kvm_request_pending() which will return true and
therefore force another round of vcpu_enter_guest() which will note that
KVM_REQ_EVENT and pi_pending=true and therefore process nested
posted-interrupt.

Cc: stable@vger.kernel.org
Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing")
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
[Add kvm_vcpu_kick to also handle the case where L1 doesn't intercept L2 HLT
 and L2 executes HLT instruction. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

6b697711

KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts · 851c1a18

由 Liran Alon 提交于 12月 24, 2017

Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr()
which calls vmx_hwapic_irr_update() to update RVI.
Currently, vmx_hwapic_irr_update() contains a tweak in case it is called
when CPU is running L2 and L1 don't intercept external-interrupts.
In that case, code injects interrupt directly into L2 instead of
updating RVI.

Besides being hacky (wouldn't expect function updating RVI to also
inject interrupt), it also doesn't handle this case correctly.
The code contains several issues:
1. When code calls kvm_queue_interrupt() it just passes it max_irr which
represents the highest IRR currently pending in L1 LAPIC.
This is problematic as interrupt was injected to guest but it's bit is
still set in LAPIC IRR instead of being cleared from IRR and set in ISR.
2. Code doesn't check if LAPIC PPR is set to accept an interrupt of
max_irr priority. It just checks if interrupts are enabled in guest with
vmx_interrupt_allowed().

To fix the above issues:
1. Simplify vmx_hwapic_irr_update() to just update RVI.
Note that this shouldn't happen when CPU is running L2
(See comment in code).
2. Since now vmx_hwapic_irr_update() only does logic for L1
virtual-interrupt-delivery, inject_pending_event() should be the
one responsible for injecting the interrupt directly into L2.
Therefore, change kvm_cpu_has_injectable_intr() to check L1
LAPIC when CPU is running L2.
3. Change vmx_sync_pir_to_irr() to set KVM_REQ_EVENT when L1
has a pending injectable interrupt.

Fixes: 963fee16 ("KVM: nVMX: Fix virtual interrupt delivery
injection")
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: NLiam Merwick <liam.merwick@oracle.com>
Signed-off-by: NLiam Merwick <liam.merwick@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

851c1a18

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功