  1. 16 May 2014, 12 commits
  2. 13 May 2014, 1 commit
  3. 08 May 2014, 2 commits
    • kvm: x86: emulate monitor and mwait instructions as nop · 87c00572
      Gabriel L. Somlo authored
      Treat monitor and mwait instructions as nop, which is architecturally
      correct (but inefficient) behavior. We do this to prevent misbehaving
      guests (e.g. OS X <= 10.7) from crashing after they fail to check for
      monitor/mwait availability via cpuid.
      
      Since mwait-based idle loops relying on these nop-emulated instructions
      would keep the host CPU pegged at 100%, do NOT advertise their presence
      via cpuid, to prevent compliant guests from using them inadvertently.
      Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      87c00572
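      A minimal sketch of the nop treatment described in 87c00572, written in
      the style of an SVM intercept handler; the handler names and exact hunks
      here are illustrative, not a verbatim quote of the patch:

      /* Illustrative: route MONITOR/MWAIT intercepts to a nop handler. */
      static int nop_interception(struct vcpu_svm *svm)
      {
              /* Skip the intercepted instruction and resume the guest. */
              skip_emulated_instruction(&svm->vcpu);
              return 1;
      }

      static int monitor_interception(struct vcpu_svm *svm)
      {
              printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
              return nop_interception(svm);
      }

      static int mwait_interception(struct vcpu_svm *svm)
      {
              printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
              return nop_interception(svm);
      }

      Since the MONITOR/MWAIT CPUID bit is deliberately left clear, only guests
      that skip the CPUID check ever reach these handlers.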
    • kvm/x86: implement hv EOI assist · b63cf42f
      Michael S. Tsirkin authored
      It seems that it's easy to implement the EOI assist
      on top of the PV EOI feature: simply convert the
      page address to the format expected by PV EOI (a sketch of the
      conversion follows this entry).
      
      Notes:
      - "No EOI required" is set only if the injected interrupt is
        edge-triggered; this is true because level-triggered interrupts go
        through the IOAPIC, which disables PV EOI.
        In any case, if the guest triggers an EOI, the bit is cleared on exit.
      - For migration, setting HV_X64_MSR_APIC_ASSIST_PAGE sets
        KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE
        should be sufficient.
        In any case, the bit is cleared on exit, so in the worst case it is
        simply never re-enabled.
      - No handling of PV EOI data is performed at HV_X64_MSR_EOI write time;
        HV_X64_MSR_EOI is a separate optimization: an x2APIC-style
        replacement that lets you do EOI with an MSR instead of I/O.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b63cf42f
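      A hedged sketch of the conversion described in b63cf42f, shaped like the
      HV_X64_MSR_APIC_ASSIST_PAGE case of the MSR write handler; page mapping
      and error handling are elided and the exact code may differ:

      case HV_X64_MSR_APIC_ASSIST_PAGE: {
              u64 gfn;

              if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) {
                      vcpu->arch.hv_vapic = data;
                      /* Disabling the assist page also disables PV EOI. */
                      if (kvm_lapic_enable_pv_eoi(vcpu, 0))
                              return 1;
                      break;
              }
              gfn = data >> HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT;
              /* ... map and clear the assist page as before ... */
              vcpu->arch.hv_vapic = data;
              /* Reuse the PV EOI machinery: pass it the page address
               * in PV EOI format, i.e. with the enable bit set. */
              if (kvm_lapic_enable_pv_eoi(vcpu, gfn_to_gpa(gfn) | KVM_MSR_ENABLED))
                      return 1;
              break;
      }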
  4. 07 May 2014, 6 commits
  5. 06 May 2014, 5 commits
  6. 05 May 2014, 1 commit
    • kvm/irqchip: Speed up KVM_SET_GSI_ROUTING · 719d93cd
      Christian Borntraeger authored
      When starting lots of dataplane devices, bootup takes very long on
      Christian's s390 with the irqfd patches; with larger setups he can even
      trigger timeouts in some components.  It turns out that the
      KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
      when there are multiple CPUs.  This is caused by synchronize_rcu combined
      with s390's HZ=100.  By changing the code to use a private srcu we can
      speed things up (the pattern is sketched below).  This patch reduces the
      boot time until root is mounted from 8 to 2 seconds on my s390 guest
      with 100 disks.
      
      Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
      are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
      uses rcu_dereference_raw rather than rcu_dereference, and write-sides
      do not do rcu lockdep at all).
      
      Note that we're hardly relying on the "sleepable" part of srcu.  We just
      want SRCU's faster detection of grace periods.
      
      Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
      and RR.  The difference between results "before" and "after" the patch
      has mean -0.2% and standard deviation 0.6%.  A paired t-test on the
      data points gives a 2.5% probability that the patch is the cause of the
      performance difference (rather than a random fluctuation).
      
      (Restricting the t-test to RR, which is the most likely to be affected,
      changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
      probability that the numbers actually say something about the patch.
      The probability increases mostly because there are fewer data points).
      
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      719d93cd
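      The core of the change is a per-VM SRCU domain for the routing table, so
      that the update side waits only for readers of that table rather than for
      a global RCU grace period; a rough sketch of the pattern, with
      kvm->irq_srcu as the private srcu_struct (treat the details as
      illustrative):

      /* Update side (KVM_SET_GSI_ROUTING): publish the new table, then wait
       * only for readers inside the private SRCU domain, which completes much
       * faster than synchronize_rcu() on an HZ=100 kernel. */
      rcu_assign_pointer(kvm->irq_routing, new);
      synchronize_srcu(&kvm->irq_srcu);
      kfree(old);

      /* Read side, e.g. when injecting an interrupt through the routing table: */
      int idx;

      idx = srcu_read_lock(&kvm->irq_srcu);
      irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
      /* ... look up the GSI and deliver ... */
      srcu_read_unlock(&kvm->irq_srcu, idx);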
  7. 30 Apr 2014, 1 commit
    • Merge tag 'kvm-s390-20140429' of... · 57b5981c
      Paolo Bonzini authored
      Merge tag 'kvm-s390-20140429' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
      
      1. Guest handling fixes
      The handling of MVPG, PFMF and Test Block is fixed to better follow
      the architecture. None of these fixes is critical for any current
      Linux guests, but let's play safe.
      
      2. Optimization for single-CPU guests
      We can enable the IBS facility when only one VCPU is running (i.e. it is
      not in STOPPED state). We also enable this optimization for guests with
      more than one VCPU as soon as all but one VCPU are in stopped state.
      This will help guests that have tools like cpuplugd (from s390-utils)
      that dynamically offline/online CPUs.
      
      3. NOTES
      There is one non-s390 change in include/linux/kvm_host.h that
      introduces two defines for VCPU requests (shown in the sketch after
      this entry):
      #define KVM_REQ_ENABLE_IBS        23
      #define KVM_REQ_DISABLE_IBS       24
      57b5981c
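      For reference, the two request bits mentioned above look like the other
      VCPU request defines in include/linux/kvm_host.h and would be raised with
      the usual request helpers; a hedged sketch (the usage below is
      illustrative, not a quote of the s390 code):

      /* include/linux/kvm_host.h */
      #define KVM_REQ_ENABLE_IBS        23
      #define KVM_REQ_DISABLE_IBS       24

      /* Illustrative use on s390: ask the one remaining running VCPU to
       * enable the IBS facility and kick it so it notices the request. */
      kvm_make_request(KVM_REQ_ENABLE_IBS, vcpu);
      kvm_vcpu_kick(vcpu);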
  8. 29 Apr 2014, 7 commits
  9. 28 Apr 2014, 2 commits
  10. 24 Apr 2014, 3 commits
    • KVM: MMU: flush tlb out of mmu lock when write-protect the sptes · 198c74f4
      Xiao Guangrong authored
      Now we can flush all the TLBs outside of the mmu lock, without risking TLB
      corruption, when write-protecting the sptes, because:
      - we have marked large sptes read-only instead of dropping them, which means
        we only change an spte from writable to read-only; so the only case we need
        to care about is an spte changing from present to present (changing an spte
        from present to non-present flushes all the TLBs immediately), in other
        words mmu_spte_update()

      - in mmu_spte_update(), we check
        SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, which
        means it no longer depends on PT_WRITABLE_MASK at all
        (the resulting pattern is sketched below)
      Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      198c74f4
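      The resulting shape of the write-protection path, with the TLB flush
      hoisted out of mmu_lock; a hedged sketch based on
      kvm_mmu_slot_remove_write_access() (the helper that collects the flush
      flag is illustrative):

      void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
      {
              bool flush;

              spin_lock(&kvm->mmu_lock);
              flush = slot_write_protect_sptes(kvm, slot); /* illustrative helper */
              spin_unlock(&kvm->mmu_lock);

              /*
               * The flush may happen outside mmu_lock: the sptes only went from
               * writable to read-only (present to present), and mmu_spte_update()
               * keys off SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE rather than
               * PT_WRITABLE_MASK, so nothing depends on the hardware-writable bit
               * being observed before this flush completes.
               */
              if (flush)
                      kvm_flush_remote_tlbs(kvm);
      }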
    • KVM: MMU: flush tlb if the spte can be locklessly modified · 7f31c959
      Xiao Guangrong authored
      Relax the TLB flush condition, since we will write-protect the spte outside
      of the mmu lock. Note that lockless write-protection only changes a writable
      spte to read-only, and an spte can be made writable again only if both
      SPTE_HOST_WRITEABLE and SPTE_MMU_WRITEABLE are set (which is what
      spte_is_locklessly_modifiable() tests; see the sketch below).
      
      This patch avoids the following kind of race:

            VCPU 0                           VCPU 1
      lockless write protection:
            set spte.w = 0
                                         lock mmu-lock

                                         write-protect the spte to sync the
                                         shadow page, see spte.w = 0, so the
                                         TLB flush is skipped

                                         unlock mmu-lock

                                         !!! At this point, the shadow page can
                                             still be writable due to the stale
                                             TLB entry
            flush all TLBs
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      7f31c959
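      The "can this spte be modified without mmu_lock" test mentioned above is
      just a check for both software-writable bits; roughly, in mmu.c (the exact
      form is illustrative):

      /* An spte may be made writable again by the lockless fast page fault
       * path only if both soft-writable bits are still set; such sptes
       * therefore always need a TLB flush when write-protected out of
       * mmu_lock. */
      static bool spte_is_locklessly_modifiable(u64 spte)
      {
              return (spte & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)) ==
                     (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE);
      }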
    • KVM: MMU: lazily drop large spte · c126d94f
      Xiao Guangrong authored
      Currently, kvm zaps a large spte when write protection is needed, so a later
      read can fault on that spte. Instead, we can make the large spte read-only
      rather than non-present, and the page fault caused by the read access is
      avoided.
      
      The idea is from Avi:
      | As I mentioned before, write-protecting a large spte is a good idea,
      | since it moves some work from protect-time to fault-time, so it reduces
      | jitter.  This removes the need for the return value.
      
      This version fixes the issue reported in 6b73a960. That issue occurred
      because fast_page_fault() directly set a read-only large spte to writable
      but dirtied only the first page in the dirty bitmap, so the other pages
      covered by the large spte were missed. Fix it by allowing only normal sptes
      (at PT_PAGE_TABLE_LEVEL) to be fast-fixed; the check is sketched below.
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      c126d94f
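      The guard that keeps fast_page_fault() away from large sptes is a single
      level check; a hedged sketch of how it might look inside the fast-fix
      path (placement and label are illustrative):

      /*
       * Do not fast-fix a write fault on a large spte: the fast path dirties
       * only the first page in the dirty bitmap, so the other pages mapped by
       * the large spte would be missed.  Leave large sptes to the slow path.
       */
      if (sp->role.level != PT_PAGE_TABLE_LEVEL)
              goto exit;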