Commits · 11ad65b79e8c27cdafe404e33938da270a55858a · openeuler / Kernel

10 Jun, 2016 19 commits

KVM: s390: enable ib only if available · 11ad65b7

Authored by David Hildenbrand on Apr 04, 2016

Let's enable intervention bypass only if the facility is actually
available.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: handle missing guest-storage-limit-suppression · efed1104

Authored by David Hildenbrand on Apr 16, 2015

If guest-storage-limit-suppression is not available, we would for now
have a valid guest address space with size 0. So let's simply set the
origin to 0 and the limit to hamax.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: provide CMMA attributes only if available · f9cbd9b0

Authored by David Hildenbrand on Mar 03, 2016

Let's not provide the device attribute for cmma enabling and clearing
if the hardware doesn't support it.

This also helps getting rid of the undocumented return value "-EINVAL"
in case CMMA is not available when trying to enable it.

Also properly document the meaning of -EINVAL for CMMA clearing.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: enable CMMA if the interpration is available · c24cc9c8

Authored by David Hildenbrand on Nov 24, 2015

Now that we can detect if collaborative-memory-management interpretation
is available, replace the heuristic by a real hardware detection.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: guestdbg: signal missing hardware support · 89b5b4de

Authored by David Hildenbrand on Nov 24, 2015

Without guest-PER enhancement, we can't provide any debugging support.
Therefore act like kernel support is missing.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: handle missing 64-bit-SCAO facility · 76a6dd72

Authored by David Hildenbrand on Nov 24, 2015

Without that facility, we may only use scaol. So fall back
to DMA allocation in that case, so we won't overwrite random memory
via the SIE.

Also disallow ESCA, so we don't have to handle that allocation case.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
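
The address constraint behind this change can be illustrated with a minimal sketch. It is not the kernel's code: the names `sca_origin`, `split_scao` and `scao_usable` are hypothetical, and the split into two 32-bit halves (`scaoh`/`scaol`) is an assumption made for illustration. The point is only that an address with a non-zero upper half cannot be expressed through scaol alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the two halves of an SCA origin address. */
struct sca_origin {
    uint32_t scaoh; /* upper 32 bits, assumed usable only with 64-bit-SCAO */
    uint32_t scaol; /* lower 32 bits, always usable */
};

/* Split a 64-bit address into its upper and lower 32-bit halves. */
static struct sca_origin split_scao(uint64_t addr)
{
    struct sca_origin o = {
        .scaoh = (uint32_t)(addr >> 32),
        .scaol = (uint32_t)addr,
    };
    return o;
}

/* True if the address can be handed to the hardware given the facility state. */
static bool scao_usable(uint64_t addr, bool has_64bscao)
{
    return has_64bscao || split_scao(addr).scaoh == 0;
}
```

An allocation from a low (DMA) zone keeps the upper half zero, which is why falling back to DMA allocation is sufficient when the facility is missing.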

KVM: s390: interface to query and configure cpu subfunctions · 0a763c78

Authored by David Hildenbrand on May 18, 2016

We have certain instructions that indicate available subfunctions via
a query subfunction (crypto functions and ptff), or via a test bit
function (plo).

By exposing these "subfunction blocks" to user space, we allow user space
to
1) query available subfunctions and make sure subfunctions won't get lost
   during migration - e.g. properly indicate them via a CPU model
2) change the subfunctions to be reported to the guest (even adding
   unavailable ones)

This mechanism works just like the way we indicate the stfl(e) list to
user space.

This way, user space could even emulate some subfunctions in QEMU in the
future. If this is ever applicable, we have to make sure later on, that
unsupported subfunctions result in an intercept to QEMU.

Please note that support to indicate them to the guest is still missing
and requires hardware support. Usually, the IBC already takes care of these
subfunctions for migration safety. QEMU should make sure to always set
these bits properly according to the machine generation to be emulated.

Available subfunctions are only valid in combination with STFLE bits
retrieved via KVM_S390_VM_CPU_MACHINE and enabled via
KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the
indicated subfunctions are guaranteed to be correct.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: gaccess: convert get_vcpu_asce() · bcfa01d7

Authored by David Hildenbrand on May 31, 2016

Let's use our new function for preparing translation exceptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: gaccess: convert guest_page_range() · cde0dcfb

Authored by David Hildenbrand on May 31, 2016

Let's use our new function for preparing translation exceptions. As we will
need the correct ar, let's pass that to guest_page_range().

This will also make sure that the guest address is stored in the tec
for applicable exceptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: gaccess: convert guest_translate_address() · fbcb7d51

Authored by David Hildenbrand on May 31, 2016

Let's use our new function for preparing translation exceptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: gaccess: convert kvm_s390_check_low_addr_prot_real() · 3e3c67f6

Authored by David Hildenbrand on May 31, 2016

Let's use our new function for preparing translation exceptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: gaccess: function for preparing translation exceptions · d03193de

Authored by David Hildenbrand on May 31, 2016

Let's provide a function trans_exc() that can be used for handling
preparation of translation exceptions on a central basis. We will use
that function to replace existing code in gaccess.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: gaccess: store guest address on ALC prot exceptions · 6167375b

Authored by David Hildenbrand on May 31, 2016

Let's pass the effective guest address to get_vcpu_asce(), so we
can properly set the guest address in case we inject an ALC protection
exception.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: forward ESOP if available · 22be5a13

Authored by David Hildenbrand on Jan 21, 2016

ESOP guarantees that during a protection exception, bit 61 of real location
168-175 will only be set to 1 if it was because of ALCP or DATP. If the
exception is due to LAP or KCP, the bit will always be set to 0.

The old SOP definition allowed bit 61 to be unpredictable in case of LAP
or KCP in some conditions. So ESOP replaces this unpredictability by
a guarantee.

Therefore, we can directly forward ESOP if it is available on our machine.
We don't have to do anything when ESOP is disabled - the guest will simply
expect unpredictable values. Our guest access functions are already
handling ESOP properly.

Please note that future functionality in KVM will require knowledge about
ESOP being enabled for a guest or not.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
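
For readers unfamiliar with the bit numbering used above, here is a small sketch of how "bit 61" of the doubleword at real locations 168-175 could be tested. It assumes the usual s390 convention that bit 0 is the most significant bit; the helper names `ibm_bit64` and `tec_bit61_set` are hypothetical and not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of the doubleword. */
static uint64_t ibm_bit64(unsigned int n)
{
    return 1ULL << (63 - n);
}

/*
 * tec: the doubleword read from real locations 168-175.
 * With ESOP, bit 61 is 1 only for ALCP or DATP; for LAP or KCP it is 0.
 */
static bool tec_bit61_set(uint64_t tec)
{
    return (tec & ibm_bit64(61)) != 0;
}
```

In conventional (LSB-first) terms, bit 61 corresponds to the mask 0x4.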

KVM: s390: interface to query and configure cpu features · 15c9705f

Authored by David Hildenbrand on Mar 19, 2015

For now, we only have an interface to query and configure facilities
indicated via STFL(E). However, we also have features indicated via
SCLP, that have to be indicated to the guest by user space and usually
require KVM support.

This patch allows user space to query and configure available cpu features
for the guest.

Please note that disabling a feature doesn't necessarily mean that it is
completely disabled (e.g. ESOP is mostly handled by the SIE). We will try
our best to disable it.

Most features (e.g. SCLP) can't be forwarded directly, as most of them need
support in KVM in addition to hardware support. As we later on want to
turn these features in KVM explicitly on/off (to simulate different
behavior), we have to filter all features provided by the hardware and
make them configurable.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: Add mnemonic print to kvm_s390_intercept_prog · c1778e51

Authored by Alexander Yarygin on May 06, 2016

We have a table of mnemonic names for intercepted program
interruptions, so let's print the readable name of the interruption in the
kvm_s390_intercept_prog trace event.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: Limit sthyi execution · 7d0a5e62

Authored by Janosch Frank on May 10, 2016

Store hypervisor information is a valid instruction not only in
supervisor state but also in problem state, i.e. the guest's
userspace. Its execution is not only computationally and memory
intensive, but also has to get hold of the ipte lock to write to the
guest's memory.

This lock is not intended to be held often or for long, especially not
from the untrusted guest userspace. Therefore we apply rate limiting
of sthyi executions per VM.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
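
The idea of per-VM rate limiting can be sketched as a simple fixed-window limiter. This is an illustration only and makes no claim about the mechanism the patch actually uses: the struct `vm_ratelimit`, the function `vm_ratelimit_allow`, and the notion of abstract "ticks" supplied by the caller are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM limiter: at most `burst` executions per `interval` ticks. */
struct vm_ratelimit {
    uint64_t interval;      /* window length in ticks */
    unsigned int burst;     /* executions allowed per window */
    uint64_t window_start;  /* tick at which the current window began */
    unsigned int used;      /* executions already granted in this window */
};

/* Returns true if one more execution is allowed at time `now`. */
static bool vm_ratelimit_allow(struct vm_ratelimit *rl, uint64_t now)
{
    if (now - rl->window_start >= rl->interval) {
        /* The window has elapsed: start a new one. */
        rl->window_start = now;
        rl->used = 0;
    }
    if (rl->used >= rl->burst)
        return false;
    rl->used++;
    return true;
}
```

Keeping one such structure per VM, rather than per VCPU, matches the stated goal of bounding how often a single guest can make the host take the ipte lock. What happens to a denied request is outside the scope of this sketch.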

KVM: s390: Add sthyi emulation · 95ca2cb5

Authored by Janosch Frank on May 23, 2016

Store Hypervisor Information is an emulated z/VM instruction that
provides a guest with basic information about the layers it is running
on. This includes information about the cpu configuration of both the
machine and the lpar, as well as their names, machine model and
machine type. This information enables an application to determine the
maximum capacity of CPs and IFLs available to software.

The instruction is available whenever the facility bit 74 is set,
otherwise executing it results in an operation exception.

It is important to check the validity flags in the sections before
using data from any structure member. It is not guaranteed that all
members will be valid on all machines / machine configurations.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: Add operation exception interception handler · a011eeb2

Authored by Janosch Frank on May 09, 2016

This commit introduces code that handles operation exception
interceptions. With this handler we can emulate instructions by using
illegal opcodes.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

13 May, 2016 1 commit

KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2

Authored by Christian Borntraeger on May 13, 2016

Some wakeups should not be considered a successful poll. For example, on
s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
would be considered runnable - letting all vCPUs poll all the time for
transactional-like workloads, even if one vCPU would be enough.
This can result in huge CPU usage for large guests.
This patch lets architectures provide a way to qualify wakeups as
good or bad with regard to polls.

For s390 the implementation will fence off halt polling for anything but
known good, single vCPU events. The s390 implementation for floating
interrupts does a wakeup for one vCPU, but the interrupt will be delivered
by whatever CPU checks first for a pending interrupt. We prefer the
woken up CPU by marking the poll of this CPU as "good" poll.
This code will also mark several other wakeup reasons like IPI or
expired timers as "good". This will of course also mark some events as
not successful. As KVM on z always runs as a 2nd level hypervisor,
we prefer to not poll, unless we are really sure, though.

This patch successfully limits the CPU usage for cases like uperf 1byte
transactional ping pong workload or wakeup heavy workload like OLTP
while still providing a proper speedup.

This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
wakeups that are considered not good for polling.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
Cc: David Matlack <dmatlack@google.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
[Rename config symbol. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

09 May, 2016 6 commits

KVM: s390: Populate mask of non-hypervisor managed facility bits · 60a37709

Authored by Alexander Yarygin on Apr 01, 2016

When a guest is initializing, KVM provides facility bits that can be
successfully used by the guest. It's done by applying
kvm_s390_fac_list_mask mask on host facility bits stored by the STFLE
instruction. Facility bits can be one of two kinds: it's either a
hypervisor managed bit or non-hypervisor managed.

The hardware provides information which bits need special handling.
Let's automatically pass through to guests new facility bits that
don't require hypervisor support.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
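
The masking described above can be sketched as follows. This is an illustration under assumptions, not the kernel's implementation: the list length `FAC_WORDS`, the helper names, and in particular the way the static KVM mask and the hardware-reported "non-hypervisor managed" bits are combined (a bitwise OR) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FAC_WORDS 4 /* hypothetical facility list length in 64-bit words */

/* Facility bit n in IBM numbering: bit 0 is the MSB of word 0. */
static int fac_test(const uint64_t *list, unsigned int n)
{
    return (int)((list[n / 64] >> (63 - n % 64)) & 1);
}

/*
 * Guest list = host bits that are either explicitly allowed by the
 * hypervisor mask or reported as not needing hypervisor support.
 */
static void fac_guest_list(uint64_t *guest, const uint64_t *host,
                           const uint64_t *kvm_mask, const uint64_t *nonhyp)
{
    for (size_t i = 0; i < FAC_WORDS; i++)
        guest[i] = host[i] & (kvm_mask[i] | nonhyp[i]);
}
```

With such a scheme, a newly introduced facility bit that needs no hypervisor support reaches the guest without a change to the static mask.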

KVM: s390: Enable all facility bits that are known good for passthrough · ed8dda0b

Authored by Alexander Yarygin on Mar 31, 2016

Some facility bits are in a range that is defined to be "ok for guests
without any necessary hypervisor changes". Enable those bits.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: force ibc into valid range · 053dd230

Authored by David Hildenbrand on Apr 04, 2016

Some hardware variants will round the ibc value up/down themselves,
others will report a validity intercept. Let's always round it up/down.

This patch will also make sure that the ibc is set to 0 in case we don't
have ibc support (lowest_ibc == 0).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
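
The rounding rule can be written as a small clamp. This is a sketch rather than the patch itself: `lowest_ibc` is the name used in the message above, while `highest_ibc` and the function `ibc_clamp` are hypothetical names for the upper bound and the helper.

```c
#include <stdint.h>

/* Force a requested ibc value into [lowest_ibc, highest_ibc]. */
static uint16_t ibc_clamp(uint16_t requested, uint16_t lowest_ibc,
                          uint16_t highest_ibc)
{
    if (lowest_ibc == 0)
        return 0;               /* no ibc support: always use 0 */
    if (requested < lowest_ibc)
        return lowest_ibc;      /* round up */
    if (requested > highest_ibc)
        return highest_ibc;     /* round down */
    return requested;
}
```

Doing the rounding in software gives the same result on hardware variants that round by themselves and on those that would otherwise report a validity intercept.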

KVM: s390: cleanup cpuid handling · 9bb0ec09

Authored by David Hildenbrand on Apr 04, 2016

We only have one cpuid for all VCPUs, so let's directly use the one in the
cpu model. Also always store it directly as u64, no need for struct cpuid.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: enable SRS only if enabled for the guest · bd50e8ec

Authored by David Hildenbrand on Mar 04, 2016

If we don't have SIGP SENSE RUNNING STATUS enabled for the guest, let's
not enable interpretation so we can correctly report an invalid order.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: enable PFMFI only if guest has EDAT1 · d6af0b49

Authored by David Hildenbrand on Mar 04, 2016

Only enable PFMF interpretation if the necessary facility (EDAT1) is
available, otherwise the pfmf handler in priv.c will inject an exception.

Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

04 May, 2016 2 commits

KVM: s390: support NQ only if the facility is enabled for the guest · edc5b055

Authored by David Hildenbrand on Mar 04, 2016

While we cannot fully fence off the Nonquiescing Key-Setting facility,
we should try our best to hide it.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: cmma: don't check entry content · 4a5e7e38

Authored by David Hildenbrand on Apr 12, 2016

We should never inject an exception after we manually rewound the PSW
(to retry the ESSA instruction in this case). This will mess up the PSW.
So this never worked and therefore never really triggered.

Looking at the details, we don't even have to perform any validity checks.
1. Bits 52-63 of an entry are stored as 0 by the hardware.
2. We are dealing with absolute addresses but only check for the prefix
   starting at address 0. This isn't correct and doesn't make much sense,
   cpus could still zap the prefix of other cpus. But as prefix pages
   cannot be swapped out without a notifier being called for the affected
   VCPU, a zap can never remove a protected prefix.

Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

20 Apr, 2016 2 commits

KVM: s390: add clear I/O irq operation for FLIC · 6d28f789

Authored by Halil Pasic on Jan 25, 2016

Introduce a FLIC operation for clearing I/O interrupts for a subchannel.

Rationale: According to the platform specification, pending I/O
interruption requests have to be revoked in certain situations. For
instance, according to the Principles of Operation (page 17-27), a
subchannel put into the installed parameters initialized state is in the
same state as after an I/O system reset (just parameters possibly changed).
This implies that any I/O interrupts for that subchannel are no longer
pending (as I/O system resets clear I/O interrupts). Therefore, we need an
interface to clear pending I/O interrupts.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

KVM: s390: implement has_attr for FLIC · 4f129858

Authored by Halil Pasic on Feb 25, 2016

HAS_ATTR is useful for determining the supported attributes; let's
implement it.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

08 Mar, 2016 10 commits

s390: Fix misspellings in comments · 7eb792bf

Authored by Adam Buchbinder on Mar 04, 2016

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: split arch/s390/mm/pgtable.c · 1e133ab2

Authored by Martin Schwidefsky on Mar 08, 2016

The pgtable.c file is quite big; before it grows any larger, split it
into pgtable.c, pgalloc.c and gmap.c. In addition, move the gmap related
header definitions into the new gmap.h header and all of the pgste
helpers from pgtable.h to pgtable.c.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: uninline ptep_xxx functions from pgtable.h · ebde765c

Authored by Martin Schwidefsky on Mar 08, 2016

The code in the various ptep_xxx functions has grown quite large;
consolidate them to four out-of-line functions:
- ptep_xchg_direct to exchange a pte with another with immediate flushing
- ptep_xchg_lazy to exchange a pte with another in a batched update
- ptep_modify_prot_start to begin a protection flags update
- ptep_modify_prot_commit to commit a protection flags update

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

KVM: s390: allocate only one DMA page per VM · c54f0d6a

Authored by David Hildenbrand on Dec 02, 2015

We can fit the 2k for the STFLE interpretation and the crypto
control block into one DMA page. As we now only have to allocate
one DMA page, we can clean up the code a bit.

As a nice side effect, this also fixes a problem with crycbd alignment in
case special allocation debug options are enabled, debugged by Sascha
Silbe.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
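
One way to picture "both blocks in one DMA page" is a single struct with compile-time layout checks. The sketch below is hypothetical: the struct and field names are invented, and `CRYCB_BYTES` is a placeholder size, not the real size of the crypto control block. Only the 2k facility list size comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096
#define FAC_LIST_BYTES  2048 /* "the 2k for the STFLE interpretation" */
#define CRYCB_BYTES     296  /* placeholder size, an assumption */

/* Stand-in for the crypto control block. */
struct crypto_cb {
    uint8_t data[CRYCB_BYTES];
};

/* Hypothetical layout of the single per-VM DMA page. */
struct vm_dma_page {
    uint64_t fac_list[FAC_LIST_BYTES / sizeof(uint64_t)];
    struct crypto_cb crycb;
    uint8_t reserved[PAGE_SIZE_BYTES - FAC_LIST_BYTES - CRYCB_BYTES];
};

/* Fail the build if the layout ever drifts from exactly one page. */
static_assert(sizeof(struct vm_dma_page) == PAGE_SIZE_BYTES,
              "layout must fill exactly one page");
static_assert(offsetof(struct vm_dma_page, crycb) == FAC_LIST_BYTES,
              "crycb must directly follow the facility list");
```

Because the control block sits at a fixed offset inside a page-sized structure, its alignment follows from the page alignment instead of from whatever a separate allocation happens to return.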

KVM: s390: enable STFLE interpretation only if enabled for the guest · 80bc79dc

Authored by David Hildenbrand on Dec 02, 2015

Not setting the facility list designation disables STFLE interpretation;
this is what we want if the guest was told to not have it.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: wake up when the VCPU cpu timer expires · b3c17f10

Authored by David Hildenbrand on Feb 22, 2016

When the VCPU cpu timer expires, we have to wake up just like when the ckc
triggers. For now, setting up a cpu timer in the guest and going into
enabled wait will never lead to a wakeup. This patch fixes this problem.
Just as for the ckc, we have to take care of waking up too early. We
have to recalculate the sleep time and go back to sleep.

Please note that the timer callback calls kvm_s390_get_cpu_timer() from
interrupt context. As the timer is canceled when leaving handle_wait(),
and we don't do any VCPU cpu timer writes/updates in that function, we can
be sure that we will never try to read the VCPU cpu timer from the same cpu
that is currently updating the timer (deadlock).

Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Tested-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: step the VCPU timer while in enabled wait · 5ebda316

Authored by David Hildenbrand on Feb 22, 2016

The cpu timer is a means to measure task execution time. We want
to account everything for a VCPU for which it is responsible. Therefore,
if the VCPU wants to sleep, it shall be accounted for it.

We can easily get this done by not disabling cpu timer accounting when
scheduled out while sleeping because of enabled wait.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: protect VCPU cpu timer with a seqcount · 9c23a131

Authored by David Hildenbrand on Feb 17, 2016

For now, only the owning VCPU thread (that has loaded the VCPU) can get a
consistent cpu timer value when calculating the delta. However, other
threads might also be interested in a more recent, consistent value. Of
special interest will be the timer callback of a VCPU that executes without
having the VCPU loaded and could run in parallel with the VCPU thread.

The cpu timer has a nice property: it is only updated by the owning VCPU
thread. And speaking about accounting, a consistent value can only be
calculated by looking at cputm_start and the cpu timer itself in
one shot, otherwise the result might be wrong.

As we only have one writing thread at a time (owning VCPU thread), we can
use a seqcount instead of a seqlock and retry if the VCPU refreshed its
cpu timer. This avoids any heavy locking and only introduces a counter
update/check plus a handful of smp_wmb().

The owning VCPU thread should never have to retry on reads, and also for
other threads this might be a very rare scenario.

Please note that we have to use the raw_* variants for locking the seqcount
as lockdep will produce false warnings otherwise. The rq->lock held during
vcpu_load/put is also acquired from hardirq context. Lockdep cannot know
that we avoid potential deadlocks by disabling preemption and thereby
disable concurrent write locking attempts (via vcpu_put/load).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
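
The single-writer sequence counter pattern described above can be sketched in portable C11. This is a user-space illustration, not the kernel's seqcount API: the struct `cputm`, the functions `cputm_write` and `cputm_read`, and the `start` field (standing in for cputm_start) are hypothetical, and C11 atomics take the place of the kernel's barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

struct cputm {
    atomic_uint seq;         /* odd while a write is in progress */
    _Atomic uint64_t timer;  /* cpu timer value */
    _Atomic uint64_t start;  /* stand-in for cputm_start */
};

struct cputm_snapshot {
    uint64_t timer;
    uint64_t start;
};

/* Writer side: must only be called by the single owning thread. */
static void cputm_write(struct cputm *c, uint64_t timer, uint64_t start)
{
    unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->timer, timer, memory_order_relaxed);
    atomic_store_explicit(&c->start, start, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side: retry until both fields were read under one even sequence. */
static struct cputm_snapshot cputm_read(struct cputm *c)
{
    struct cputm_snapshot snap;
    unsigned int s1, s2;

    do {
        s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
        snap.timer = atomic_load_explicit(&c->timer, memory_order_relaxed);
        snap.start = atomic_load_explicit(&c->start, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);

    return snap;
}
```

Readers never block the writer; they only repeat the read if a write overlapped it, which is the property that makes a counter sufficient when there is exactly one writer at a time.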

KVM: s390: step VCPU cpu timer during kvm_run ioctl · db0758b2

Authored by David Hildenbrand on Feb 15, 2016

Architecturally we should only provide steal time if we are scheduled
away, and not if the host interprets a guest exit. We have to step
the guest CPU timer in these cases.

In the first shot, we will step the VCPU timer only during the kvm_run
ioctl. Therefore all time spent e.g. in interception handlers or on irq
delivery will be accounted for that VCPU.

We have to take care of a few special cases:
- Other VCPUs can test for pending irqs. We can only report a consistent
  value for the VCPU thread itself when adding the delta.
- We have to take care of STP sync, therefore we have to extend
  kvm_clock_sync() and disable preemption accordingly.
- During any call to disable/enable/start/stop we could get preempted
  and therefore get start/stop calls. Therefore we have to make sure we
  don't get into an inconsistent state.

Whenever a VCPU is scheduled out, sleeping, in user space or just about
to enter the SIE, the guest cpu timer isn't stepped.

Please note that all primitives are prepared to be called from both
environments (cpu timer accounting enabled or not), although not completely
used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called
while cpu timer accounting is enabled).

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
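
The accounting idea, stepping the timer only between a start and a stop point, can be sketched as below. All names (`vcpu_timer`, `timer_start`, `timer_stop`, `timer_get`) are hypothetical, time is an abstract tick count passed in by the caller, and using 0 to mean "accounting stopped" is an assumption of this sketch.

```c
#include <stdint.h>

struct vcpu_timer {
    uint64_t cputm;  /* guest cpu timer, counts down */
    uint64_t start;  /* tick at which accounting started, 0 if stopped */
};

/* Begin charging elapsed time to the guest cpu timer. */
static void timer_start(struct vcpu_timer *t, uint64_t now)
{
    t->start = now;
}

/* Stop charging: fold the elapsed delta into the timer value. */
static void timer_stop(struct vcpu_timer *t, uint64_t now)
{
    t->cputm -= now - t->start;
    t->start = 0;
}

/* Usable in both states: subtracts the pending delta only while running. */
static uint64_t timer_get(const struct vcpu_timer *t, uint64_t now)
{
    uint64_t value = t->cputm;

    if (t->start)
        value -= now - t->start;
    return value;
}
```

In this picture, entering the kvm_run ioctl would correspond to `timer_start` and leaving it to `timer_stop`, so time spent in interception handlers is charged to the VCPU while time spent scheduled out is not.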

KVM: s390: abstract access to the VCPU cpu timer · 4287f247

Authored by David Hildenbrand on Feb 15, 2016

We want to manually step the cpu timer in certain scenarios in the future.
Let's abstract any access to the cpu timer, so we can hide the complexity
internally.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

openeuler / Kernel synced successfully more than 1 year ago