提交 · 7d43bafcff17c7fb07270999d3cf002f1ed6bd3f · openeuler / Kernel

30 11月, 2015 8 次提交

KVM: s390: Make provisions for ESCA utilization · 7d43bafc

由 Eugene (jno) Dvurechenski 提交于 4月 22, 2015

This patch updates the routines (sca_*) to provide transparent access
to and manipulation on the data for both Basic and Extended SCA in use.
The kvm.arch.sca is generalized to (void *) to handle BSCA/ESCA cases.
Also the kvm.arch.use_esca flag is provided.
The actual functionality is kept the same.
Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

7d43bafc

KVM: s390: Introduce new structures · bc784cce

由 Eugene (jno) Dvurechenski 提交于 4月 23, 2015

This patch adds new structures and updates some existing ones to
provide the base for Extended SCA functionality.

The old sca_* structures were renamed to bsca_* to keep things uniform.

The access to fields of SIGP controls were turned into bitfields instead
of hardcoded bitmasks.
Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

bc784cce

KVM: s390: Provide SCA-aware helpers for VCPU add/del · a6e2f683

由 Eugene (jno) Dvurechenski 提交于 4月 21, 2015

This patch provides SCA-aware helpers to create/delete a VCPU.
This is to prepare for upcoming introduction of Extended SCA support.
Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

a6e2f683

KVM: s390: Generalize access to SIGP controls · a5bd7647

由 Eugene (jno) Dvurechenski 提交于 4月 21, 2015

This patch generalizes access to the SIGP controls, which is a part of SCA.
This is to prepare for upcoming introduction of Extended SCA support.
Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

a5bd7647

KVM: s390: Generalize access to IPTE controls · 60514510

由 Eugene (jno) Dvurechenski 提交于 4月 21, 2015

This patch generalizes access to the IPTE controls, which is a part of SCA.
This is to prepare for upcoming introduction of Extended SCA support.
Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

60514510

s390/sclp: introduce checks for ESCA and HVS · f7ba1d34

由 Eugene (jno) Dvurechenski 提交于 10月 09, 2014

Introduce sclp.has_hvs and sclp.has_esca to provide a way for kvm to check
whether the extended-SCA and the home-virtual-SCA facilities are available.
Signed-off-by: NEugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

f7ba1d34

KVM: s390: rewrite vcpu_post_run and drop out early · 71f116bf

由 David Hildenbrand 提交于 10月 19, 2015

Let's rewrite this function to better reflect how we actually handle
exit_code. By dropping out early we can save a few cycles. This
especially speeds up sie exits caused by host irqs.

Also, let's move the special -EOPNOTSUPP for intercepts to
the place where it belongs and convert it to -EREMOTE.
Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

71f116bf

KVM: Use common function for VCPU lookup by id · e09fefde

由 David Hildenbrand 提交于 11月 05, 2015

Let's reuse the new common function for VPCU lookup by id.
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
[split out the new function into a separate patch]

e09fefde

26 11月, 2015 20 次提交

T
KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() · bb11c6c9
由 Takuya Yoshikawa 提交于 11月 26, 2015
```
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
bb11c6c9

KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() · 74c4e63a

由 Takuya Yoshikawa 提交于 11月 26, 2015

As kvm_mmu_get_page() was changed so that every parent pointer would not
get into the sp->parent_ptes chain before the entry pointed to by it was
set properly, we can use the for_each_rmap_spte macro instead of
pte_list_walk().
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

74c4e63a

KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() · 98bba238

由 Takuya Yoshikawa 提交于 11月 26, 2015

Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows to clean up the code because
parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
mmu_page_add_parent_pte().

In addition, the patch avoids calling mark_unsync() for other parents in
the sp->parent_ptes chain than the newly added parent_pte, because they
have been there since before the current page fault handling started.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

98bba238

KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() · 47005792

由 Takuya Yoshikawa 提交于 11月 20, 2015

Make kvm_mmu_alloc_page() do just what its name tells to do, and remove
the extra allocation error check and zero-initialization of parent_ptes:
shadow page headers allocated by kmem_cache_zalloc() are always in the
per-VCPU pools.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

47005792

KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes · 77fbbbd2

由 Takuya Yoshikawa 提交于 11月 20, 2015

At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which must not be
found in the reverse-mapping list.

Move this check in rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, will be checked the
same way.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present, at least until drop_parent_pte()
actually unlinks them, and not mmio-sptes.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

77fbbbd2

KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() · afd28fe1

由 Takuya Yoshikawa 提交于 11月 20, 2015

is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e53
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb90 ("KVM: Allow not-present guest page faults to
bypass kvm") to do bypass_guest_pf optimization, which does not exist
any more.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them without clear distinction just makes the code
confusing, remove is_rmap_spte().
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

afd28fe1

KVM: x86: MMU: Make mmu_set_spte() return emulate value · 029499b4

由 Takuya Yoshikawa 提交于 11月 20, 2015

mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing emulate pointer and make mmu_set_spte() return the emulate
value instead to clean up this complex interface.  Prefetch functions
can just throw away the return value.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

029499b4

KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap · fd951457

由 Takuya Yoshikawa 提交于 11月 20, 2015

Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three line
code which clears a bit in the unsync child bitmap; the former places it
inside a loop block and uses a few goto statements to jump to it.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fd951457

KVM: x86: MMU: Remove unused parameter of __direct_map() · 7ee0e5b2

由 Takuya Yoshikawa 提交于 11月 20, 2015

Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7ee0e5b2

KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct · 018aabb5

由 Takuya Yoshikawa 提交于 11月 20, 2015

New struct kvm_rmap_head makes the code type-safe to some extent.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

018aabb5

KVM: powerpc: kvmppc_visible_gpa can be boolean · 378b417d

由 Yaowei Bai 提交于 11月 16, 2015

In another patch kvm_is_visible_gfn is maken return bool due to this
function only returns zero or one as its return value, let's also make
kvmppc_visible_gpa return bool to keep consistent.

No functional change.
Signed-off-by: NYaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

378b417d

KVM: x86: MMU: always set accessed bit in shadow PTEs · 0e3d0648

由 Paolo Bonzini 提交于 11月 13, 2015

Commit 7a1638ce ("nEPT: Redefine EPT-specific link_shadow_page()",
2013-08-05) says:

    Since nEPT doesn't support A/D bit, we should not set those bit
    when building the shadow page table.

but this is not necessary.  Even though nEPT doesn't support A/D
bits, and hence the vmcs12 EPT pointer will never enable them,
we always use them for shadow page tables if available (see
construct_eptp in vmx.c).  So we can set the A/D bits freely
in the shadow page table.

This patch hence basically reverts commit 7a1638ce.

Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0e3d0648

KVM: x86: correctly print #AC in traces · aba2f06c

由 Paolo Bonzini 提交于 11月 12, 2015

Poor #AC was so unimportant until a few days ago that we were
not even tracing its name correctly.  But now it's all over
the place.

Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

aba2f06c

KVM: svm: add support for RDTSCP · 46896c73

由 Paolo Bonzini 提交于 11月 12, 2015

RDTSCP was never supported for AMD CPUs, which nobody noticed because
Linux does not use it.  But exactly the fact that Linux does not
use it makes the implementation very simple; we can freely trash
MSR_TSC_AUX while running the guest.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

46896c73

KVM: x86: expose MSR_TSC_AUX to userspace · 9dbe6cf9

由 Paolo Bonzini 提交于 11月 12, 2015

If we do not do this, it is not properly saved and restored across
migration.  Windows notices due to its self-protection mechanisms,
and is very upset about it (blue screen of death).

Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9dbe6cf9

kvm/x86: Hyper-V kvm exit · db397571

由 Andrey Smetanin 提交于 11月 10, 2015

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db397571

kvm/x86: Hyper-V synthetic interrupt controller · 5c919412

由 Andrey Smetanin 提交于 11月 10, 2015

SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5c919412

kvm/x86: per-vcpu apicv deactivation support · d62caabb

由 Andrey Smetanin 提交于 11月 10, 2015

The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on per-vcpu
basis.  In particular, when the userspace activates HyperV synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d62caabb

kvm/x86: split ioapic-handled and EOI exit bitmaps · 6308630b

由 Andrey Smetanin 提交于 11月 10, 2015

The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6308630b

kvm/irqchip: kvm_arch_irq_routing_update renaming split · abdb080f

由 Andrey Smetanin 提交于 11月 10, 2015

Actually kvm_arch_irq_routing_update() should be
kvm_arch_post_irq_routing_update() as it's called at the end
of irq routing update.

This renaming frees kvm_arch_irq_routing_update function name.
kvm_arch_irq_routing_update() weak function which will be used
to update mappings for arch-specific irq routing entries
(in particular, the upcoming Hyper-V synthetic interrupts).
Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

abdb080f

25 11月, 2015 7 次提交

KVM: nVMX: remove incorrect vpid check in nested invvpid emulation · b2467e74

由 Haozhong Zhang 提交于 11月 25, 2015

This patch removes the vpid check when emulating nested invvpid
instruction of type all-contexts invalidation. The existing code is
incorrect because:
 (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
     Translations Based on VPID", invvpid instruction does not check
     vpid in the invvpid descriptor when its type is all-contexts
     invalidation.
 (2) According to the same document, invvpid of type all-contexts
     invalidation does not require there is an active VMCS, so/and
     get_vmcs12() in the existing code may result in a NULL-pointer
     dereference. In practice, it can crash both KVM itself and L1
     hypervisors that use invvpid (e.g. Xen).
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b2467e74

arm64: kvm: report original PAR_EL1 upon panic · fbb4574c

由 Mark Rutland 提交于 11月 16, 2015

If we call __kvm_hyp_panic while a guest context is active, we call
__restore_sysregs before acquiring the system register values for the
panic, in the process throwing away the PAR_EL1 value at the point of
the panic.

This patch modifies __kvm_hyp_panic to stash the PAR_EL1 value prior to
restoring host register values, enabling us to report the original
values at the point of the panic.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

fbb4574c

arm64: kvm: avoid %p in __kvm_hyp_panic · 1d7a4e31

由 Mark Rutland 提交于 11月 16, 2015

Currently __kvm_hyp_panic uses %p for values which are not pointers,
such as the ESR value. This can confusingly lead to "(null)" being
printed for the value.

Use %x instead, and only use %p for host pointers.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

1d7a4e31

KVM: arm/arm64: Fix preemptible timer active state crazyness · 7e16aa81

由 Christoffer Dall 提交于 11月 24, 2015

We were setting the physical active state on the GIC distributor in a
preemptible section, which could cause us to set the active state on
different physical CPU from the one we were actually going to run on,
hacoc ensues.

Since we are no longer descheduling/scheduling soft timers in the
flush/sync timer functions, simply moving the timer flush into a
non-preemptible section.
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

7e16aa81

arm64: KVM: Add workaround for Cortex-A57 erratum 834220 · 498cd5c3

由 Marc Zyngier 提交于 11月 16, 2015

Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults
when a Stage 1 permission fault or device alignment fault should
have been reported.

This patch implements the workaround (which is to validate that the
Stage-1 translation actually succeeds) by using code patching.

Cc: stable@vger.kernel.org
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

498cd5c3

arm64: KVM: Fix AArch32 to AArch64 register mapping · c0f09634

由 Marc Zyngier 提交于 11月 16, 2015

When running a 32bit guest under a 64bit hypervisor, the ARMv8
architecture defines a mapping of the 32bit registers in the 64bit
space. This includes banked registers that are being demultiplexed
over the 64bit ones.

On exceptions caused by an operation involving a 32bit register, the
HW exposes the register number in the ESR_EL2 register. It was so
far understood that SW had to distinguish between AArch32 and AArch64
accesses (based on the current AArch32 mode and register number).

It turns out that I misinterpreted the ARM ARM, and the clue is in
D1.20.1: "For some exceptions, the exception syndrome given in the
ESR_ELx identifies one or more register numbers from the issued
instruction that generated the exception. Where the exception is
taken from an Exception level using AArch32 these register numbers
give the AArch64 view of the register."

Which means that the HW is already giving us the translated version,
and that we shouldn't try to interpret it at all (for example, doing
an MMIO operation from the IRQ mode using the LR register leads to
very unexpected behaviours).

The fix is thus not to perform a call to vcpu_reg32() at all from
vcpu_reg(), and use whatever register number is supplied directly.
The only case we need to find out about the mapping is when we
actively generate a register access, which only occurs when injecting
a fault in a guest.

Cc: stable@vger.kernel.org
Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

c0f09634

ARM/arm64: KVM: test properly for a PTE's uncachedness · e6fab544

由 Ard Biesheuvel 提交于 11月 10, 2015

The open coded tests for checking whether a PTE maps a page as
uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
which is not guaranteed to work since the type of a mapping is
not a set of mutually exclusive bits

For HYP mappings, the type is an index into the MAIR table (i.e, the
index itself does not contain any information whatsoever about the
type of the mapping), and for stage-2 mappings it is a bit field where
normal memory and device types are defined as follows:

    #define MT_S2_NORMAL            0xf
    #define MT_S2_DEVICE_nGnRE      0x1

I.e., masking *and* comparing with the latter matches on the former,
and we have been getting lucky merely because the S2 device mappings
also have the PTE_UXN bit set, or we would misidentify memory mappings
as device mappings.

Since the unmap_range() code path (which contains one instance of the
flawed test) is used both for HYP mappings and stage-2 mappings, and
considering the difference between the two, it is non-trivial to fix
this by rewriting the tests in place, as it would involve passing
down the type of mapping through all the functions.

However, since HYP mappings and stage-2 mappings both deal with host
physical addresses, we can simply check whether the mapping is backed
by memory that is managed by the host kernel, and only perform the
D-cache maintenance if this is the case.

Cc: stable@vger.kernel.org
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: NPavel Fedin <p.fedin@samsung.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

e6fab544

22 11月, 2015 5 次提交

parisc: Map kernel text and data on huge pages · 41b85a11

由 Helge Deller 提交于 11月 22, 2015

Adjust the linker script and map_pages() to map kernel text and data on
physical 1MB huge/large pages.
Signed-off-by: NHelge Deller <deller@gmx.de>

41b85a11

parisc: Add Huge Page and HUGETLBFS support · 736d2169

由 Helge Deller 提交于 11月 22, 2015

This patch adds huge page support to allow userspace to allocate huge
pages and to use hugetlbfs filesystem on 32- and 64-bit Linux kernels.
A later patch will add kernel support to map kernel text and data on
huge pages.

The only requirement is, that the kernel needs to be compiled for a
PA8X00 CPU (PA2.0 architecture). Older PA1.X CPUs do not support
variable page sizes. 64bit Kernels are compiled for PA2.0 by default.

Technically on parisc multiple physical huge pages may be needed to
emulate standard 2MB huge pages.
Signed-off-by: NHelge Deller <deller@gmx.de>

736d2169

parisc: Use long branch to do_syscall_trace_exit · 337685e5

由 Helge Deller 提交于 11月 20, 2015

Use the 22bit instead of the 17bit branch instruction on a 64bit kernel
to reach the do_syscall_trace_exit function from the gateway page.
A huge page enabled kernel may need the additional branch distance bits.
Signed-off-by: NHelge Deller <deller@gmx.de>

337685e5

parisc: Increase initial kernel mapping to 32MB on 64bit kernel · 332b42e4

由 Helge Deller 提交于 11月 20, 2015

For the 64bit kernel the initially 16 MB kernel memory might become too
small if you build a kernel with many modules built-in and with kernel
text and data areas mapped on huge pages.

This patch increases the initial mapping to 32MB for 64bit kernels and
keeps 16MB for 32bit kernels.
Signed-off-by: NHelge Deller <deller@gmx.de>

332b42e4

parisc: Initialize the fault vector earlier in the boot process. · 4182d0cd

由 Helge Deller 提交于 11月 20, 2015

A fault vector on parisc needs to be 2K aligned.  Furthermore the
checksum of the fault vector needs to sum up to 0 which is being
calculated and written at runtime.

Up to now we aligned both PA20 and PA11 fault vectors on the same 4K
page in order to easily write the checksum after having mapped the
kernel read-only (by mapping this page only as read-write).
But when we want to map the kernel text and data on huge pages this
makes things harder.
So, simplify it by aligning both fault vectors on 2K boundries and write
the checksum before we map the page read-only.
Signed-off-by: NHelge Deller <deller@gmx.de>

4182d0cd

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功