Commits · b2e19b20708edb6413dea38e6285a6e546dce06b · openeuler / raspberrypi-kernel

08 Apr, 2012 40 commits

KVM: PPC: make e500v2 kvm and e500mc cpu mutually exclusive · b2e19b20

Committed by Alexander Graf on Feb 15, 2012

We can't run e500v2 kvm on e500mc kernels, so indicate that by
making the 2 options mutually exclusive in kconfig.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: rename CONFIG_KVM_E500 -> CONFIG_KVM_E500V2 · bf7ca4bd

Committed by Alexander Graf on Feb 15, 2012

The CONFIG_KVM_E500 option really indicates that we're running on a V2 machine,
not on a machine of the generic E500 class. So indicate that properly and
change the config name accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500mc: add load inst fixup · 1d628af7

Committed by Alexander Graf on Feb 15, 2012

There's always a chance we're unable to read a guest instruction. The guest
could have its TLB mapped executable but not readable, or something odd could
happen and our TLB could get flushed. So it's a good idea to be prepared for
that case and have a fallback that allows us to fix things up.

Add fixup code that keeps guest code from potentially crashing our host kernel.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500mc: Move r1/r2 restoration very early · a2723ce7

Committed by Alexander Graf on Feb 15, 2012

If we hit any exception whatsoever in the restore path and r1/r2 aren't the
host registers, we don't get a working oops. So it's always a good idea to
restore them as early as possible.

This time, it actually has practical reasons to do so too, since we need to
have the host page fault handler fix up our guest instruction read code. And
for that to work we need r1/r2 restored.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500mc: implicitly set MSR_GS · 79300f8c

Committed by Alexander Graf on Feb 15, 2012

When setting MSR for an e500mc guest, we implicitly always set MSR_GS
to make sure the guest is in guest state. Since we have this implicit
rule there, we don't need to explicitly pass MSR_GS to set_msr().

Remove all explicit setters of MSR_GS.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500mc: Add doorbell emulation support · 4ab96919

Committed by Alexander Graf on Feb 15, 2012

When one vcpu wants to kick another, it can issue a special IPI instruction
called msgsnd. This patch emulates this instruction, its clearing counterpart
and the infrastructure required to actually trigger that interrupt inside
a guest vcpu.

With this patch, SMP guests on e500mc work.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500mc support · 73196cd3

Committed by Scott Wood on Dec 20, 2011

Add processor support for e500mc, using hardware virtualization support
(GS-mode).

Current issues include:
 - No support for external proxy (coreint) interrupt mode in the guest.

Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>,
Varun Sethi <Varun.Sethi@freescale.com>, and
Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: standard PPC floating point support · 8fae845f

Committed by Scott Wood on Dec 20, 2011

e500mc has a normal PPC FPU, rather than SPE which is found
on e500v1/v2.

Based on code from Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: category E.HV (GS-mode) support · d30f6e48

Committed by Scott Wood on Dec 20, 2011

Chips such as e500mc that implement category E.HV in Power ISA 2.06
provide hardware virtualization features, including a new MSR mode for
guest state.  The guest OS can perform many operations without trapping
into the hypervisor, including transitions to and from guest userspace.

Since we can use SRR1[GS] to reliably tell whether an exception came from
guest state, instead of messing around with IVPR, we use DO_KVM similarly
to book3s.

Current issues include:
 - Machine checks from guest state are not routed to the host handler.
 - The guest can cause a host oops by executing an emulated instruction
   in a page that lacks read permission.  Existing e500/4xx support has
   the same problem.

Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>,
Varun Sethi <Varun.Sethi@freescale.com>, and
Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: remove pt_regs usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

powerpc/booke: Provide exception macros with interrupt name · cfac5784

Committed by Scott Wood on Dec 20, 2011

DO_KVM will need to identify the particular exception type.

There is an existing set of arbitrary numbers that Linux passes,
but it's an undocumented mess that sort of corresponds to server/classic
exception vectors but not really.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: emulate tlbilx · ab9fc405

Committed by Scott Wood on Dec 20, 2011

tlbilx is the new, preferred invalidation instruction.  It is not
found on e500 prior to e500mc, but there should be no harm in
supporting it on all e500.

Based on code from Ashish Kalra <Ashish.Kalra@freescale.com>.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: Track TLB1 entries with a bitmap · 4f802fe9

Committed by Scott Wood on Dec 20, 2011

Rather than invalidate everything when a TLB1 entry needs to be
taken down, keep track of which host TLB1 entries are used for
a given guest TLB1 entry, and invalidate just those entries.

Based on code from Ashish Kalra <Ashish.Kalra@freescale.com>
and Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
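
The bookkeeping idea in the TLB1 bitmap commit above can be sketched in plain C. This is only an illustration, not the patch itself: the table sizes, struct tlb1_tracker, tlb1_track(), tlb1_invalidate_guest() and the inval_host callback are all hypothetical names chosen for this example, and __builtin_ctzll() is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; not taken from the patch. */
#define GUEST_TLB1_ENTRIES 16
#define HOST_TLB1_ENTRIES  64

struct tlb1_tracker {
    /* Bit h of map[g] set: host TLB1 entry h backs guest TLB1 entry g. */
    uint64_t map[GUEST_TLB1_ENTRIES];
};

/* Record that host entry h was created on behalf of guest entry g. */
static void tlb1_track(struct tlb1_tracker *t, unsigned int g, unsigned int h)
{
    if (g < GUEST_TLB1_ENTRIES && h < HOST_TLB1_ENTRIES)
        t->map[g] |= UINT64_C(1) << h;
}

/*
 * Invalidate only the host entries that back guest entry g.
 * inval_host is a hypothetical hook; returns how many entries were hit.
 */
static unsigned int tlb1_invalidate_guest(struct tlb1_tracker *t,
                                          unsigned int g,
                                          void (*inval_host)(unsigned int h))
{
    uint64_t pending;
    unsigned int n = 0;

    if (g >= GUEST_TLB1_ENTRIES)
        return 0;

    pending = t->map[g];
    while (pending != 0) {
        unsigned int h = (unsigned int)__builtin_ctzll(pending);

        if (inval_host != NULL)
            inval_host(h);
        pending &= pending - 1;   /* clear the lowest set bit */
        n++;
    }
    t->map[g] = 0;
    return n;
}
```

Taking down one guest entry then touches only the recorded host entries, and mappings that belong to other guest entries stay in place.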

KVM: PPC: e500: refactor core-specific TLB code · 8fdd21a2

Committed by Scott Wood on Dec 20, 2011

The PID handling is e500v1/v2-specific, and is moved to e500.c.

The MMU sregs code and kvmppc_core_vcpu_translate will be shared with
e500mc, and is moved from e500.c to e500_tlb.c.

Partially based on patches from Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: fix bisectability]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: clean up arch/powerpc/kvm/e500.h · 52e1718c

Committed by Scott Wood on Dec 20, 2011

Move vcpu to the beginning of vcpu_e500 to give it appropriate
prominence, especially if more fields end up getting added to the
end of vcpu_e500 (and vcpu ends up in the middle).

Remove gratuitous "extern" and add parameter names to prototypes.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: fix bisectability]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: merge <asm/kvm_e500.h> into arch/powerpc/kvm/e500.h · fc6cf995

Committed by Scott Wood on Dec 20, 2011

Keeping two separate headers for e500-specific things was a
pain, and wasn't even organized along any logical boundary.

There was TLB stuff in <asm/kvm_e500.h> despite the existence of
arch/powerpc/kvm/e500_tlb.h, and nothing in <asm/kvm_e500.h> needed
to be referenced from outside arch/powerpc/kvm.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: fix bisectability]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: e500: rename e500_tlb.h to e500.h · 29a5a6f9

Committed by Scott Wood on Dec 20, 2011

This is in preparation for merging in the contents of
arch/powerpc/include/asm/kvm_e500.h.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: Move vm core init/destroy out of booke.c · fafd6832

Committed by Scott Wood on Dec 20, 2011

e500mc will want to do lpid allocation/deallocation here.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: booke: add booke-level vcpu load/put · 94fa9d99

Committed by Scott Wood on Dec 20, 2011

This gives us a place to put load/put actions that correspond to
code that is booke-specific but not specific to a particular core.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: factor out lpid allocator from book3s_64_mmu_hv · 043cc4d7

Committed by Scott Wood on Dec 20, 2011

We'll use it on e500mc as well.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

powerpc/e500: split CPU_FTRS_ALWAYS/CPU_FTRS_POSSIBLE · 06aae867

Committed by Scott Wood on Dec 20, 2011

Split e500 (v1/v2) and e500mc/e5500 to allow optimization of feature
checks that differ between the two.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

powerpc/booke: Set CPU_FTR_DEBUG_LVL_EXC on 32-bit · 52b066fa

Committed by Scott Wood on Dec 20, 2011

Currently 32-bit only cares about this for choice of exception
vector, which is done in core-specific code.  However, KVM will
want to distinguish as well.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Remove unused dirty_bitmap_head and nr_dirty_pages · 93474b25

Committed by Takuya Yoshikawa on Mar 01, 2012

Now that we do neither double buffering nor heuristic selection of the
write protection method these are not needed anymore.

Note: some drivers have their own implementation of set_bit_le() and
making it generic needs a bit of work; so we use test_and_set_bit_le()
and will later replace it with generic set_bit_le().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Switch to srcu-less get_dirty_log() · 60c34612

Committed by Takuya Yoshikawa on Mar 03, 2012

We have seen some problems with the current implementation of
get_dirty_log(), which uses synchronize_srcu_expedited() for updating
dirty bitmaps; e.g. it sometimes causes latency on the order of
milliseconds when we use VGA displays.

Furthermore the recent discussion on the following thread
    "srcu: Implement call_srcu()"
    http://lkml.org/lkml/2012/1/31/211
also motivated us to implement get_dirty_log() without SRCU.

This patch achieves this goal without sacrificing the performance of
both VGA and live migration: in practice the new code is much faster
than the old one unless we have too many dirty pages.

Implementation:

The key part of the implementation is the use of xchg() operation for
clearing dirty bits atomically.  Since this allows us to update only
BITS_PER_LONG pages at once, we need to iterate over the dirty bitmap
until every dirty bit is cleared again for the next call.

Although some people may worry about the problem of using the atomic
memory instruction many times on the concurrently accessible bitmap,
it is usually accessed with mmu_lock held and we rarely see concurrent
accesses: so what we need to care about is the pure xchg() overhead.

Another point to note is that we do not use for_each_set_bit() to check
which pages in each group of BITS_PER_LONG pages are actually dirty.
Instead we simply use __ffs() in a loop.  This is much faster than
repeatedly calling find_next_bit().
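
As a rough illustration of the loop described above, here is a minimal user-space sketch in C. It is not the kernel code: atomic_exchange() from <stdatomic.h> stands in for xchg(), the GCC/Clang builtin __builtin_ctzl() stands in for __ffs(), and harvest_dirty_bitmap(), dirty_page_fn and the snapshot buffer are hypothetical names introduced only for this example.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical callback, invoked once per dirty page number. */
typedef void (*dirty_page_fn)(size_t page, void *ctx);

/*
 * Harvest a dirty bitmap word by word and leave it cleared.
 * snapshot[] receives the harvested words; returns the dirty page count.
 */
static size_t harvest_dirty_bitmap(_Atomic unsigned long *bitmap,
                                   size_t nwords, unsigned long *snapshot,
                                   dirty_page_fn fn, void *ctx)
{
    size_t count = 0;

    for (size_t i = 0; i < nwords; i++) {
        unsigned long mask = 0;

        /* Skip clean words without paying for an atomic exchange. */
        if (atomic_load(&bitmap[i]) != 0)
            mask = atomic_exchange(&bitmap[i], 0UL); /* xchg() stand-in */

        snapshot[i] = mask;

        /* Walk the set bits: find the lowest one, then clear it. */
        while (mask != 0) {
            unsigned int bit = (unsigned int)__builtin_ctzl(mask);

            if (fn != NULL)
                fn(i * BITS_PER_LONG + bit, ctx);
            mask &= mask - 1;
            count++;
        }
    }
    return count;
}
```

Each word is swapped with zero in a single atomic step, so a bit set concurrently by a writer either lands in the harvested word or stays set for the next call.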

Performance:

The dirty-log-perf unit test showed nice improvements, some times faster
than before, except for some extreme cases; for such cases the speed of
getting dirty page information is much faster than userspace can process
it.

For real workloads, both VGA and live migration, we have observed pure
improvements: when the guest was reading a file during live migration,
we originally saw a few ms of latency, but with the new method the
latency was less than 200us.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Avoid checking huge page mappings in get_dirty_log() · 5dc99b23

Committed by Takuya Yoshikawa on Mar 01, 2012

We dropped such mappings when we enabled dirty logging, and we will never
create new ones until we stop the logging.

For this we introduce a new function which can be used to write protect
a range of PT level pages: although we do not need to care about a range
of pages at this point, the following patch will need this feature to
optimize the write protection of many pages.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Split the main body of rmap_write_protect() off from others · a0ed4607

Committed by Takuya Yoshikawa on Mar 01, 2012

We will use this in the following patch to implement another function
which needs to write protect pages using the rmap information.

Note that there is a small change in debug printing for large pages:
we do not differentiate them from others to avoid duplicating code.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

kvmclock: remove unneeded EXPORT macro · 24899709

Committed by Eric B Munson on Mar 15, 2012

check_and_clear_guest_paused does not need to be exported as it isn't used
by any modules, so remove the export.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: fix kvm_vcpu_kick build failure on S390 · 8c84780d

Committed by Marcelo Tosatti on Mar 14, 2012

S390's kvm_vcpu_stat does not contain halt_wakeup member.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

watchdog: add check for suspended vm in softlockup detector · 5d1c0f4a

Committed by Eric B Munson on Mar 10, 2012

A suspended VM can cause spurious soft lockup warnings.  To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so.  When the watchdog is reset successfully, clear the guest
paused flag.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
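
The check described in the watchdog commit above can be sketched as follows. This is a simplified user-space model under stated assumptions, not the kernel implementation: struct pvclock_info, the flag's bit position, and should_warn_softlockup() are hypothetical, and check_and_clear_guest_paused() is given a parameter here purely so the example is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define PVCLOCK_GUEST_STOPPED (1u << 1)

/* Hypothetical stand-in for the flags shared between host and guest. */
struct pvclock_info {
    uint8_t flags;
};

/* Return true, and clear the flag, if the host marked the guest as stopped. */
static bool check_and_clear_guest_paused(struct pvclock_info *pv)
{
    if ((pv->flags & PVCLOCK_GUEST_STOPPED) == 0)
        return false;
    pv->flags &= (uint8_t)~PVCLOCK_GUEST_STOPPED;
    return true;
}

/* Decide whether a detected stall should produce a soft lockup warning. */
static bool should_warn_softlockup(struct pvclock_info *pv, bool stalled)
{
    if (!stalled)
        return false;
    /* A stall that coincides with a host-side pause is not a real lockup. */
    if (check_and_clear_guest_paused(pv))
        return false;
    return true;
}
```

Because the flag is cleared once it has been consumed, a later stall that is not explained by a host pause still produces a warning.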

KVM: x86: Add ioctl for KVM_KVMCLOCK_CTRL · 1c0b28c2

Committed by Eric B Munson on Mar 10, 2012

Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

kvmclock: Add functions to check if the host has stopped the vm · 3b5d56b9

Committed by Eric B Munson on Mar 10, 2012

When a host stops or suspends a VM it will set a flag to show this.  The
watchdog will use these functions to determine if a softlockup is real, or the
result of a suspended VM.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
asm-generic changes Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

x86: pvclock: Add flag to indicate that a vm was stopped by the host · eae3ee7d

Committed by Eric B Munson on Mar 10, 2012

This flag will be used to check if the vm was stopped by the host when a soft
lockup was detected.  The host will set the flag when it stops the guest.  On
resume, the guest will check this flag if a soft lockup is detected and skip
issuing the warning.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Rework wqp conditional code · 2246f8b5

Committed by Alexander Graf on Mar 13, 2012

On PowerPC, we sometimes use a waitqueue per core, not per thread,
so we can't always use the vcpu internal waitqueue.

This code has been generalized by Christoffer Dall recently, but
unfortunately broke compilation for PowerPC. At the time the helper
function is defined, struct kvm_vcpu is not declared yet, so we can't
dereference it.

This patch moves all logic into the generic inline function, at which
time we have all information necessary.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Factor out kvm_vcpu_kick to arch-generic code · b6d33834

Committed by Christoffer Dall on Mar 08, 2012

The kvm_vcpu_kick function performs roughly the same functionality on
almost all architectures, so we shouldn't have separate copies.

PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
structure, and to accommodate this special need a
__KVM_HAVE_ARCH_VCPU_GET_WQ define and an accompanying function
kvm_arch_vcpu_wq have been defined. For all other architectures this
is a generic inline that just returns &vcpu->wq.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
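
The override pattern described in the kvm_vcpu_kick commit above might look roughly like this. It is a sketch with simplified stand-in types, not the kernel headers: wait_queue_head_t, struct kvm_vcpu and struct kvm_vcpu_arch are reduced to the fields needed here, and the wqp pointer field is an assumption made for the example.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types. */
typedef struct {
    int nr_waiters;
} wait_queue_head_t;

struct kvm_vcpu_arch {
    wait_queue_head_t *wqp;   /* assumed name for the arch-owned pointer */
};

struct kvm_vcpu {
    wait_queue_head_t wq;     /* per-vcpu waitqueue used by default */
    struct kvm_vcpu_arch arch;
};

#ifdef __KVM_HAVE_ARCH_VCPU_GET_WQ
/* Architecture override: the waitqueue may be shared, e.g. per core. */
static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
{
    return vcpu->arch.wqp;
}
#else
/* Generic version: every vcpu simply uses its own embedded waitqueue. */
static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
{
    return &vcpu->wq;
}
#endif
```

Generic code calls kvm_arch_vcpu_wq() in both cases, so the shared kick path does not need to know which waitqueue layout an architecture uses.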

KVM: schedule debugfs statistics for removal · 66ef8931

Committed by Avi Kivity on Apr 08, 2012

Deprecated in favour of tracepoints.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: count all irq windows exit · 675acb75

Committed by Jason Wang on Mar 08, 2012

Also count the exits of fast-path.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: set upper bounds for iobus dev to limit userspace · 786a9f88

Committed by Amos Kong on Mar 09, 2012

kvm_io_bus devices are used for ioevent, pit, pic, ioapic, and
coalesced_mmio.

Currently Qemu only emulates one PCI bus. It contains 32 slots, and
each slot contains 8 functions, so the maximum number of supported PCI
devices is 1 * 32 * 8 = 256. One virtio-blk takes one iobus device;
one virtio-net (vhost=on) takes two iobus devices.
The maximum number of coalesced mmio zones is 100, and each zone
has an iobus device. So 300 io_bus devices are not enough.

Set an upper bound for kvm_io_range to limit userspace.
1000 is a very large limit and does not bloat the typical user.
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
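
The arithmetic in the iobus commit above can be written out as a small C sketch. The worst case shown is an assumption made for illustration (every PCI function is a vhost-backed virtio-net device), not a figure stated in the commit; the constant and function names are hypothetical.

```c
/* Figures quoted in the commit message. */
enum {
    PCI_BUSES            = 1,
    SLOTS_PER_BUS        = 32,
    FUNCS_PER_SLOT       = 8,
    IOBUS_PER_VHOST_NET  = 2,    /* one virtio-net (vhost=on) takes two */
    COALESCED_MMIO_ZONES = 100,  /* one iobus device per zone */
    OLD_LIMIT            = 300,
    NEW_LIMIT            = 1000
};

/* Maximum number of PCI functions on the single emulated bus. */
static int max_pci_functions(void)
{
    return PCI_BUSES * SLOTS_PER_BUS * FUNCS_PER_SLOT;
}

/*
 * Assumed worst case: every function is a vhost virtio-net device,
 * plus the full set of coalesced mmio zones.
 */
static int worst_case_iobus_devices(void)
{
    return max_pci_functions() * IOBUS_PER_VHOST_NET + COALESCED_MMIO_ZONES;
}
```

Under that assumption the count is 256 * 2 + 100 = 612, which exceeds 300 but fits comfortably under 1000.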

KVM: resize kvm_io_range array dynamically · a1300716

Committed by Amos Kong on Mar 09, 2012

This patch allows the kvm_io_range array to be resized dynamically.
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: expose Intel cpu new features (HLE, RTM) to guest · 83c52915

Committed by Liu, Jinsong on Feb 28, 2012

Intel recently released 2 new features, HLE and RTM.
Refer to http://software.intel.com/file/41417.
This patch exposes them to the guest.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

Linux 3.4-rc2 · 00341028

Committed by Linus Torvalds on Apr 07, 2012

Merge tag 'regmap-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · f4e52e7f

Committed by Linus Torvalds on Apr 07, 2012

Pull two more small regmap fixes from Mark Brown:
 - Now we have users for it that aren't running Android it turns out
   that regcache_sync_region() is much more useful to drivers if it's
   exported for use by modules.  Who knew?
 - Make sure we don't divide by zero when doing debugfs dumps of
   rbtrees, not visible up until now because everything was providing at
   least some cache on startup.

* tag 'regmap-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: prevent division by zero in rbtree_show
  regmap: Export regcache_sync_region()
