- 13 5月, 2016 1 次提交
-
-
由 Christian Borntraeger 提交于
Some wakeups should not be considered a sucessful poll. For example on s390 I/O interrupts are usually floating, which means that _ALL_ CPUs would be considered runnable - letting all vCPUs poll all the time for transactional like workload, even if one vCPU would be enough. This can result in huge CPU usage for large guests. This patch lets architectures provide a way to qualify wakeups if they should be considered a good/bad wakeups in regard to polls. For s390 the implementation will fence of halt polling for anything but known good, single vCPU events. The s390 implementation for floating interrupts does a wakeup for one vCPU, but the interrupt will be delivered by whatever CPU checks first for a pending interrupt. We prefer the woken up CPU by marking the poll of this CPU as "good" poll. This code will also mark several other wakeup reasons like IPI or expired timers as "good". This will of course also mark some events as not sucessful. As KVM on z runs always as a 2nd level hypervisor, we prefer to not poll, unless we are really sure, though. This patch successfully limits the CPU usage for cases like uperf 1byte transactional ping pong workload or wakeup heavy workload like OLTP while still providing a proper speedup. This also introduced a new vcpu stat "halt_poll_no_tuning" that marks wakeups that are considered not good for polling. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version) Cc: David Matlack <dmatlack@google.com> Cc: Wanpeng Li <kernellwp@gmail.com> [Rename config symbol. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 3月, 2016 1 次提交
-
-
由 Adam Buchbinder 提交于
Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 12月, 2015 1 次提交
-
-
由 Anton Blanchard 提交于
The enable_kernel_*() functions leave the relevant MSR bits enabled until we exit the kernel sometime later. Create disable versions that wrap the kernel use of FP, Altivec VSX or SPE. While we don't want to disable it normally for performance reasons (MSR writes are slow), it will be used for a debug boot option that does this and catches bad uses in other areas of the kernel. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 16 9月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
This new statistic can help diagnosing VCPUs that, for any reason, trigger bad behavior of halt_poll_ns autotuning. For example, say halt_poll_ns = 480000, and wakeups are spaced exactly like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes 10+20+40+80+160+320+480 = 1110 microseconds out of every 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then is consuming about 30% more CPU than it would use without polling. This would show as an abnormally high number of attempted polling compared to the successful polls. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com< Reviewed-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 8月, 2015 1 次提交
-
-
由 Tudor Laurentiu 提交于
On this switch branch the regs initialization doesn't happen so add it. This was found with the help of a static code analysis tool. Signed-off-by: NLaurentiu Tudor <Laurentiu.Tudor@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 5月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 5月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 5月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 2月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 9月, 2014 11 次提交
-
-
由 Alexander Graf 提交于
The kvmppc_get_last_inst function recently received a facelift that allowed us to pass an enum of the type of instruction we want to read into it rather than an unreadable boolean. Unfortunately, not all callers ended up passing the enum. This wasn't really an issue as "true" and "false" happen to match the two enum values we have, but it's still hard to read. Update all callers of kvmppc_get_last_inst() to follow the new calling convention. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Madhavan Srinivasan 提交于
This patch extends the use of illegal instruction as software breakpoint instruction across the ppc platform. Patch extends booke program interrupt code to support software breakpoint. Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> [agraf: Fix bookehv] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Powerpc timer implementation is a copycat version of s390. Now that they removed the tasklet with commit ea74c0ea follow this optimization. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NBogdan Purcareata <bogdan.purcareata@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Make ONE_REG generic for server and embedded architectures by moving kvm_vcpu_ioctl_get_one_reg() and kvm_vcpu_ioctl_set_one_reg() functions to powerpc layer. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Add AltiVec support in KVM for Book3e. FPU support gracefully reuse host infrastructure so follow the same approach for AltiVec. Book3e specification defines shared interrupt numbers for SPE and AltiVec units. Still SPE is present in e200/e500v2 cores while AltiVec is present in e6500 core. So we can currently decide at compile-time which of the SPE or AltiVec units to support exclusively by using CONFIG_SPE_POSSIBLE and CONFIG_PPC_E500MC defines. As Alexander Graf suggested, keep SPE and AltiVec exception handlers distinct to improve code readability. Guests have the privilege to enable AltiVec, so we always need to support AltiVec in KVM and implicitly in host to reflect interrupts and to save/restore the unit context. KVM will be loaded on cores with AltiVec unit only if CONFIG_ALTIVEC is defined. Use this define to guard KVM AltiVec logic. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
Increase FPU laziness by loading the guest state into the unit before entering the guest instead of doing it on each vcpu schedule. Without this improvement an interrupt may claim floating point corrupting guest state. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Guest visible debug register and hardware visible debug registers are same, so ther is no need to have arch->shadow_dbg_reg, instead use arch->dbg_reg. Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Dbsr is not visible to userspace and we do not think any need to expose this to userspace because: Userspace cannot inject debug interrupt to guest (as this does not know guest ability to handle debug interrupt), so userspace will always clear DBSR. Now if userspace has to always clear DBSR in KVM_EXIT_DEBUG handling then clearing dbsr in kernel looks simple as this avoid doing SET_SREGS/set_one_reg() to clear DBSR Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Debug interrupt can be either "critical level" or "debug level". There are separate set of save/restore registers used for different level. Example: DSRR0/DSRR1 are used for "debug level" and CSRR0/CSRR1 are used for critical level debug interrupt. Using CPU_FTR_DEBUG_LVL_EXC to decide which interrupt level to be used. Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 29 7月, 2014 2 次提交
-
-
由 Alexander Graf 提交于
DCR handling was only needed for 440 KVM. Since we removed it, we can also remove handling of DCR accesses. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
We're going to implement guest code interpretation in KVM for some rare corner cases. This code needs to be able to inject data and instruction faults into the guest when it encounters them. Expose generic APIs to do this in a reasonably subarch agnostic fashion. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 7月, 2014 8 次提交
-
-
由 Alexander Graf 提交于
We have a nice API to find the translated GPAs of a GVA including protection flags. So far we only use it on Book3S, but there's no reason the same shouldn't be used on BookE as well. Implement a kvmppc_xlate() version for BookE and clean it up to make it more readable in general. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
On book3e, KVM uses load external pid (lwepx) dedicated instruction to read guest last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI and LRAT), generated by loading a guest address, needs to be handled by KVM. These exceptions are generated in a substituted guest translation context (EPLC[EGS] = 1) from host context (MSR[GS] = 0). Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1), doing minimal checks on the fast path to avoid host performance degradation. lwepx exceptions originate from host state (MSR[GS] = 0) which implies additional checks in DO_KVM macro (beside the current MSR[GS] = 1) by looking at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obvious too intrusive for the host. Read guest last instruction from kvmppc_load_last_inst() by searching for the physical address and kmap it. This address the TODO for TLB eviction and execute-but-not-read entries, and allow us to get rid of lwepx until we are able to handle failures. A simple stress benchmark shows a 1% sys performance degradation compared with previous approach (lwepx without failure handling): time for i in `seq 1 10000`; do /bin/echo > /dev/null; done real 0m 8.85s user 0m 4.34s sys 0m 4.48s vs real 0m 8.84s user 0m 4.36s sys 0m 4.44s A solution to use lwepx and to handle its exceptions in KVM would be to temporary highjack the interrupt vector from host. This imposes additional synchronizations for cores like FSL e6500 that shares host IVOR registers between hardware threads. This optimized solution can be later developed on top of this patch. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Mihai Caraman 提交于
On book3e, guest last instruction is read on the exit path using load external pid (lwepx) dedicated instruction. This load operation may fail due to TLB eviction and execute-but-not-read entries. This patch lay down the path for an alternative solution to read the guest last instruction, by allowing kvmppc_get_lat_inst() function to fail. Architecture specific implmentations of kvmppc_load_last_inst() may read last guest instruction and instruct the emulation layer to re-execute the guest in case of failure. Make kvmppc_get_last_inst() definition common between architectures. Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
kvmppc_set_epr() is already defined in asm/kvm_ppc.h, So rename and move get_epr helper function to same file. Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> [agraf: remove duplicate return] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Use kvmppc_set_sprg[0-7]() and kvmppc_get_sprg[0-7]() helper functions Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Uses kvmppc_set_dar() and kvmppc_get_dar() helper functions Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
Use kvmppc_set_srr0/srr1() and kvmppc_get_srr0/srr1() helper functions Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 27 1月, 2014 1 次提交
-
-
由 Scott Wood 提交于
Simplify the handling of lazy EE by going directly from fully-enabled to hard-disabled. This replaces the lazy_irq_pending() check (including its misplaced kvm_guest_exit() call). As suggested by Tiejun Chen, move the interrupt disabling into kvmppc_prepare_to_enter() rather than have each caller do it. Also move the IRQ enabling on heavyweight exit into kvmppc_prepare_to_enter(). Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 09 1月, 2014 4 次提交
-
-
由 Tiejun Chen 提交于
Rather than calling hard_irq_disable() when we're back in C code we can just call RECONCILE_IRQ_STATE to soft disable IRQs while we're already in hard disabled state. This should be functionally equivalent to the code before, but cleaner and faster. Signed-off-by: NTiejun Chen <tiejun.chen@windriver.com> [agraf: fix comment, commit message] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Bharat Bhushan 提交于
KVM uses same WIM tlb attributes as the corresponding qemu pte. For this we now search the linux pte for the requested page and get these cache caching/coherency attributes from pte. Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com> Reviewed-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Now that we have the vcpu floating-point and vector state stored in the same type of struct as the main kernel uses, we can load that state directly from the vcpu struct instead of having extra copies to/from the thread_struct. Similarly, when the guest state needs to be saved, we can have it saved it directly to the vcpu struct by setting the current->thread.fp_save_area and current->thread.vr_save_area pointers. That also means that we don't need to back up and restore userspace's FP/vector state. This all makes the code simpler and faster. Note that it's not necessary to save or modify current->thread.fpexc_mode, since nothing in KVM uses or is affected by its value. Nor is it necessary to touch used_vr or used_vsr. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This uses struct thread_fp_state and struct thread_vr_state to store the floating-point, VMX/Altivec and VSX state, rather than flat arrays. This makes transferring the state to/from the thread_struct simpler and allows us to unify the get/set_one_reg implementations for the VSX registers. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 11 12月, 2013 1 次提交
-
-
由 Scott Wood 提交于
Commit ce11e48b ("KVM: PPC: E500: Add userspace debug stub support") added "struct thread_struct" to the stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build, compared to 48 bytes for the recently-introduced "struct debug_reg". Use the latter instead. This fixes the following error: cc1: warnings being treated as errors arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run': arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes make[2]: *** [arch/powerpc/kvm/booke.o] Error 1 make[1]: *** [arch/powerpc/kvm] Error 2 make[1]: *** Waiting for unfinished jobs.... Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 18 10月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
This moves the kvmppc_ops callbacks to be a per VM entity. This enables us to select HV and PR mode when creating a VM. We also allow both kvm-hv and kvm-pr kernel module to be loaded. To achieve this we move /dev/kvm ownership to kvm.ko module. Depending on which KVM mode we select during VM creation we take a reference count on respective module Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: fix coding style] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 17 10月, 2013 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
We will use that in the later patch to find the kvm ops handler Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
This patch add a new callback kvmppc_ops. This will help us in enabling both HV and PR KVM together in the same kernel. The actual change to enable them together is done in the later patch in the series. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in booke changes] Signed-off-by: NAlexander Graf <agraf@suse.de>
-