- 27 Aug 2014, 1 commit
-
-
By Christoph Lameter
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processor's percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.

__get_cpu_var() is defined as:

	#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always performs only an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables.

This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and fewer registers are used when code is generated.

Transformations done to __get_cpu_var():

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

   Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1, but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

   Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processor's instance of a per cpu variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y);

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct.

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable.

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/decrement etc. of a per cpu variable.

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++;

   Converts to

	__this_cpu_inc(y);

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
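For context, here is a minimal self-contained sketch of the converted style, assuming an ordinary kernel translation unit; the variable and function names are illustrative only:

	#include <linux/percpu.h>

	/* Illustrative per-cpu counter (the name is hypothetical). */
	static DEFINE_PER_CPU(unsigned long, demo_hits);

	static void demo_record_hit(void)
	{
		/* Offset-based operation: no explicit address calculation. */
		__this_cpu_inc(demo_hits);
	}

	static unsigned long demo_read_hits(void)
	{
		/* Read the current CPU's instance directly via its offset. */
		return __this_cpu_read(demo_hits);
	}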
-
- 09 Aug 2014, 1 commit
-
-
By Daniel Walter
Replace obsolete strict_strto calls with the appropriate kstrto calls.

Signed-off-by: Daniel Walter <dwalter@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
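As a rough sketch of the replacement pattern (the handler and buffer names are illustrative), the kstrto* helpers keep the (string, base, result) shape of strict_strto* but reject trailing garbage and overflow:

	#include <linux/kernel.h>

	/* Hypothetical store-style handler showing the converted call. */
	static int demo_parse(const char *buf)
	{
		unsigned long val;
		int err;

		/* previously: strict_strtoul(buf, 10, &val); now: */
		err = kstrtoul(buf, 10, &val);
		if (err)
			return err;	/* -EINVAL or -ERANGE on bad input */
		return 0;
	}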
-
- 05 Aug 2014, 3 commits
-
-
By Wanpeng Li
After commit 77b0f5d6 (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to), "Acknowledge interrupt on exit" behavior can be emulated. To do so, KVM will ask the APIC for the interrupt vector during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv, kvm_get_apic_interrupt would return -1 and give the following WARNING:

Call Trace:
 [<ffffffff81493563>] dump_stack+0x49/0x5e
 [<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
 [<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
 [<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
 [<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
 [<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
 [<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
 [<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
 [<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
 [<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]

To fix this, we cannot rely on the processor's virtual interrupt delivery, because "acknowledge interrupt on exit" must only update the virtual ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR) but it should not deliver the interrupt through the IDT. Thus, KVM has to deliver the interrupt "by hand", similar to the treatment of EOI in commit fc57ac2c (KVM: lapic: sync highest ISR to hardware apic on EOI, 2014-05-14).

The patch modifies kvm_cpu_get_interrupt to always acknowledge an interrupt; there are only two callers, and the other is not affected because it is never reached with kvm_apic_vid_enabled() == true. Then it modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition to the registers.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Tested-by: Liu, RongrongX <rongrongx.liu@intel.com>
Tested-by: Felipe Reyes <freyes@suse.com>
Fixes: 77b0f5d6
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Wanpeng Li
An external interrupt will cause a vmexit with reason "external interrupt" when L2 is running. L1 will pick up the interrupt through vmcs12 if L1 set the ack interrupt bit. Commit 77b0f5d6 (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to) retrieves the interrupt that belongs to L1 before vmcs01 is loaded. This will lead to problems in the next patch, which would write to SVI of vmcs02 instead of vmcs01 (SVI of vmcs02 doesn't make sense because L2 runs without APICv).

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Liu, RongrongX <rongrongx.liu@intel.com>
Tested-by: Felipe Reyes <freyes@suse.com>
Fixes: 77b0f5d6
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
[Move tracepoint as well. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paul Mackerras
Currently, the IRQFD code is conditional on CONFIG_HAVE_KVM_IRQ_ROUTING. So that we can have the IRQFD code compiled in without having the IRQ routing code, this creates a new CONFIG_HAVE_KVM_IRQFD, makes the IRQFD code conditional on it instead of CONFIG_HAVE_KVM_IRQ_ROUTING, and makes all the platforms that currently select HAVE_KVM_IRQ_ROUTING also select HAVE_KVM_IRQFD.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
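A minimal sketch of the resulting guard structure; kvm_irqfd is the eventfd injection entry point, and the inline stub shown for the disabled case is an assumption for illustration:

	/* Code that only needs eventfd-based injection can now depend on
	 * the narrower IRQFD option rather than full IRQ routing support. */
	#ifdef CONFIG_HAVE_KVM_IRQFD
	int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
	#else
	static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
	{
		return -EINVAL;	/* IRQFD not built in */
	}
	#endif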
-
- 31 Jul 2014, 1 commit
-
-
By Mark D Rustad
Resolve shadow warnings that appear in W=2 builds. Instead of using ret to hold the return pointer, save the length in a new variable saved_len and compute the pointer on exit. This also resolves a very technical error, in that ret was declared as a const char *, when it really was a char * const.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
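A rough sketch of the described pattern, with hypothetical names (the real function and its logic are not reproduced here):

	/* Before, an inner 'ret' shadowed an outer one (W=2 warning).
	 * After, only the length is tracked and the pointer is derived once. */
	static const char *demo_scan(const char *buf, size_t len)
	{
		size_t saved_len = len;

		/* ... logic that may adjust saved_len ... */

		return buf + saved_len;	/* pointer computed on exit */
	}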
-
- 30 Jul 2014, 1 commit
-
-
By Chris J Arges
Remove a prototype which was added by both 93c4adc7 and 36be0b9d.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 28 Jul 2014, 1 commit
-
-
By Alexander Graf
In preparation for making the check_extension function available at VM scope, we add a struct kvm * argument to the function header and rename the function accordingly. It will still be called from the /dev/kvm fd, but with a NULL argument for struct kvm *.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
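As a sketch of the interface change, assuming the rename lands as kvm_vm_ioctl_check_extension (the call site below is illustrative):

	/* The hook gains a struct kvm * so it can later answer
	 * VM-scoped capability queries. */
	long kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext);

	/* From the system (/dev/kvm) ioctl path there is no VM yet: */
	r = kvm_vm_ioctl_check_extension(NULL, arg);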
-
- 25 Jul 2014, 1 commit
-
-
By Mark Rustad
Resolve a shadow warning generated in W=2 builds by the nested use of the min macro, by instead using the min3 macro to take the minimum of 3 values.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
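A minimal sketch of the substitution, with illustrative operands (min() and min3() are the kernel's minmax helpers):

	static size_t smallest(size_t a, size_t b, size_t c)
	{
		/* before: min(a, min(b, c)); the nested expansion of min()
		 * redeclares its internal temporaries, which W=2 reports
		 * as shadowing */
		return min3(a, b, c);	/* single macro, no shadowing */
	}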
-
- 24 Jul 2014, 8 commits
-
-
By Paolo Bonzini
Using ARRAY_SIZE directly makes it easier to read the code. While touching the code, replace the division by a multiplication in the recently added BUILD_BUG_ON.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
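A hedged sketch of the two idioms being applied; the array, its element type, and the limit are hypothetical:

	#include <linux/bug.h>
	#include <linux/kernel.h>	/* ARRAY_SIZE */
	#include <linux/types.h>	/* u16 */

	#define DEMO_LIMIT 64		/* hypothetical bound */
	static u16 demo_index[14];

	static void demo_sanity(void)
	{
		/* before: BUILD_BUG_ON(sizeof(demo_index) / sizeof(u16) > DEMO_LIMIT);
		 * a division can truncate, a multiplication cannot: */
		BUILD_BUG_ON(ARRAY_SIZE(demo_index) * sizeof(u16) >
			     DEMO_LIMIT * sizeof(u16));
	}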
-
By Nadav Amit
Currently there is no check whether the shared MSRs list overruns the allocated size, which can result in bugs. In addition there is no check that vmx->guest_msrs has sufficient space to accommodate all the VMX msrs. This patch adds the assertions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
x86 does not automatically set rflags.rf during event injection. This patch does a partial job, setting rflags.rf upon fault injection. It does not handle the setting of RF upon interrupt injection on a rep-string instruction.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
This patch updates RF for rep-string emulation. The flag is set upon the first iteration, and cleared after the last (if emulated). It is intended to make sure that if a trap (in future data/io #DB emulation) or interrupt is delivered to the guest during the rep-string instruction, RF will be set correctly. RF affects whether an instruction breakpoint in the guest is masked.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Thomas Gleixner
The members of the new struct are the required ones for the new NMI-safe accessor to clock monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI-safe timekeepers a simple memcpy, use the struct for the timekeeper as well and convert all users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
-
By Thomas Gleixner
cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
-
By Thomas Gleixner
Convert the relevant base data right away to nanoseconds instead of doing the conversion on every readout. Reduces text size by 160 bytes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
-
By Thomas Gleixner
Use the new nanoseconds-based interface and get rid of the timespec conversion dance.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
-
- 21 Jul 2014, 8 commits
-
-
By Nadav Amit
Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the current KVM implementation. This bug is apparent only if the MOV-DR instruction is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and set respectively. It also sets DR6.RTM upon every debug exception. Obviously, it is not a complete fix, as debugging of RTM is still unsupported.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
Make nested_release_vmcs12 idempotent.

Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in order to detach it from the shadow vmcs. However, this is not available anymore after commit 26a865f4 (KVM: VMX: fix use after free of vmx->loaded_vmcs, 2014-01-03). Revert that patch, and fix its problem by forcing a vmcs01 as the active VMCS before freeing all the nested VMX state.

Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
If RFLAGS.RF is set, then no #DB should occur on instruction breakpoints. However, the KVM emulator injects #DB regardless of RFLAGS.RF. This patch fixes this behavior. KVM, however, still appears not to update RFLAGS.RF correctly, regardless of this patch.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
RFLAGS.RF was cleared in several functions (e.g., syscall) in the x86 emulator. Now that we clear it before the execution of an instruction in the emulator, we can remove the specific cleanup of RFLAGS.RF.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
When an instruction is emulated, RFLAGS.RF should be cleared. KVM previously did not do so. This patch clears RFLAGS.RF after interception is done. If a fault occurs during the instruction, RFLAGS.RF will be set by a previous patch. This patch does not handle the case of traps/interrupts during rep-strings. Traps are only expected to occur on debug watchpoints, and those are anyhow not handled by the emulator.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
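A one-line sketch of the described behavior, in the shape of the emulator context (the wrapper function is illustrative):

	static void demo_finish_emulation(struct x86_emulate_ctxt *ctxt)
	{
		/* After the emulated instruction completes without
		 * faulting, RF must read as 0, as on real hardware. */
		ctxt->eflags &= ~X86_EFLAGS_RF;
	}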
-
By Nadav Amit
RFLAGS.RF is always zero after popf. Therefore, popf should not update RF; emulating popf, like any other instruction, should clear RFLAGS.RF anyway.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
When skipping an emulated instruction, rflags.rf should be cleared as it would be on a real x86 CPU.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 17 Jul 2014, 1 commit
-
-
By Wanpeng Li
This patch fixes the bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331; after the patch http://www.spinics.net/lists/kvm/msg105230.html is applied, there is some progress and L2 can boot up, although slowly.

The original idea of this vid injection fix is from "Zhang, Yang Z" <yang.z.zhang@intel.com>. An interrupt delivered by vid should be injected into L1 by L0 if we are currently in L1, or injected into L2 by L0 through the old injection path if L1 doesn't have the External-interrupt exiting bit set. The current logic doesn't consider these cases. This patch fixes it by delivering the vid interrupt to L1 if we are currently in L1, or through the old injection path if L1 doesn't have the External-interrupt exiting bit set.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 11 Jul 2014, 14 commits
-
-
By Nadav Amit
Certain instructions (e.g., mwait and monitor) cause a #UD exception when they are executed in user mode. This is in contrast to the regular privileged instructions, which cause #GP. In order not to mess with SVM interception of mwait and monitor, which assumes privilege level assertions take place before interception, a flag has been added.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
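A hedged sketch of how such a decode flag might be consulted; the flag name PrivUD and its bit position are assumptions, while emulate_ud/emulate_gp are the emulator's existing fault injectors:

	/* Hypothetical decode flag: a privileged instruction that raises
	 * #UD, not #GP, when executed outside ring 0. */
	#define PrivUD	((u64)1 << 51)

	/* Inside the privilege check of the emulation loop: */
	if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
		rc = (ctxt->d & PrivUD) ? emulate_ud(ctxt)
					: emulate_gp(ctxt, 0);
		goto done;
	}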
-
By Nadav Amit
Certain instructions, such as monitor and xsave, do not support big real mode and cause a #GP exception if the effective address of any accessed byte is not within [0, 0xffff]. This patch introduces a flag to mark these instructions, including the necessary checks.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
Emulator accesses are always done a page at a time, either by the emulator itself (for fetches) or because we need to query the MMU for address translations. Speed up these accesses by using kvm_read_guest_page and, in the case of fetches, by inlining kvm_read_guest_virt_helper and dropping the loop around kvm_read_guest_page. This final tweak saves 30-100 more clock cycles (4-10%), bringing the count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series and 925-1700 after the first two low-hanging-fruit changes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
When the CS base is not page-aligned, the linear address of the code could get close to the page boundary (e.g. 0x...ffe) even if the EIP value is not. So we need to first linearize the address, and only then compute the number of valid bytes that can be fetched. This happens relatively often when executing real mode code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
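A sketch of the ordering described, with assumed emulator internals (linearize() is the emulator's address-translation helper; the exact signature and variable names here are illustrative):

	unsigned long linear;
	int rc;

	/* Translate the logical fetch address first ... */
	rc = linearize(ctxt, addr, 1, false, &linear);
	if (rc != X86EMUL_CONTINUE)
		return rc;

	/* ... and only then bound the fetch, so it cannot cross into
	 * the next page even when CS.base pushes the linear address
	 * close to the page boundary. */
	size = min_t(unsigned, size, PAGE_SIZE - offset_in_page(linear));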
-
By Paolo Bonzini
This simplifies the code a bit, especially the overflow checks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
We do not need a memory copying loop anymore in insn_fetch; we can use a byte-aligned pointer to access instruction fields directly from the fetch_cache. This eliminates 50-150 cycles (corresponding to a 5-10% improvement in performance) from each instruction.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
do_insn_fetch_bytes will only be called once in a given insn_fetch and insn_fetch_arr, because in fact it will only be called at most twice for any instruction, and the first call is explicit in x86_decode_insn. This observation lets us hoist the call out of the memory copying loop. It does not buy performance, because most fetches are one byte long anyway, but it prepares for the next patch.

The overflow check is tricky, but correct. Because do_insn_fetch_bytes has already been called once, we know that fc->end is at least 15. So it is okay to subtract the number of bytes we want to read.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
Hoist the common case up from do_insn_fetch_byte to do_insn_fetch, and prime the fetch_cache in x86_decode_insn. This helps the compiler and the branch predictor a bit, but above all it lays the ground for further changes in the next few patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Bandan Das
rip_relative is only set if decode_modrm runs, and if you have ModRM you will also have a memopp. We can then access memopp unconditionally. Note that rip_relative cannot be hoisted up to decode_modrm, or you break "mov $0, xyz(%rip)". Also, move the typecast on the "out of range value" of mem.ea to decode_modrm.

Together, all these optimizations save about 50 cycles on each emulated instruction (4-6%).

Signed-off-by: Bandan Das <bsd@redhat.com>
[Fix immediate operands with rip-relative addressing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Bandan Das
x86_decode_insn already sets a default for seg_override, so remove it from the zeroed area. Also replace the set/get functions with direct access to the field.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Bandan Das
A lot of initializations are unnecessary, as the fields get set to appropriate values before actually being used. Also optimize the placement of fields in x86_emulate_ctxt.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Bandan Das
Remove the if conditional; that will help us avoid an "else initialize to 0" branch. Also, rearrange the operators for slightly better code.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Bandan Das
The same information can be gleaned from ctxt->d, and this avoids having to zero/NULL-initialize intercept and check_perm.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Bandan Das
Core emulator functions all belong in emulator.c; x86 should have no knowledge of emulator internals.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-