- 24 3月, 2011 1 次提交
-
-
由 Stephen Wilson 提交于
This tag is intended to mirror the thread info TIF_IA32 flag. Will be used to identify mm's which support 32 bit tasks running in compatibility mode without requiring a reference to the task itself. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NMichel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 3月, 2011 1 次提交
-
-
由 Sage Weil 提交于
It is frequently useful to sync a single file system, instead of all mounted file systems via sync(2): - On machines with many mounts, it is not at all uncommon for some of them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on those and may never get to the one you do care about (e.g., /). - Some applications write lots of data to the file system and then want to make sure it is flushed to disk. Calling fsync(2) on each file introduces unnecessary ordering constraints that result in a large amount of sub-optimal writeback/flush/commit behavior by the file system. There are currently two ways (that I know of) to sync a single super_block: - BLKFLSBUF ioctl on the block device: That also invalidates the bdev mapping, which isn't usually desirable, and doesn't work for non-block file systems. - 'mount -o remount,rw' will call sync_filesystem as an artifact of the current implemention. Relying on this little-known side effect for something like data safety sounds foolish. Both of these approaches require root privileges, which some applications do not have (nor should they need?) given that sync(2) is an unprivileged operation. This patch introduces a new system call syncfs(2) that takes an fd and syncs only the file system it references. Maybe someday we can $ sync /some/path and not get sync: ignoring all arguments The syscall is motivated by comments by Al and Christoph at the last LSF. syncfs(2) seems like an appropriate name given statfs(2). A similar ioctl was also proposed a while back, see http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 3月, 2011 38 次提交
-
-
由 Shaohua Li 提交于
According to intel CPU manual, every time PGD entry is changed in i386 PAE mode, we need do a full TLB flush. Current code follows this and there is comment for this too in the code. But current code misses the multi-threaded case. A changed page table might be used by several CPUs, every such CPU should flush TLB. Usually this isn't a problem, because we prepopulate all PGD entries at process fork. But when the process does munmap and follows new mmap, this issue will be triggered. When it happens, some CPUs keep doing page faults: http://marc.info/?l=linux-kernel&m=129915020508238&w=2 Reported-by: Yasunori Goto<y-goto@jp.fujitsu.com> Tested-by: Yasunori Goto<y-goto@jp.fujitsu.com> Reviewed-by: NRik van Riel <riel@redhat.com> Signed-off-by: Shaohua Li<shaohua.li@intel.com> Cc: Mallick Asit K <asit.k.mallick@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm <linux-mm@kvack.org> Cc: stable <stable@kernel.org> LKML-Reference: <1300246649.2337.95.camel@sli10-conroe> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Namhyung Kim 提交于
Current stack dump code scans entire stack and check each entry contains a pointer to kernel code. If CONFIG_FRAME_POINTER=y it could mark whether the pointer is valid or not based on value of the frame pointer. Invalid entries could be preceded by '?' sign. However this was not going to happen because scan start point was always higher than the frame pointer so that they could not meet. Commit 9c0729dc ("x86: Eliminate bp argument from the stack tracing routines") delayed bp acquisition point, so the bp was read in lower frame, thus all of the entries were marked invalid. This patch fixes this by reverting above commit while retaining stack_frame() helper as suggested by Frederic Weisbecker. End result looks like below: before: [ 3.508329] Call Trace: [ 3.508551] [<ffffffff814f35c9>] ? panic+0x91/0x199 [ 3.508662] [<ffffffff814f3739>] ? printk+0x68/0x6a [ 3.508770] [<ffffffff81a981b2>] ? mount_block_root+0x257/0x26e [ 3.508876] [<ffffffff81a9821f>] ? mount_root+0x56/0x5a [ 3.508975] [<ffffffff81a98393>] ? prepare_namespace+0x170/0x1a9 [ 3.509216] [<ffffffff81a9772b>] ? kernel_init+0x1d2/0x1e2 [ 3.509335] [<ffffffff81003894>] ? kernel_thread_helper+0x4/0x10 [ 3.509442] [<ffffffff814f6880>] ? restore_args+0x0/0x30 [ 3.509542] [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2 [ 3.509641] [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10 after: [ 3.522991] Call Trace: [ 3.523351] [<ffffffff814f35b9>] panic+0x91/0x199 [ 3.523468] [<ffffffff814f3729>] ? printk+0x68/0x6a [ 3.523576] [<ffffffff81a981b2>] mount_block_root+0x257/0x26e [ 3.523681] [<ffffffff81a9821f>] mount_root+0x56/0x5a [ 3.523780] [<ffffffff81a98393>] prepare_namespace+0x170/0x1a9 [ 3.523885] [<ffffffff81a9772b>] kernel_init+0x1d2/0x1e2 [ 3.523987] [<ffffffff81003894>] kernel_thread_helper+0x4/0x10 [ 3.524228] [<ffffffff814f6880>] ? restore_args+0x0/0x30 [ 3.524345] [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2 [ 3.524445] [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10 -v5: * fix build breakage with oprofile -v4: * use 0 instead of regs->bp * separate out printk changes -v3: * apply comment from Frederic * add a couple of printk fixes Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Soren Sandmann <ssp@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <1300416006-3163-1-git-send-email-namhyung@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
The many stray whitespaces and other uncleanlinesses made this code almost unreadable to me - so fix those. No changes to the code. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Lucas De Marchi 提交于
They were generated by 'codespell' and then manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gleb Natapov 提交于
Commit 6440e5967bc broke old userspaces that do not set tss address before entering vcpu. Unbreak it by setting tss address to a safe value on the first vcpu entry. New userspaces should set tss address, so print warning in case it doesn't. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Xiao Guangrong 提交于
This patch does: - call vcpu->arch.mmu.update_pte directly - use gfn_to_pfn_atomic in update_pte path The suggestion is from Avi. Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Cleanup the code of pte_prefetch_gfn_to_memslot and mapping_level_dirty_bitmap Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
fix: [ 3494.671786] stack backtrace: [ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 3494.671790] Call Trace: [ 3494.671796] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 3494.671826] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 3494.671834] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 3494.671843] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 3494.671851] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 3494.671861] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 3494.671867] [] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel] and: [ 8328.789599] stack backtrace: [ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 8328.789603] Call Trace: [ 8328.789609] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 8328.789621] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 8328.789628] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 8328.789635] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 8328.789643] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 8328.789699] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 8328.789713] [] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel] Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Nikola Ciprich 提交于
commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu), breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function to proper place. Signed-off-by: NNikola Ciprich <nikola.ciprich@linuxbox.cz> Acked-by: NZachary Amsden <zamsden@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
Currently if io port + len crosses 8bit boundary in io permission bitmap the check may allow IO that otherwise should not be allowed. The patch fixes that. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
Current implementation truncates upper 32bit of TR base address during IO permission bitmap check. The patch fixes this. Reported-and-tested-by: NFrancis Moreau <francis.moro@gmail.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
With CONFIG_CC_STACKPROTECTOR, we need a valid %gs at all times, so disable lazy reload and do an eager reload immediately after the vmexit. Reported-by: NIVAN ANGELOV <ivangotoy@gmail.com> Acked-By: NJoerg Roedel <joerg.roedel@amd.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Takuya Yoshikawa 提交于
Access to this page is mostly done through the regs member which holds the address to this page. The exceptions are in vmx_vcpu_reset() and kvm_free_lapic() and these both can easily be converted to using regs. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
These macros are not used, so removed Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Using __get_free_page instead of alloc_page and page_address, using free_page instead of __free_page and virt_to_page Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
No need to record the gfn to verifier the pte has the same mode as current vcpu, it's because we only speculatively update the pte only if the pte and vcpu have the same mode Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by kvm->slots_lock, so move it out of mmu spinlock Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Set spte accessed bit only if guest_initiated == 1 that means the really accessed Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Only remove write access in the last sptes. Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Lai Jiangshan 提交于
use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Lai Jiangshan 提交于
The hash array of async gfns may still contain some left gfns after kvm_clear_async_pf_completion_queue() called, need to clear them. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
Currently vm86 task is initialized on each real mode entry and vcpu reset. Initialization is done by zeroing TSS and updating relevant fields. But since all vcpus are using the same TSS there is a race where one vcpu may use TSS while other vcpu is initializing it, so the vcpu that uses TSS will see wrong TSS content and will behave incorrectly. Fix that by initializing TSS only once. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
When rmode.vm86 is active TR descriptor is updated with vm86 task values, but selector is left intact. vmx_set_segment() makes sure that if TR register is written into while vm86 is active the new values are saved for use after vm86 is deactivated, but since selector is not updated on vm86 activation/deactivation new value is lost. Fix this by writing new selector into vmcs immediately. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Lai Jiangshan 提交于
The changelog of 104f226b said "adds the __noclone attribute", but it was missing in its patch. I think it is still needed. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Jan Kiszka 提交于
Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
isr_ack logic was added by e4825800 to avoid unnecessary IPIs. Back then it made sense, but now the code checks that vcpu is ready to accept interrupt before sending IPI, so this logic is no longer needed. The patch removes it. Fixes a regression with Debian/Hurd. Signed-off-by: NGleb Natapov <gleb@redhat.com> Reported-and-tested-by: NJonathan Nieder <jrnieder@gmail.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Joseph Cihula 提交于
This patch fixes the logic used to detect whether BIOS has disabled VMX, for the case where VMX is enabled only under SMX, but tboot is not active. Signed-off-by: NJoseph Cihula <joseph.cihula@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Jan Kiszka 提交于
Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
When we enable an NMI window, we ask for an IRET intercept, since the IRET re-enables NMIs. However, the IRET intercept happens before the instruction executes, while the NMI window architecturally opens afterwards. To compensate for this mismatch, we only open the NMI window in the following exit, assuming that the IRET has by then executed; however, this assumption is not always correct; we may exit due to a host interrupt or page fault, without having executed the instruction. Fix by checking for forward progress by recording and comparing the IRET's rip. This is somewhat of a hack, since an unchaging rip does not mean that no forward progress has been made, but is the simplest fix for now. Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
The interrupt injection logic looks something like if an nmi is pending, and nmi injection allowed inject nmi if an nmi is pending request exit on nmi window the problem is that "nmi is pending" can be set asynchronously by the PIT; if it happens to fire between the two if statements, we will request an nmi window even though nmi injection is allowed. On SVM, this has disasterous results, since it causes eflags.TF to be set in random guest code. The fix is simple; make nmi_pending synchronous using the standard vcpu->requests mechanism; this ensures the code above is completely synchronous wrt nmi_pending. Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
Use the new support in the emulator, and drop the ad-hoc code in x86.c. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Mark some instructions as vendor specific, and allow the caller to request emulation only of vendor specific instructions. This is useful in some circumstances (responding to a #UD fault). Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
x86_decode_insn() doesn't return X86EMUL_* values, so the check for X86EMUL_PROPOGATE_FAULT will always fail. There is a proper check later on, so there is no need for a replacement for this code. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Jan Kiszka 提交于
This warning was once used for debugging QEMU user space. Though uncommon, it is actually possible to send an INIT request to a running VCPU. So better drop this warning before someone misuses it to flood kernel logs this way. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Glauber Costa 提交于
When a vcpu is reset, kvmclock page keeps being written to this days. This is wrong and inconsistent: a cpu reset should take it to its initial state. Signed-off-by: NGlauber Costa <glommer@redhat.com> CC: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 john cooper 提交于
A correction to Intel cpu model CPUID data (patch queued) caused winxp to BSOD when booted with a Penryn model. This was traced to the CPUID "model" field correction from 6 -> 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19 called out as "reserved" in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: Njohn cooper <john.cooper@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Xiao Guangrong 提交于
Currently we keep track of only two states: guest mode and host mode. This patch adds an "exiting guest mode" state that tells us that an IPI will happen soon, so unless we need to wait for the IPI, we can avoid it completely. Also 1: No need atomically to read/write ->mode in vcpu's thread 2: reorganize struct kvm_vcpu to make ->mode and ->requests in the same cache line explicitly Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Jan Kiszka 提交于
This case is a pure user space error we do not need to record. Moreover, it can be misused to flood the kernel log. Remove it. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-